Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
dc.contributor.author
Kurth, Andreas
dc.contributor.author
Vogel, Pirmin
dc.contributor.author
Marongiu, Andrea
dc.contributor.author
Benini, Luca
dc.date.accessioned
2019-12-09T17:17:53Z
dc.date.available
2019-02-25T07:40:53Z
dc.date.available
2019-12-09T17:17:53Z
dc.date.issued
2018
dc.identifier.isbn
978-1-5386-8477-1
en_US
dc.identifier.isbn
978-1-5386-8478-8
en_US
dc.identifier.other
10.1109/ICCD.2018.00052
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/327349
dc.identifier.doi
10.3929/ethz-b-000292549
dc.description.abstract
Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and stall all SVM accesses once they are full, hampering the scalability of parallel accelerators.
In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our solution is based on three novel concepts: To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper Threads, which use run-time information to issue timely prefetches. To reduce the latency of TLB misses, misses are handled by a variable number of parallel Miss Handling Helper Threads. To support parallel burst DMA transfers to SVM without additional buffers, we add lightweight hardware to a standard DMA engine to detect and react to TLB misses. Compared to the state of the art, our work improves accelerator performance for memory-intensive kernels by up to 4× and by up to 60 % for irregular and regular memory access patterns, respectively.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
IEEE
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
en_US
dc.type
Conference Paper
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2019-01-17
ethz.book.title
2018 IEEE 36th International Conference on Computer Design (ICCD)
en_US
ethz.pages.start
292
en_US
ethz.pages.end
300
en_US
ethz.version.deposit
acceptedVersion
en_US
ethz.event
36th IEEE International Conference on Computer Design (ICCD 2018)
en_US
ethz.event.location
Orlando, FL, USA
en_US
ethz.event.date
October 7-10, 2018
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.publication.place
Piscataway, NJ
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02636 - Institut für Integrierte Systeme / Integrated Systems Laboratory::03996 - Benini, Luca / Benini, Luca
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02636 - Institut für Integrierte Systeme / Integrated Systems Laboratory::03996 - Benini, Luca / Benini, Luca
en_US
ethz.date.deposited
2018-09-28T14:13:58Z
ethz.source
FORM
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2019-12-09T17:18:07Z
ethz.rosetta.lastUpdated
2022-03-29T00:27:31Z
ethz.rosetta.versionExported
true
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/292549
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/326817
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Scalable%20and%20Efficient%20Virtual%20Memory%20Sharing%20in%20Heterogeneous%20SoCs%20with%20TLB%20Prefetching%20and%20MMU-Aware%20DMA%20Engine&rft.date=2018&rft.spage=292&rft.epage=300&rft.au=Kurth,%20Andreas&Vogel,%20Pirmin&Marongiu,%20Andrea&Benini,%20Luca&rft.isbn=978-1-5386-8477-1&978-1-5386-8478-8&rft.genre=proceeding&rft_id=info:doi/10.1109/ICCD.2018.00052&rft.btitle=2018%20IEEE%2036th%20International%20Conference%20on%20Computer%20Design%20(ICCD)