Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Kurth, Andreas; Vogel, Pirmin; Marongiu, Andrea; Benini, Luca

doi:10.1109/ICCD.2018.00052

dc.contributor.author

Kurth, Andreas

dc.contributor.author

Vogel, Pirmin

dc.contributor.author

Marongiu, Andrea

dc.contributor.author

Benini, Luca

dc.date.accessioned

2019-12-09T17:17:53Z

dc.date.available

2019-02-25T07:40:53Z

dc.date.available

2019-12-09T17:17:53Z

dc.date.issued

2018

dc.identifier.isbn

978-1-5386-8477-1

en_US

dc.identifier.isbn

978-1-5386-8478-8

en_US

dc.identifier.other

10.1109/ICCD.2018.00052

en_US

dc.identifier.uri

http://hdl.handle.net/20.500.11850/327349

dc.identifier.doi

10.3929/ethz-b-000292549

dc.description.abstract

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can incur a significant run-time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write to SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and they stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our solution is based on three novel concepts: To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper Threads, which use run-time information to issue timely prefetches. To reduce the latency of TLB misses, misses are handled by a variable number of parallel Miss Handling Helper Threads. To support parallel burst DMA transfers to SVM without additional buffers, we add lightweight hardware to a standard DMA engine to detect and react to TLB misses. Compared to the state of the art, our work improves accelerator performance for memory-intensive kernels by up to 4× and by up to 60 % for irregular and regular memory access patterns, respectively.

en_US

dc.format

application/pdf

en_US

dc.language.iso

en

en_US

dc.publisher

IEEE

en_US

dc.rights.uri

http://rightsstatements.org/page/InC-NC/1.0/

dc.title

Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

en_US

dc.type

Conference Paper

dc.rights.license

In Copyright - Non-Commercial Use Permitted

dc.date.published

2019-01-17

ethz.book.title

2018 IEEE 36th International Conference on Computer Design (ICCD)

en_US

ethz.pages.start

292

en_US

ethz.pages.end

300

en_US

ethz.version.deposit

acceptedVersion

en_US

ethz.event

36th IEEE International Conference on Computer Design (ICCD 2018)

en_US

ethz.event.location

Orlando, FL, USA

en_US

ethz.event.date

October 7-10, 2018

en_US

ethz.identifier.wos

000458293200041

ethz.identifier.scopus

85062215151

ethz.publication.place

Piscataway, NJ

en_US

ethz.publication.status

published

en_US

ethz.leitzahl

ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02636 - Institut für Integrierte Systeme / Integrated Systems Laboratory::03996 - Benini, Luca / Benini, Luca

en_US

ethz.leitzahl.certified

ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02636 - Institut für Integrierte Systeme / Integrated Systems Laboratory::03996 - Benini, Luca / Benini, Luca

en_US

ethz.date.deposited

2018-09-28T14:13:58Z

ethz.source

FORM

ethz.source

WOS

ethz.eth

yes

en_US

ethz.availability

Open access

en_US

ethz.rosetta.installDate

2019-12-09T17:18:07Z

ethz.rosetta.lastUpdated

2022-03-29T00:27:31Z

ethz.rosetta.versionExported

true

dc.identifier.olduri

http://hdl.handle.net/20.500.11850/292549

dc.identifier.olduri

http://hdl.handle.net/20.500.11850/326817

ethz.COinS

ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Scalable%20and%20Efficient%20Virtual%20Memory%20Sharing%20in%20Heterogeneous%20SoCs%20with%20TLB%20Prefetching%20and%20MMU-Aware%20DMA%20Engine&rft.date=2018&rft.spage=292&rft.epage=300&rft.au=Kurth,%20Andreas&Vogel,%20Pirmin&Marongiu,%20Andrea&Benini,%20Luca&rft.isbn=978-1-5386-8477-1&978-1-5386-8478-8&rft.genre=proceeding&rft_id=info:doi/10.1109/ICCD.2018.00052&rft.btitle=2018%20IEEE%2036th%20International%20Conference%20on%20Computer%20Design%20(ICCD)

Files in this item

Name: svm-prefetch-dma-paper.pdf
Size: 2.442Mb
Format: Adobe PDF
Label: Full text (accepted version)

Publication type

Conference Paper [35666]
