dc.contributor.author
Wang, Zeyu
dc.contributor.author
He, Xiaoxi
dc.contributor.author
Zhou, Zimu
dc.contributor.author
Wang, Xu
dc.contributor.author
Ma, Qiang
dc.contributor.author
Miao, Xin
dc.contributor.author
Liu, Zhuo
dc.contributor.author
Thiele, Lothar
dc.contributor.author
Yang, Zheng
dc.date.accessioned
2022-12-13T09:09:17Z
dc.date.available
2022-12-11T17:59:14Z
dc.date.available
2022-12-13T09:09:17Z
dc.date.issued
2022
dc.identifier.isbn
978-1-6654-8643-9
en_US
dc.identifier.isbn
978-1-6654-8644-6
en_US
dc.identifier.other
10.1109/SECON55815.2022.9918563
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/586146
dc.description.abstract
Intelligent personal and home applications demand multiple deep neural networks (DNNs) running on resource-constrained platforms for compound inference tasks, known as multitask inference. To fit multiple DNNs into low-resource devices, emerging techniques resort to weight sharing among DNNs to reduce their storage. However, this reduction in storage fails to translate into efficient execution on common accelerators such as GPUs. Most DNN graph rewriters are blind to multi-DNN optimization, while GPU vendors provide inefficient APIs for parallel multi-DNN execution at runtime. A few prior graph rewriters suggest cross-model graph fusion for low-latency multi-DNN execution. Yet they require duplicating the shared weights, erasing the memory savings of weight-shared DNNs. In this paper, we propose MTS, a novel graph rewriter for efficient multitask inference with weight-shared DNNs. MTS adopts a model stitching algorithm that outputs a single computational graph for weight-shared DNNs without duplicating any shared weight. MTS also uses a model grouping strategy to avoid overwhelming the GPU when co-running tens of DNNs. Extensive experiments show that MTS accelerates multitask inference by up to 6.0x compared to sequentially executing multiple weight-shared DNNs. MTS also yields up to 2.5x lower latency and 3.7x less memory usage compared with NETFUSE, a state-of-the-art multi-DNN graph rewriter.
en_US
dc.language.iso
en
en_US
dc.publisher
IEEE
en_US
dc.subject
Deep Neural Networks
en_US
dc.subject
Multitask Inference
en_US
dc.subject
Model Acceleration
en_US
dc.title
Stitching Weight-Shared Deep Neural Networks for Efficient Multitask Inference on GPU
en_US
dc.type
Conference Paper
dc.date.published
2022-10-25
ethz.book.title
2022 19th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON)
en_US
ethz.pages.start
145
en_US
ethz.pages.end
153
en_US
ethz.event
19th IEEE International Conference on Sensing, Communication, and Networking (SECON 2022)
en_US
ethz.event.date
September 20-23, 2022
en_US
ethz.identifier.wos
ethz.publication.place
Piscataway, NJ
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
ethz.date.deposited
2022-12-11T17:59:26Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2022-12-13T09:09:18Z
ethz.rosetta.lastUpdated
2023-02-07T08:43:46Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Stitching%20Weight-Shared%20Deep%20Neural%20Networks%20for%20Efficient%20Multitask%20Inference%20on%20GPU&rft.date=2022&rft.spage=145&rft.epage=153&rft.au=Wang,%20Zeyu&He,%20Xiaoxi&Zhou,%20Zimu&Wang,%20Xu&Ma,%20Qiang&rft.isbn=978-1-6654-8643-9&978-1-6654-8644-6&rft.genre=proceeding&rft_id=info:doi/10.1109/SECON55815.2022.9918563&rft.btitle=2022%2019th%20Annual%20IEEE%20International%20Conference%20on%20Sensing,%20Communication,%20and%20Networking%20(SECON)