dc.contributor.author
Qu, Zhongnan
dc.contributor.supervisor
Thiele, Lothar
dc.contributor.supervisor
Saukh, Olga
dc.date.accessioned
2022-10-06T11:10:36Z
dc.date.available
2022-10-05T09:14:35Z
dc.date.available
2022-10-05T11:57:37Z
dc.date.available
2022-10-05T20:01:18Z
dc.date.available
2022-10-06T06:16:10Z
dc.date.available
2022-10-06T06:17:14Z
dc.date.available
2022-10-06T07:20:19Z
dc.date.available
2022-10-06T09:20:47Z
dc.date.available
2022-10-06T11:10:36Z
dc.date.issued
2022
dc.identifier.uri
http://hdl.handle.net/20.500.11850/574442
dc.identifier.doi
10.3929/ethz-b-000574442
dc.description.abstract
Deep neural networks (DNNs) have succeeded in many perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. However, high-performing DNNs rely on intensive resource consumption. For example, training a DNN requires a large amount of dynamic memory, a large-scale dataset, and a large number of computations (a long training time); even inference with a DNN demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on a cloud server with a large number of high-performance computers, a high-bandwidth communication bus, a shared storage infrastructure, and a high-capacity power supply. Recently, emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices often have rather limited resources. To deploy DNNs on edge devices, we need to reduce the size of DNNs, i.e., we target a better trade-off between resource consumption and model accuracy. In this thesis, we study four edge intelligence scenarios and develop different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of the DNNs in each scenario. We summarize the four studied scenarios as follows:
- Inference on Edge Devices: First, we enable efficient inference of DNNs under the fixed resource constraints of edge devices. Compared to cloud inference, inference on edge devices avoids transmitting the data to the cloud server, which achieves a more stable, faster, and more energy-efficient inference. Since the main resource constraints stem from storing a large number of weights and from the computation during inference, we propose Adaptive Loss-aware Quantization (ALQ) for multi-bit networks. ALQ reduces the redundancy in the quantization bitwidth. The direct optimization objective (i.e., the loss) and the learned adaptive bitwidth assignment allow ALQ to obtain extremely low-bit networks with an average bitwidth below 1 bit while yielding higher accuracy than state-of-the-art binary networks (see the first sketch below).
- Adaptation on Edge Devices: Second, we enable efficient adaptation of DNNs when the resource constraints on the target edge devices change dynamically at runtime, e.g., the allowed execution time and the allocatable RAM. To maximize model accuracy during on-device inference, we develop a new synthesis approach, Dynamic REal-time Sparse Subnets (DRESS), which can sample and execute sub-networks with different resource demands from a backbone network. DRESS reduces the redundancy among multiple sub-networks through weight sharing and architecture sharing, resulting in storage efficiency and re-configuration efficiency, respectively. The generated sub-networks have different sparsity levels and thus can be fetched and executed under varying resource constraints by utilizing sparse tensor computations.
- Learning on Edge Devices: Third, we enable efficient learning of DNNs when facing unseen environments or users on edge devices. On-device learning requires both data efficiency and memory efficiency. We thus propose a new meta-learning method, p-Meta, to enable memory-efficient learning from only a few samples of unseen tasks. p-Meta reduces the updating redundancy by identifying and updating only structure-wise adaptation-critical weights, which reduces the memory consumption needed for the updated weights.
- Edge-Server System: Finally, we enable efficient inference and efficient updating in edge-server systems. In an edge-server system, several resource-constrained edge devices are connected to a resource-sufficient server through a constrained communication bus. Because only limited relevant training data is available beforehand, pretrained DNNs may be significantly improved after the initial deployment. In such an edge-server system, on-device inference is preferred over cloud inference, since it achieves fast and stable inference with less energy consumption. Yet retraining on the cloud server is preferred over on-device retraining (or federated learning) due to the limited memory and computing power of edge devices. We propose a novel pipeline, Deep Partial Updating (DPU), to iteratively update the deployed inference model. In particular, when newly collected data samples from edge devices or from other sources are available at the server, the server smartly selects only a subset of critical weights to update and send to each edge device (see the second sketch below). This weight-wise partial updating reduces redundant updating by reusing the pretrained weights, and it achieves accuracy similar to full updating at a significantly lower communication cost.
en_US
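The abstract describes ALQ as quantizing weights into a multi-bit form with a learned, adaptive bitwidth. As background, here is a minimal sketch of the underlying multi-bit representation: a weight vector approximated by a sum of scaled binary bases, fitted greedily to the residual. The function name and the greedy fitting are illustrative assumptions only, not the loss-aware, adaptive-bitwidth optimization that ALQ actually performs.

```python
import numpy as np

def multi_bit_quantize(w, bits):
    """Approximate w as sum_i alpha_i * b_i with binary bases b_i in {-1,+1}^n,
    fitted greedily to the remaining residual. Illustrative sketch only; ALQ
    itself learns an adaptive per-group bitwidth driven by the training loss."""
    residual = w.astype(np.float64).copy()
    alphas, bases = [], []
    for _ in range(bits):
        b = np.where(residual >= 0, 1.0, -1.0)   # binary basis from the sign
        alpha = np.abs(residual).mean()          # least-squares scale for a sign basis
        alphas.append(alpha)
        bases.append(b)
        residual = residual - alpha * b          # quantize what is left
    return np.array(alphas), np.stack(bases)

# Example: a 2-bit approximation of a small weight vector.
w = np.random.randn(8).astype(np.float32)
alphas, bases = multi_bit_quantize(w, bits=2)
w_hat = (alphas[:, None] * bases).sum(axis=0)
print(np.abs(w - w_hat).mean())                  # mean reconstruction error
```

With such a representation, each extra basis adds one bit per weight; assigning different numbers of bases to different weight groups is what makes an average bitwidth below 1 bit possible.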
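The edge-server scenario describes a server that sends only a subset of critical weights to each device. The following is a minimal, hypothetical sketch of such weight-wise partial updating; selecting weights by the magnitude of their change is a placeholder assumption here, not DPU's actual selection criterion.

```python
import numpy as np

def select_partial_update(w_deployed, w_retrained, k):
    """Server side: build a sparse update message containing only the k
    most-changed weights. Delta magnitude is a stand-in importance measure;
    the thesis's DPU selects critical weights by a contribution-based rule."""
    delta = w_retrained - w_deployed
    idx = np.argsort(np.abs(delta))[-k:]     # indices of the k largest changes
    return idx, delta[idx]

def apply_partial_update(w_deployed, idx, values):
    """Edge-device side: patch the deployed weights instead of downloading
    a full model."""
    w = w_deployed.copy()
    w[idx] += values
    return w

# Example: send only 10% of the weights after server-side retraining.
w_old = np.random.randn(1000).astype(np.float32)
w_new = w_old + 0.01 * np.random.randn(1000).astype(np.float32)
idx, vals = select_partial_update(w_old, w_new, k=100)
w_patched = apply_partial_update(w_old, idx, vals)
```

The communication saving comes from transmitting k index/value pairs instead of the full weight vector; the reused pretrained weights stay untouched on the device.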
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Deep learning
en_US
dc.subject
On-device AI
en_US
dc.subject
Efficient deep learning
en_US
dc.subject
Edge AI
en_US
dc.subject
Edge computing
en_US
dc.title
Enabling Deep Learning on Edge Devices
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2022-10-05
ethz.size
181 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.grant
NCCR Automation (phase I)
en_US
ethz.identifier.diss
28528
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
en_US
ethz.grant.agreementno
180545
ethz.grant.fundername
SNF
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.program
NCCR full proposal
ethz.date.deposited
2022-10-05T09:14:36Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2022-10-05T11:57:39Z
ethz.rosetta.lastUpdated
2024-02-02T18:24:05Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Enabling%20Deep%20Learning%20on%20Edge%20Devices&rft.date=2022&rft.au=Qu,%20Zhongnan&rft.genre=unknown&rft.btitle=Enabling%20Deep%20Learning%20on%20Edge%20Devices