Enabling Deep Learning on Edge Devices
dc.contributor.author
Qu, Zhongnan
dc.contributor.supervisor
Thiele, Lothar
dc.contributor.supervisor
Saukh, Olga
dc.date.accessioned
2022-10-06T11:10:36Z
dc.date.available
2022-10-05T09:14:35Z
dc.date.issued
2022
dc.identifier.uri
http://hdl.handle.net/20.500.11850/574442
dc.identifier.doi
10.3929/ethz-b-000574442
dc.description.abstract
Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. These high-performing DNNs rely on intensive resource consumption. For example, training a DNN requires large dynamic memory, a large-scale dataset, and many computations (a long training time); even inference with a DNN demands a large amount of static storage, computations (a long inference time), and energy. Therefore, state-of-the-art DNNs are often deployed on a cloud server with a large number of supercomputers, a high-bandwidth communication bus, a shared storage infrastructure, and a high power supply.
Recently, new emerging intelligent applications, e.g., AR/VR, mobile assistants, and the Internet of Things, require us to deploy DNNs on resource-constrained edge devices. Compared to a cloud server, edge devices have rather limited resources. To deploy DNNs on edge devices, we need to reduce the size of DNNs, i.e., we target a better trade-off between resource consumption and model accuracy.
In this thesis, we study four edge intelligence scenarios and develop different methodologies to enable deep learning in each scenario. Since current DNNs are often over-parameterized, our goal is to find and reduce the redundancy of DNNs in each scenario. We summarize the four studied scenarios as follows:
- Inference on Edge Devices: First, we enable efficient inference of DNNs under the fixed resource constraints of edge devices. Compared to cloud inference, inference on edge devices avoids transmitting data to the cloud server, which allows a more stable, faster, and more energy-efficient inference. Since the main resource constraints during inference stem from storing a large number of weights and from computation, we propose Adaptive Loss-aware Quantization (ALQ) for multi-bit networks. ALQ reduces the redundancy in the quantization bitwidth. The direct optimization objective (i.e., the loss) and the learned adaptive bitwidth assignment allow ALQ to obtain extremely low-bit networks with an average bitwidth below one bit while yielding higher accuracy than state-of-the-art binary networks. (A sketch of the multi-bit decomposition behind ALQ follows this list.)
- Adaptation on Edge Devices: Second, we enable efficient adaptation of DNNs when the resource constraints on the target edge devices change dynamically at runtime, e.g., the allowed execution time and the allocatable RAM. To maximize model accuracy during on-device inference, we develop a new synthesis approach, Dynamic REal-time Sparse Subnets (DRESS), which samples and executes sub-networks with different resource demands from a single backbone network. DRESS reduces the redundancy among multiple sub-networks through weight sharing and architecture sharing, resulting in storage efficiency and re-configuration efficiency, respectively. The generated sub-networks have different sparsity levels and can thus be fetched for inference under varying resource constraints by utilizing sparse tensor computations. (A sketch of the nested-mask weight sharing follows this list.)
- Learning on Edge Devices: Third, we enable efficient learning of DNNs when facing unseen environments or users on edge devices. On-device learning requires both data efficiency and memory efficiency. We thus propose a new meta-learning method, p-Meta, to enable memory-efficient learning from only a few samples of unseen tasks. p-Meta reduces the updating redundancy by identifying and updating only structure-wise adaptation-critical weights, which saves the memory needed for the weight updates. (A sketch of structure-wise freezing follows this list.)
- Edge-Server System: Finally, we enable efficient inference and efficient updating in edge-server systems, where several resource-constrained edge devices are connected to a resource-sufficient server through a constrained communication bus. Because only limited relevant training data is available beforehand, pretrained DNNs may be significantly improved after the initial deployment. In such an edge-server system, on-device inference is preferred over cloud inference, since it achieves fast and stable inference with less energy consumption. Yet retraining on the cloud server is preferred over on-device retraining (or federated learning) due to the limited memory and computing power of edge devices. We propose a novel pipeline, Deep Partial Updating (DPU), to iteratively update the deployed inference model. In particular, when newly collected data samples from edge devices or other sources are available at the server, the server smartly selects only a subset of critical weights to update and sends them to each edge device. This weight-wise partial updating reduces redundant updating by reusing the pretrained weights, and achieves an accuracy similar to full updating at a significantly lower communication cost. (A sketch of weight-wise partial updating follows this list.)
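To make the multi-bit network idea behind ALQ concrete, the following is a minimal sketch of a greedy residual decomposition that approximates a weight group as a sum of scaled binary bases. It is only a sketch under simplifying assumptions: the fixed num_bits argument and the least-squares fit stand in for ALQ's loss-aware, learned per-group bitwidth assignment.

    import torch

    def multibit_quantize(w: torch.Tensor, num_bits: int):
        """Greedily approximate w as sum_i alpha_i * b_i with b_i in {-1, +1}.

        A stand-in for ALQ's quantizer: ALQ additionally *learns* how many
        bases (the bitwidth) each weight group receives, driven by the loss.
        """
        residual = w.clone()
        alphas, bases = [], []
        for _ in range(num_bits):
            b = residual.sign()
            b[b == 0] = 1.0                 # binary basis in {-1, +1}
            alpha = (residual * b).mean()   # least-squares optimal scale
            alphas.append(alpha)
            bases.append(b)
            residual = residual - alpha * b
        w_hat = sum(a * b for a, b in zip(alphas, bases))
        return w_hat, alphas, bases

    # Groups deemed unimportant can receive fewer (even zero) bases, which is
    # how the *average* bitwidth over all groups can drop below one bit.
    group = torch.randn(64)
    w_hat, _, _ = multibit_quantize(group, num_bits=2)
    print(f"relative error: {(group - w_hat).norm() / group.norm():.3f}")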
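The weight sharing in DRESS can be illustrated with nested magnitude-based masks over one backbone tensor: top-k supports computed on the same scores are nested across k, so every sparser sub-network reuses a subset of the denser one's weights. This sketch covers only mask construction; DRESS trains the sub-networks jointly and additionally shares architecture, which is not modeled here.

    import torch

    def nested_sparse_masks(weight: torch.Tensor, sparsities):
        """One binary mask per sparsity level over a shared backbone tensor.

        Top-k selection on the same magnitude scores yields nested supports,
        so sub-networks share weights instead of being stored separately.
        """
        scores = weight.abs().flatten()
        masks = {}
        for s in sparsities:
            k = max(1, int(scores.numel() * (1.0 - s)))
            mask = torch.zeros_like(scores, dtype=torch.bool)
            mask[scores.topk(k).indices] = True
            masks[s] = mask.view_as(weight)
        return masks

    backbone = torch.randn(256, 256)
    masks = nested_sparse_masks(backbone, sparsities=(0.5, 0.8, 0.95))
    # Under tight runtime constraints, a device fetches the sparsest subnet:
    subnet = backbone * masks[0.95]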
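The structure-wise updating in p-Meta can be mimicked by enabling gradients only for a chosen set of structures (here, whole layers), so gradient buffers and optimizer state exist only for those weights. The critical set below is a hypothetical stand-in for p-Meta's learned per-structure criticality.

    import torch
    from torch import nn

    def freeze_noncritical(model: nn.Module, critical):
        """Enable gradients only for adaptation-critical structures.

        Frozen parameters need neither gradient buffers nor optimizer state,
        which is where the on-device memory saving comes from. `critical`
        is a hypothetical stand-in for p-Meta's learned selection.
        """
        for name, param in model.named_parameters():
            param.requires_grad = any(name.startswith(c) for c in critical)

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    freeze_noncritical(model, critical={"2."})  # adapt only the last layer

    # Few-shot adaptation then updates just the selected weights.
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                          lr=1e-2)
    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()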
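Finally, a minimal sketch of weight-wise partial updating: the server retrains, selects the k weights whose change matters most, and ships only those indices and values; the device overwrites just those entries and reuses the remaining pretrained weights. Ranking by the magnitude of the change is a simple proxy here, not DPU's actual selection criterion.

    import torch

    def select_partial_update(deployed, retrained, k: int):
        """Server side: pick k weights to update and pack a small message.

        |delta| ranking is a simple proxy; DPU selects the weights that
        contribute most to reducing the loss.
        """
        delta = (retrained - deployed).flatten()
        idx = delta.abs().topk(k).indices
        return idx, retrained.flatten()[idx]

    def apply_partial_update(deployed, idx, values):
        """Device side: overwrite only the transmitted weights."""
        flat = deployed.flatten().clone()
        flat[idx] = values
        return flat.view_as(deployed)

    old = torch.randn(1000)
    new = old + 0.01 * torch.randn(1000)  # result of server-side retraining
    idx, vals = select_partial_update(old, new, k=50)  # send 5% of weights
    updated = apply_partial_update(old, idx, vals)
    # Communication cost scales with k (indices + values), not model size.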
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Deep learning
en_US
dc.subject
On-device AI
en_US
dc.subject
Efficient deep learning
en_US
dc.subject
Edge AI
en_US
dc.subject
Edge computing
en_US
dc.title
Enabling Deep Learning on Edge Devices
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2022-10-05
ethz.size
181 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.grant
NCCR Automation (phase I)
en_US
ethz.identifier.diss
28528
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
en_US
ethz.grant.agreementno
180545
ethz.grant.fundername
SNF
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.program
NCCR full proposal
ethz.date.deposited
2022-10-05T09:14:36Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2022-10-05T11:57:39Z
ethz.rosetta.lastUpdated
2024-02-02T18:24:05Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true