Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge

Chuang Hu, Wei Bao, Dan Wang, Fengming Liu

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

270 Citations (Scopus)

Abstract

Recent advances in deep neural networks (DNNs) have substantially improved the accuracy and speed of a variety of intelligent applications. Nevertheless, one obstacle is that DNN inference imposes heavy computation burden to end devices, but offloading inference tasks to the cloud causes transmission of a large volume of data. Motivated by the fact that the data size of some intermediate DNN layers is significantly smaller than that of raw input data, we design the DNN surgery, which allows partitioned DNN processed at both the edge and cloud while limiting the data transmission. The challenge is twofold: (1) Network dynamics substantially influence the performance of DNN partition, and (2) State-of-the-art DNNs are characterized by a directed acyclic graph (DAG) rather than a chain so that partition is greatly complicated. In order to solve the issues, we design a Dynamic Adaptive DNN Surgery (DADS) scheme, which optimally partitions the DNN under different network condition. Under the lightly loaded condition, DNN Surgery Light (DSL) is developed, which minimizes the overall delay to process one frame. The minimization problem is equivalent to a min-cut problem so that a globally optimal solution is derived. In the heavily loaded condition, DNN Surgery Heavy (DSH) is developed, with the objective to maximize throughput. However, the problem is NP-hard so that DSH resorts an approximation method to achieve an approximation ratio of 3. Real-world prototype based on self-driving car video dataset is implemented, showing that compared with executing entire the DNN on the edge and cloud, DADS can improve latency up to 6.45 and 8.08 times respectively, and improve throughput up to 8.31 and 14.01 times respectively.

Original languageEnglish
Title of host publicationINFOCOM 2019 - IEEE Conference on Computer Communications
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1423-1431
Number of pages9
ISBN (Electronic)9781728105154
DOIs
Publication statusPublished - Apr 2019
Event2019 IEEE Conference on Computer Communications, INFOCOM 2019 - Paris, France
Duration: 29 Apr 20192 May 2019

Publication series

NameProceedings - IEEE INFOCOM
Volume2019-April
ISSN (Print)0743-166X

Conference

Conference2019 IEEE Conference on Computer Communications, INFOCOM 2019
Country/TerritoryFrance
CityParis
Period29/04/192/05/19

ASJC Scopus subject areas

  • General Computer Science
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge'. Together they form a unique fingerprint.

Cite this