Communication-Computation Trade-off in Resource-Constrained Edge Inference

Jiawei Shao, Jun Zhang

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario to provide DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computational cost of the on-device model and the communication overhead of forwarding the intermediate feature to the edge server. A general three-step framework is proposed for effective inference: model split point selection to determine the on-device model, communication-aware model compression to reduce the on-device computation and the resulting communication overhead simultaneously, and task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that the proposed framework achieves a better trade-off and significantly reduces inference latency compared with baseline methods.
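To make the device-edge co-inference idea concrete, below is a minimal PyTorch sketch: a backbone network is split at a hypothetical layer index, the on-device part produces an intermediate feature, and naive 8-bit uniform quantization stands in for the communication-aware compression and task-oriented encoding described in the abstract. The toy model, the `SPLIT` index, and the `quantize`/`dequantize` helpers are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy backbone; in practice this would be a pretrained DNN.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)

SPLIT = 3  # hypothetical split point: layers [0, SPLIT) run on-device

device_model = backbone[:SPLIT]   # executed on the edge device
server_model = backbone[SPLIT:]   # executed on the edge server

def quantize(x: torch.Tensor):
    """Naive 8-bit uniform quantization of the intermediate feature."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255.0
    q = torch.round((x - lo) / scale).to(torch.uint8)
    return q, lo, scale

def dequantize(q: torch.Tensor, lo: torch.Tensor, scale: torch.Tensor):
    """Reconstruct an approximate feature on the server side."""
    return q.to(torch.float32) * scale + lo

x = torch.randn(1, 3, 32, 32)             # input image on the device
feature = device_model(x)                  # on-device computation
q, lo, scale = quantize(feature)           # compress before uplink transmission
payload_bytes = q.numel()                  # 1 byte per element after quantization
logits = server_model(dequantize(q, lo, scale))  # server-side inference
print(payload_bytes, logits.shape)
```

Moving `SPLIT` later in the network increases on-device computation but typically shrinks the feature to be transmitted, which is exactly the trade-off the framework navigates.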

Original language: English
Article number: 9311935
Pages (from-to): 20-26
Number of pages: 7
Journal: IEEE Communications Magazine
Volume: 58
Issue number: 12
DOIs
Publication status: Published - Dec 2020

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
