SIMD-Aware Loop Unrolling for Embedded Code Optimization

Yunyang Dai, Qing Li, Qi Zhang, C. C. Jay Kuo

Research output: Journal article publication, Conference article, Academic research, peer-review


Due to the rising complexity of modern embedded media applications (EMAs), instruction-level parallelism (ILP) alone is no longer sufficient to meet their demands. Compilers must be able to exploit superword-level parallelism (SLP), which can expose more of the concurrency present in applications, minimize the latency created by memory access, and hence produce more efficient code. Loops are good candidates for SLP extraction because their iterations share a parallel structure. This work analyzes the memory access patterns found in EMAs and presents a loop unrolling method that exploits these patterns to generate efficient Single Instruction Multiple Data (SIMD) instructions. Experiments on the TriMedia TM-1300 processor with the H.264 encoder show a performance improvement by a factor of 3 to 30, with an average of 12.
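As a rough illustration of the general idea only (not the authors' algorithm, which this record does not reproduce), the sketch below unrolls a byte-wise sum-of-absolute-differences loop, a kernel typical of video encoders, by a factor of four. The function names, the unroll factor, and the choice of kernel are all assumptions made for the example; the factor of four simply assumes four 8-bit values packed into one 32-bit word. After unrolling, the four statements in the loop body are isomorphic and touch adjacent bytes, which is the kind of pattern an SLP-aware compiler could pack into a single SIMD operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar baseline: one byte pair per iteration. */
static uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Unrolled by 4 (hypothetical factor: four 8-bit lanes per 32-bit word).
 * The four statements are isomorphic and read adjacent bytes, so a
 * compiler with SLP support could merge them into packed operations.
 * This is portable C; it uses no target-specific intrinsics. */
static uint32_t sad_unrolled4(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    /* Independent accumulators keep the four lanes free of
     * cross-statement dependences inside the loop body. */
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += (uint32_t)abs((int)a[i]     - (int)b[i]);
        s1 += (uint32_t)abs((int)a[i + 1] - (int)b[i + 1]);
        s2 += (uint32_t)abs((int)a[i + 2] - (int)b[i + 2]);
        s3 += (uint32_t)abs((int)a[i + 3] - (int)b[i + 3]);
    }

    uint32_t sum = s0 + s1 + s2 + s3;

    /* Scalar epilogue for the remaining n % 4 elements. */
    for (; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);

    return sum;
}
```

Both functions return the same value for any input length; the unrolled form only changes how the work is grouped. Whether the grouped statements actually become SIMD instructions depends on the compiler and target, and on whether the adjacent loads are suitably aligned.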

Original language: English
Pages (from-to): 157-168
Number of pages: 12
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Publication status: Published - 1 Dec 2003
Externally published: Yes
Event: Multimedia Systems and Applications VI - Orlando, FL, United States
Duration: 8 Sept 2003 to 9 Sept 2003


  • Embedded multimedia application
  • Loop unrolling
  • SIMD
  • Superword level parallelism

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

