Loop scheduling with complete memory latency hiding on multi-core architecture

Chun Xue, Zili Shao, Meilin Liu, Meikang Qiu, Edwin H.M. Sha

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

8 Citations (Scopus)

Abstract

The widening gap between processor and memory performance is the main bottleneck preventing modern computer systems from achieving high processor utilization. In this paper, we propose a new loop scheduling technique with memory management, Iterational Retiming with Partitioning (IRP), that can completely hide memory latencies for applications with multi-dimensional loops on architectures such as the CELL processor [1]. In IRP, the iteration space is first partitioned carefully. Then a two-part schedule, consisting of a processor part and a memory part, is produced such that the execution time of the memory part never exceeds the execution time of the processor part. The two parts execute simultaneously, so memory latency is completely hidden. Experiments on DSP benchmarks show that IRP consistently produces optimal solutions and significantly improves on previous techniques.
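The condition stated in the abstract, that the memory part of the schedule never takes longer than the processor part, can be illustrated with a toy cost model. The sketch below is not the IRP algorithm from the paper. The rectangular `Partition`, the perimeter-based transfer cost, and all parameter names (`cycles_per_iter`, `cores`, `cycles_per_transfer`) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partition:
    """One rectangular tile of a 2-D iteration space (hypothetical model)."""
    width: int
    height: int


def processor_time(p: Partition, cycles_per_iter: int, cores: int) -> int:
    """Cycles to run every iteration of the tile, split evenly across cores."""
    total = p.width * p.height * cycles_per_iter
    return (total + cores - 1) // cores  # ceiling division


def memory_time(p: Partition, cycles_per_transfer: int) -> int:
    """Cycles to move the data needed by the next tile.

    Assumption: only data crossing the tile boundary is transferred, so the
    cost grows with width + height rather than with the tile area.
    """
    return (p.width + p.height) * cycles_per_transfer


def hides_latency(p: Partition, cycles_per_iter: int, cores: int,
                  cycles_per_transfer: int) -> bool:
    """True when the memory part fits entirely under the processor part."""
    return (memory_time(p, cycles_per_transfer)
            <= processor_time(p, cycles_per_iter, cores))


def smallest_hiding_square(cycles_per_iter: int, cores: int,
                           cycles_per_transfer: int,
                           limit: int = 1024) -> Optional[Partition]:
    """Linear search for the smallest square tile that hides all transfers."""
    for side in range(1, limit + 1):
        p = Partition(side, side)
        if hides_latency(p, cycles_per_iter, cores, cycles_per_transfer):
            return p
    return None  # no tile up to `limit` satisfies the condition
```

Under this model, computation grows with the tile area while transfers grow with its perimeter, so a large enough tile always satisfies the condition. This is one intuition for why the partition size matters.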
Original language: English
Title of host publication: Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006
Pages: 375-382
Number of pages: 8
Volume: 1
Publication status: Published - 1 Dec 2006
Event: 12th International Conference on Parallel and Distributed Systems, ICPADS 2006 - Minneapolis, MN, United States
Duration: 12 Jul 2006 to 15 Jul 2006

Conference

Conference: 12th International Conference on Parallel and Distributed Systems, ICPADS 2006
Country/Territory: United States
City: Minneapolis, MN
Period: 12/07/06 to 15/07/06

ASJC Scopus subject areas

  • Hardware and Architecture
