Abstract
This paper solves the open problem of extracting the maximal number of iterations from a loop that can be executed in parallel on chip multiprocessors. Our algorithm solves it optimally by migrating the weights of parallelism-inhibiting dependences on dependence cycles in two phases. First, we model dependence migration with retiming and formulate this classic loop parallelization into a graph optimization problem, i.e., one of finding retiming values for its nodes so that the minimum non-zero edge weight in the graph is maximized. We present our algorithm in three stages with each being built incrementally on the preceding one. Second, the optimal code for a loop is generated from the retimed graph of the loop found in the first phase. We demonstrate the effectiveness of our optimal algorithm by comparing with a number of representative non-optimal algorithms using a set of benchmarks frequently used in prior work.
Original language | English |
---|---|
Title of host publication | Embedded Systems Week 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09 |
Pages | 67-76 |
Number of pages | 10 |
DOIs | |
Publication status | Published - 21 Dec 2009 |
Event | Embedded Systems Week 2009, ESWEEK 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09 - Grenoble, France Duration: 11 Oct 2009 → 16 Oct 2009 |
Conference
Conference | Embedded Systems Week 2009, ESWEEK 2009 - 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES'09 |
---|---|
Country/Territory | France |
City | Grenoble |
Period | 11/10/09 → 16/10/09 |
Keywords
- Data dependence graph
- Iteration-level parallelism
- Loop parallelization
- Loop transformation
- Retiming
ASJC Scopus subject areas
- Hardware and Architecture
- Software
- Electrical and Electronic Engineering