TY - GEN
T1 - Loop striping: Maximize parallelism for nested loops
AU - Xue, Chun
AU - Shao, Zili
AU - Liu, Meilin
AU - Qiu, Meikang
AU - Sha, Edwin H.M.
PY - 2006/1/1
Y1 - 2006/1/1
N2 - The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for the nested loops in these applications. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only with complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where a stripe is a group of iterations in which all iterations are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
AB - The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for the nested loops in these applications. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only with complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where a stripe is a group of iterations in which all iterations are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=33746688045&partnerID=8YFLogxK
M3 - Conference article published in proceeding or book
SN - 3540366792
SN - 9783540366799
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 405
EP - 414
BT - Embedded and Ubiquitous Computing - International Conference, EUC 2006, Proceedings
PB - Springer Verlag
T2 - International Conference on Embedded and Ubiquitous Computing, EUC 2006
Y2 - 1 August 2006 through 4 August 2006
ER -