TY - GEN
T1 - Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs
AU - Jiang, Jiantong
AU - Wang, Zeke
AU - Liu, Xue
AU - Gómez-Luna, Juan
AU - Guan, Nan
AU - Deng, Qingxu
AU - Zhang, Wei
AU - Mutlu, Onur
N1 - Funding Information:
Acknowledgement. We thank the Intel FPGA Academic Program, which donated Terasic’s DE5a-Net FPGA board and licenses for our research. This work was supported by the National Key R&D Program of China (2017YFC0805005 and 2018YFB1702000), the Joint Funds of the National Natural Science Foundation of China (U1908212), and the National Natural Science Foundation of China (61871107 and 61602104). Onur Mutlu and Juan Gómez-Luna acknowledge support from the SAFARI Group’s industrial partners, especially Facebook, Google, Huawei, Intel, Microsoft, SRC, and VMware.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/2/23
Y1 - 2020/2/23
N2 - FPGA vendors provide OpenCL software development kits for easier programmability, with the goal of replacing the time-consuming and error-prone register-transfer level (RTL) programming. Many studies explore optimization methods (e.g., loop unrolling, local memory) to accelerate OpenCL programs running on FPGAs. These programs typically follow the default OpenCL execution model, where a kernel deploys multiple work-items arranged into workgroups. However, the default execution model is not always a good fit for an application mapped to the FPGA architecture, which is very different from the multithreaded architecture of GPUs, for which OpenCL was originally designed. In this work, we identify three other execution models that can better utilize the FPGA resources for the OpenCL applications that do not fit well into the default execution model. These three execution models are based on two OpenCL features devised for FPGA programming (namely, single work-item kernel and OpenCL channel). We observe that the selection of the right execution model determines the performance upper bound of a particular application, which can vary by two orders of magnitude between the most suitable execution model and the most unsuitable one. However, there is no way to select the most suitable execution model other than empirically exploring the optimization space for the four of them, which can be prohibitive. To help FPGA programmers identify the right execution model, we propose Boyi, a systematic framework that makes automatic decisions by analyzing OpenCL programming patterns in an application. After finding the right execution model with the help of Boyi, programmers can apply other conventional optimizations to reach the performance upper bound. Our experimental evaluation shows that Boyi can 1) accurately determine the right execution model, and 2) greatly reduce the exploration space of conventional optimization methods.
AB - FPGA vendors provide OpenCL software development kits for easier programmability, with the goal of replacing the time-consuming and error-prone register-transfer level (RTL) programming. Many studies explore optimization methods (e.g., loop unrolling, local memory) to accelerate OpenCL programs running on FPGAs. These programs typically follow the default OpenCL execution model, where a kernel deploys multiple work-items arranged into workgroups. However, the default execution model is not always a good fit for an application mapped to the FPGA architecture, which is very different from the multithreaded architecture of GPUs, for which OpenCL was originally designed. In this work, we identify three other execution models that can better utilize the FPGA resources for the OpenCL applications that do not fit well into the default execution model. These three execution models are based on two OpenCL features devised for FPGA programming (namely, single work-item kernel and OpenCL channel). We observe that the selection of the right execution model determines the performance upper bound of a particular application, which can vary by two orders of magnitude between the most suitable execution model and the most unsuitable one. However, there is no way to select the most suitable execution model other than empirically exploring the optimization space for the four of them, which can be prohibitive. To help FPGA programmers identify the right execution model, we propose Boyi, a systematic framework that makes automatic decisions by analyzing OpenCL programming patterns in an application. After finding the right execution model with the help of Boyi, programmers can apply other conventional optimizations to reach the performance upper bound. Our experimental evaluation shows that Boyi can 1) accurately determine the right execution model, and 2) greatly reduce the exploration space of conventional optimization methods.
KW - Execution Model
KW - FPGA
KW - OpenCL
KW - Programmability
UR - http://www.scopus.com/inward/record.url?scp=85082031951&partnerID=8YFLogxK
U2 - 10.1145/3373087.3375313
DO - 10.1145/3373087.3375313
M3 - Conference article published in proceeding or book
AN - SCOPUS:85082031951
T3 - FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
SP - 299
EP - 309
BT - FPGA 2020 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
PB - Association for Computing Machinery, Inc
T2 - 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2020
Y2 - 23 February 2020 through 25 February 2020
ER -