TY - GEN
T1 - A Compiler-Like Framework for Optimizing Cryptographic Big Integer Multiplication on GPUs
AU - Ji, Zhuoran
AU - Zhao, Jianyu
AU - Zhang, Zhaorui
AU - Xu, Jiming
AU - Yan, Shoumeng
AU - Ju, Lei
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - With the growth of digital data and rising security concerns, techniques for privacy-preserving computation have become increasingly essential. Big integer multiplication, pivotal for these applications, is compute-intensive but poses challenges for GPU acceleration due to its complexity and the need for application-specific tailored implementations. This paper presents IMCompiler, a compiler-like framework that automatically generates optimized GPU kernels for integer multiplications used in cryptosystems. It features a frontend-IR-backend structure, where the Intermediate Representation (IR) employs a segmented integer multiplication algorithm to decouple architecture-specific optimizations from high-level parameters. The frontend can then easily translate integer multiplication with various high-level parameters into the IR, while the backend focuses on fine-tuning a single GPU kernel for each device, enabling automatic code generation. Moreover, we introduce a computation diagram to facilitate the analysis of parallelization strategies, inspiring many optimizations, including two-dimensional parallelization, tailored caching strategy, index transposing, and lazy carrying. Experiments show that IMCompiler achieves a 4.47× speedup compared to the widely used baseline and 1.42× over Nvidia's official library. The speedup will be even higher for larger integers and higher-capacity GPUs.
AB - With the growth of digital data and rising security concerns, techniques for privacy-preserving computation have become increasingly essential. Big integer multiplication, pivotal for these applications, is compute-intensive but poses challenges for GPU acceleration due to its complexity and the need for application-specific tailored implementations. This paper presents IMCompiler, a compiler-like framework that automatically generates optimized GPU kernels for integer multiplications used in cryptosystems. It features a frontend-IR-backend structure, where the Intermediate Representation (IR) employs a segmented integer multiplication algorithm to decouple architecture-specific optimizations from high-level parameters. The frontend can then easily translate integer multiplication with various high-level parameters into the IR, while the backend focuses on fine-tuning a single GPU kernel for each device, enabling automatic code generation. Moreover, we introduce a computation diagram to facilitate the analysis of parallelization strategies, inspiring many optimizations, including two-dimensional parallelization, tailored caching strategy, index transposing, and lazy carrying. Experiments show that IMCompiler achieves a 4.47× speedup compared to the widely used baseline and 1.42× over Nvidia's official library. The speedup will be even higher for larger integers and higher-capacity GPUs.
KW - big integer
KW - compiler
KW - cryptography
KW - GPU
UR - https://www.scopus.com/pages/publications/85213371575
U2 - 10.1109/MICRO61859.2024.00036
DO - 10.1109/MICRO61859.2024.00036
M3 - Conference article published in proceeding or book
AN - SCOPUS:85213371575
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 380
EP - 392
BT - Proceedings - 2024 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
PB - IEEE Computer Society
T2 - 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
Y2 - 2 November 2024 through 6 November 2024
ER -