TY - GEN
T1 - Accelerating Similarity-based Mining Tasks on High-dimensional Data by Processing-in-memory
AU - Yiu, Man Lung
AU - Wang, Fang
AU - Shao, Zili
N1 - Funding Information:
This work was supported by grant GRF 152018/20E from Hong Kong RGC.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - Similarity computation is a core subroutine of many mining tasks on multi-dimensional data, which are often massive, high-dimensional datasets. In these mining tasks, the performance bottleneck is caused by the 'memory wall' problem, as a substantial amount of data needs to be transferred from memory to processors. Recent advances in non-volatile memory (NVM) enable processing-in-memory (PIM), which reduces data transfer and thus alleviates the performance bottleneck. Nevertheless, NVM PIM supports only specific operations (e.g., dot product on non-negative integer vectors), not arbitrary operations. In this paper, we tackle the above challenge and carefully exploit NVM PIM to accelerate similarity-based mining tasks on multi-dimensional data without compromising the accuracy of results. Experimental results on real datasets show that our proposed method achieves up to 10.5x and 8.5x speedup for state-of-the-art kNN classification and k-means clustering algorithms, respectively.
AB - Similarity computation is a core subroutine of many mining tasks on multi-dimensional data, which are often massive, high-dimensional datasets. In these mining tasks, the performance bottleneck is caused by the 'memory wall' problem, as a substantial amount of data needs to be transferred from memory to processors. Recent advances in non-volatile memory (NVM) enable processing-in-memory (PIM), which reduces data transfer and thus alleviates the performance bottleneck. Nevertheless, NVM PIM supports only specific operations (e.g., dot product on non-negative integer vectors), not arbitrary operations. In this paper, we tackle the above challenge and carefully exploit NVM PIM to accelerate similarity-based mining tasks on multi-dimensional data without compromising the accuracy of results. Experimental results on real datasets show that our proposed method achieves up to 10.5x and 8.5x speedup for state-of-the-art kNN classification and k-means clustering algorithms, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85112863748&partnerID=8YFLogxK
U2 - 10.1109/ICDE51399.2021.00167
DO - 10.1109/ICDE51399.2021.00167
M3 - Conference article published in proceeding or book
T3 - Proceedings - International Conference on Data Engineering
SP - 1859
EP - 1864
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - IEEE Computer Society
ER -