TY - JOUR
T1 - Ocelot
T2 - An Interactive, Efficient Distributed Compression-As-a-Service Platform With Optimized Data Compression Techniques
AU - Liu, Yuanjian
AU - Di, Sheng
AU - Huang, Jiajun
AU - Zhang, Zhaorui
AU - Chard, Kyle
AU - Foster, Ian
N1 - Publisher Copyright: © 1990-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Large volumes of data generated by scientific simulations, genome sequencing, and other applications need to be moved among clusters for data collection/analysis. Data compression techniques have effectively reduced data storage and transfer costs. However, users' requirements on interactively controlling both data quality and compression ratios are non-trivial to fulfill. We propose a novel Compression-as-a-Service (CaaS) platform called Ocelot with four important contributions: (1) It offers real-time visualization, interactive compression, and transfer of scientific datasets. (2) It incorporates new strategies for compressing diverse types of datasets more effectively than traditional methods. (3) It provides an effective method for estimating the compression ratio and execution time of compression tasks. (4) Experiments on multiple real-world datasets on geographically distributed computers show that Ocelot can significantly improve data transfer efficiency with a performance gain of more than 10x in computing clusters with relatively slow networks.
KW - Compression as a service (CaaS)
KW - data transfer
KW - floating-point tensor compression
KW - genome sequence compression
UR - http://www.scopus.com/inward/record.url?scp=105005791073&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2025.3568221
DO - 10.1109/TPDS.2025.3568221
M3 - Journal article
AN - SCOPUS:105005791073
SN - 1045-9219
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
ER -