Ocelot: An Interactive, Efficient Distributed Compression-As-a-Service Platform With Optimized Data Compression Techniques

Yuanjian Liu, Sheng Di, Jiajun Huang, Zhaorui Zhang, Kyle Chard, Ian Foster

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

Large volumes of data generated by scientific simulations, genome sequencing, and other applications need to be moved among clusters for data collection/analysis. Data compression techniques have effectively reduced data storage and transfer costs. However, users' requirements for interactively controlling both data quality and compression ratios are non-trivial to fulfill. We propose a novel Compression-as-a-Service (CaaS) platform called Ocelot with four important contributions: (1) It offers real-time visualization, interactive compression, and transfer of scientific datasets. (2) It incorporates new strategies for compressing diverse types of datasets more effectively than traditional methods. (3) It provides an effective method for estimating the compression ratio and execution time of compression tasks. (4) Experiments on multiple real-world datasets on geographically distributed computers show that Ocelot can significantly improve data transfer efficiency, with a performance gain of more than 10x in computing clusters with relatively slow networks.
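
The link between compression ratio, compression time, and network speed mentioned in the abstract can be illustrated with a back-of-the-envelope model. The sketch below is not Ocelot's estimator; it is a minimal illustration that assumes a sequential pipeline (compress, then send, then decompress) with constant throughputs. All function names, parameters, and numbers are hypothetical.

```python
def raw_transfer_time(size_bytes, bandwidth):
    """Seconds to send the uncompressed data; bandwidth is in bytes/s."""
    return size_bytes / bandwidth


def compressed_transfer_time(size_bytes, bandwidth, ratio,
                             compress_tp, decompress_tp):
    """Seconds to compress, send the smaller payload, and decompress.

    Assumption: the three stages run one after another (no overlap),
    and compress_tp / decompress_tp are throughputs in bytes/s measured
    against the uncompressed size.
    """
    compress_s = size_bytes / compress_tp
    send_s = (size_bytes / ratio) / bandwidth
    decompress_s = size_bytes / decompress_tp
    return compress_s + send_s + decompress_s


def speedup(size_bytes, bandwidth, ratio, compress_tp, decompress_tp):
    """End-to-end gain of compress-then-transfer over a raw transfer."""
    return raw_transfer_time(size_bytes, bandwidth) / compressed_transfer_time(
        size_bytes, bandwidth, ratio, compress_tp, decompress_tp
    )


if __name__ == "__main__":
    # Hypothetical workload: 1 GB of data, compression ratio 20,
    # 500 MB/s compression and 1 GB/s decompression throughput.
    common = dict(size_bytes=1e9, ratio=20, compress_tp=500e6, decompress_tp=1e9)
    print("slow link (10 MB/s):", speedup(bandwidth=10e6, **common))
    print("fast link (1 GB/s): ", speedup(bandwidth=1e9, **common))
```

Under these made-up numbers the slow link gains more than 10x, while on the fast link compression costs more time than it saves. That is one way to read why the reported gain is tied to relatively slow networks, and why estimating compression ratio and execution time ahead of time is useful.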

Original language: English
Journal: IEEE Transactions on Parallel and Distributed Systems
Publication status: Accepted/In press - 2025

Keywords

  • Compression as a service (CaaS)
  • data transfer
  • floating-point tensor compression
  • genome sequence compression

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
