FedCSpc: A Cross-Silo Federated Learning System with Error-Bounded Lossy Parameter Compression

Zhaorui Zhang, Sheng Di, Kai Zhao, Sian Jin, Dingwen Tao, Zhuoran Ji, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao, Franck Cappello

Research output: Journal article › Academic research › peer-review

Abstract

Cross-silo federated learning is widely used to scale deep neural network (DNN) training over data silos at different locations worldwide while guaranteeing data privacy. Communication has been identified as the main bottleneck when training large-scale models, because large volumes of model parameters and gradients must be transmitted across public networks with limited bandwidth. Most previous work focuses on gradient compression, while little work attempts to compress parameters, which cannot be ignored and severely affect communication performance during training. To bridge this gap, we propose FedCSpc: an efficient cross-silo federated learning system with an XAI-driven adaptive parameter compression strategy for large-scale model training. Our work differs substantially from existing gradient compression techniques because gradients and parameters have distinct data features. The key contributions of this paper are fourfold. (1) FedCSpc compresses parameters during training using the state-of-the-art error-bounded lossy compressor SZ3. (2) We develop an adaptive compression error-bound adjustment algorithm that effectively guarantees model accuracy. (3) We exploit the idle CPU resources of clients to compress parameters efficiently. (4) We perform a comprehensive evaluation with a wide range of models and benchmarks on a GPU cluster with 65 GPUs. Results show that FedCSpc achieves the same model accuracy as FedAvg while reducing the communicated data volume of parameters and gradients by up to 7.39× and 288×, respectively. With 32 clients training a model of 4 Gb in size, FedCSpc significantly outperforms FedAvg in wall-clock time in an emulated WAN environment (at a bandwidth of 1 Gbps or lower, without loss of generality).
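The core idea behind error-bounded lossy compression, as used here, is that every reconstructed value is guaranteed to differ from the original by no more than a user-chosen absolute error bound. SZ3 itself combines prediction, linear-scale quantization, and entropy coding; the minimal sketch below (not the authors' implementation, and all names are illustrative) shows only the error-bounded quantization step that provides the accuracy guarantee:

```python
import random

def compress_quantize(params, error_bound):
    # SZ-style linear-scale quantization: a bin width of 2 * error_bound
    # guarantees |x - reconstructed(x)| <= error_bound for every value.
    return [round(x / (2.0 * error_bound)) for x in params]

def decompress_quantize(codes, error_bound):
    # Map each integer code back to the center of its quantization bin.
    return [c * 2.0 * error_bound for c in codes]

# Toy "parameter tensor": 10,000 Gaussian values standing in for model weights.
random.seed(0)
params = [random.gauss(0.0, 1.0) for _ in range(10_000)]

eb = 1e-3  # absolute error bound
codes = compress_quantize(params, eb)
recon = decompress_quantize(codes, eb)

# The pointwise error never exceeds the bound.
max_err = max(abs(a - b) for a, b in zip(params, recon))
```

The small integer codes are far more compressible (e.g., by entropy coding) than raw 32-bit floats, which is where the bandwidth savings come from; tightening or loosening `error_bound`, as FedCSpc's adaptive adjustment does, trades compression ratio against reconstruction accuracy.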

Original language: English
Article number: 0b00006493dfd3c4
Journal: IEEE Transactions on Parallel and Distributed Systems
Publication status: Accepted/In press - 2025

Keywords

  • Cross-Silo Federated Learning System
  • Error-Bounded Lossy Compression
  • Parameter Compression
  • SZ3
  • XAI

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
