SEDML: Securely and efficiently harnessing distributed knowledge in machine learning

Yansong Gao, Qun Li, Yifeng Zheng, Guohong Wang, Jiannan Wei, Mang Su

Research output: Journal article › Academic research › peer-review

2 Citations (Scopus)

Abstract

Training high-performing machine learning models requires large amounts of data, which in practice is usually distributed among multiple data sources. Simply centralizing such multi-sourced data for training would raise critical security and privacy concerns, and may be prohibited under increasingly strict data regulations. To resolve the tension between privacy and data utilization in distributed learning, a machine learning framework called private aggregation of teacher ensembles (PATE) has recently been proposed. PATE harnesses the knowledge (label predictions for an unlabeled dataset) of distributed teacher models to train a student model, obviating access to the distributed datasets. Despite being enticing, PATE does not protect the individual label predictions of the teacher models, which still entails privacy risks. In this paper, we propose SEDML, a new protocol for securely and efficiently harnessing distributed knowledge in machine learning. SEDML builds on lightweight cryptography and provides strong protection for the individual label predictions, as well as differential privacy guarantees on the aggregation results. Extensive evaluations show that while providing privacy protection, SEDML preserves the accuracy of the plaintext baseline. Meanwhile, SEDML outperforms the state-of-the-art work of Xiang et al. (ICDCS'20) by 43× in computation and 1.23× in communication.
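To illustrate the kind of knowledge transfer the abstract describes, below is a minimal sketch of PATE-style noisy-max aggregation: each teacher votes a label for an unlabeled sample, and Laplace noise is added to the vote counts before taking the argmax, yielding a differentially private aggregate label. This is only the plaintext PATE mechanism, not SEDML's secure protocol (which additionally protects individual votes with lightweight cryptography); the function name and parameters are illustrative assumptions, not the paper's API.

```python
import numpy as np

def noisy_max_aggregate(teacher_votes, num_classes, epsilon, rng=None):
    """Differentially private label aggregation (PATE-style noisy max).

    teacher_votes: list of integer class labels, one per teacher.
    epsilon: privacy parameter; Laplace noise of scale 2/epsilon is
             added to each class's vote count (illustrative choice).
    Returns the class with the highest noisy vote count.
    """
    rng = rng or np.random.default_rng()
    # Tally votes per class, then perturb each count with Laplace noise.
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    noisy = counts + rng.laplace(loc=0.0, scale=2.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy))

# Example: 10 teachers vote on one unlabeled sample; class 1 holds a
# clear majority, so it almost surely survives the noise.
votes = [1, 1, 1, 1, 1, 1, 0, 2, 1, 1]
label = noisy_max_aggregate(votes, num_classes=3, epsilon=5.0)
```

The student model would then be trained on such aggregate labels, never seeing any individual teacher's prediction in the clear.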

Original language: English
Article number: 102857
Journal: Computers and Security
Volume: 121
Publication status: Published - Oct 2022

Keywords

  • Differential privacy
  • Distributed learning
  • Knowledge transfer
  • Privacy protection
  • Secure computation

ASJC Scopus subject areas

  • General Computer Science
  • Law
