DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass

Minxin Du, Xiang Yue, Sherman S.M. Chow, Tianhao Wang, Chenyu Huang, Huan Sun

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

32 Citations (Scopus)

Abstract

Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. However, it does not cover inference-time threats such as embedding inversion and sensitive attribute inference, and it is costly in storage and computation when used to fine-tune large pre-trained language models (LMs). We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs and satisfies stringent local DP requirements for both training and inference data. To instantiate it using the smallest matrix-valued noise, we devise an analytic matrix Gaussian mechanism (aMGM) that draws possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing outputs from different hidden (sub-)layers of LMs with aMGM noise. On three typical tasks, DP-Forward's utility nearly matches the non-private baseline and outperforms DP-SGD by up to 7.7pp at a moderate privacy level. It cuts time and memory costs by 3× compared to DP-SGD with the latest high-speed library. It also reduces the average success rates of embedding inversion and sensitive attribute inference by up to 88pp and 41pp, respectively, whereas DP-SGD fails.
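
As a rough illustration of the forward-pass perturbation described in the abstract, the sketch below clips an embedding matrix to a Frobenius-norm bound and adds Gaussian noise calibrated with the analytic Gaussian mechanism of Balle and Wang (2018). It is not the authors' aMGM: it draws i.i.d. noise rather than matrix Gaussian noise, and the function names, the clipping bound `c`, and the resulting sensitivity of `2 * c` are assumptions made for this example only.

```python
import math
import random


def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_delta(sigma, eps, sensitivity):
    """Smallest delta for which N(0, sigma^2) noise gives (eps, delta)-DP."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(eps) * std_normal_cdf(-a - b)


def calibrate_sigma(eps, delta, sensitivity, iters=100):
    """Binary-search the smallest sigma whose achieved delta is <= target."""
    lo, hi = 1e-6 * sensitivity, sensitivity
    while gaussian_delta(hi, eps, sensitivity) > delta:
        hi *= 2.0  # grow the upper bound until it is private enough
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gaussian_delta(mid, eps, sensitivity) > delta:
            lo = mid
        else:
            hi = mid
    return hi  # hi always satisfies the delta constraint


def clip_frobenius(matrix, c):
    """Scale the matrix down so that its Frobenius norm is at most c."""
    norm = math.sqrt(sum(v * v for row in matrix for v in row))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [[v * scale for v in row] for row in matrix]


def perturb_embeddings(matrix, eps, delta, c=1.0, rng=None):
    """Clip, then add i.i.d. Gaussian noise (assumed sensitivity: 2 * c)."""
    rng = rng or random.Random()
    clipped = clip_frobenius(matrix, c)
    # Any two clipped matrices differ by at most 2 * c in Frobenius norm.
    sigma = calibrate_sigma(eps, delta, sensitivity=2.0 * c)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in clipped]


# Hypothetical 2-token, 3-dimensional embedding matrix.
embeddings = [[0.5, -1.2, 0.3], [2.0, 0.1, -0.7]]
noisy = perturb_embeddings(embeddings, eps=8.0, delta=1e-5, rng=random.Random(0))
```

Because the noise is added to the embeddings rather than to gradients, the same call could in principle be applied at both fine-tuning and inference time, which is the property the abstract emphasizes.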

Original language: English
Title of host publication: CCS 2023 - Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security
Publisher: Association for Computing Machinery, Inc
Pages: 2665-2679
Number of pages: 15
ISBN (Electronic): 9798400700507
DOIs
Publication status: Published - 15 Nov 2023
Externally published: Yes
Event: 30th ACM SIGSAC Conference on Computer and Communications Security, CCS 2023 - Copenhagen, Denmark
Duration: 26 Nov 2023 to 30 Nov 2023

Publication series

Name: CCS 2023 - Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security

Conference

Conference: 30th ACM SIGSAC Conference on Computer and Communications Security, CCS 2023
Country/Territory: Denmark
City: Copenhagen
Period: 26/11/23 to 30/11/23

Keywords

  • Analytic Matrix Gaussian Mechanism
  • Embedding Matrices
  • Local Differential Privacy
  • Natural Language Processing
  • Pre-trained Language Models
  • Privacy-preserving Fine-tuning and Inference of LMs

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Software
