PPGloVe: Privacy-Preserving GloVe for Training Word Vectors in the Dark

Zhongyun Hua, Yan Tong, Yifeng Zheng, Yuhong Li, Yushu Zhang

Research output: Journal article publication; Journal article; Academic research; peer-review

2 Citations (Scopus)

Abstract

Words are treated as atomic units in natural language processing tasks, and representing them as vectors is a fundamental step that supports subsequent computations. GloVe is a widely used machine learning model for training word vectors. Training high-quality word vectors with GloVe generally requires a large corpus and substantial computational resources, which makes it difficult for users to train their own word vectors. A natural choice nowadays is to outsource the training process to the cloud. However, such cloud-based training services raise serious privacy concerns that must be addressed. In this paper, we design, implement, and evaluate PPGloVe, the first system framework that supports privacy-preserving word vector training using GloVe over the encrypted data of multiple participants. We first decompose the training task and show that previous privacy-preserving machine learning techniques are not practical for this task. We then construct a new secure training strategy that carefully integrates lightweight cryptographic techniques with GloVe to support privacy-preserving GloVe training on the cloud. By design, the participants' corpora and the trained word vectors are kept private throughout the whole training process. Extensive experiments over three datasets of different scales demonstrate that PPGloVe produces word vectors whose quality is comparable to that of plaintext training, at practically affordable overhead.
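For reference, the plaintext GloVe training that PPGloVe is compared against minimizes, in the original GloVe formulation, the weighted least-squares objective J = Σ f(X_ij)(w_i·w~_j + b_i + b~_j − log X_ij)² over the nonzero entries of the co-occurrence matrix X, with f(x) = (x/x_max)^α for x < x_max and 1 otherwise (x_max = 100, α = 3/4 in the original paper). The sketch below is a minimal, non-private illustration of that objective only, not PPGloVe's secure protocol: none of the cryptographic protections are shown. The function names, toy corpus, window size, vector dimension, and learning rate are illustrative assumptions, and plain SGD stands in for the AdaGrad optimizer used by the original GloVe.

```python
import math
import random
from collections import defaultdict


def cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts; a pair d words apart adds 1/d (as in GloVe)."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            ctx = tokens[j]
            counts[(word, ctx)] += 1.0 / d
            counts[(ctx, word)] += 1.0 / d
    return counts


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs, caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def residual(i, j, x, W, Wc, b, bc):
    """w_i . w~_j + b_i + b~_j - log X_ij for one nonzero entry."""
    dot = sum(u * v for u, v in zip(W[i], Wc[j]))
    return dot + b[i] + bc[j] - math.log(x)


def glove_loss(counts, W, Wc, b, bc):
    """Weighted least-squares objective J summed over nonzero X_ij."""
    return sum(weight(x) * residual(i, j, x, W, Wc, b, bc) ** 2
               for (i, j), x in counts.items())


def sgd_epoch(counts, W, Wc, b, bc, lr):
    """One pass of plain SGD over nonzero entries (the factor 2 is folded into lr)."""
    for (i, j), x in counts.items():
        g = weight(x) * residual(i, j, x, W, Wc, b, bc)
        # Compute both gradients from the pre-update vectors.
        grad_w = [g * v for v in Wc[j]]
        grad_c = [g * u for u in W[i]]
        W[i] = [u - lr * d for u, d in zip(W[i], grad_w)]
        Wc[j] = [v - lr * d for v, d in zip(Wc[j], grad_c)]
        b[i] -= lr * g
        bc[j] -= lr * g


def demo(epochs=50, dim=4, lr=0.5, seed=0):
    """Train on a hypothetical toy corpus; return (loss_before, loss_after)."""
    tokens = "the cat sat on the mat and the dog sat on the rug".split()
    counts = cooccurrence(tokens)
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    W = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    Wc = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    b = {w: 0.0 for w in vocab}
    bc = {w: 0.0 for w in vocab}
    before = glove_loss(counts, W, Wc, b, bc)
    for _ in range(epochs):
        sgd_epoch(counts, W, Wc, b, bc, lr)
    return before, glove_loss(counts, W, Wc, b, bc)
```

Calling demo() returns the objective before and after training; on this toy corpus the second value should be smaller, though the exact numbers depend on the seed. Keeping the two vector tables W and Wc separate mirrors GloVe's word and context vectors, which are commonly summed after training to form the final embeddings.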

Original language: English
Pages (from-to): 3644-3658
Number of pages: 15
Journal: IEEE Transactions on Information Forensics and Security
Volume: 19
Publication status: Published - Feb 2024

Keywords

  • cloud computing
  • data security
  • privacy preservation
  • word representation

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
