Abstract
In the big data era, companies and organizations are keen to collect data from users and analyse their behaviour patterns to make decisions or predictions for profit. However, this undermines users' privacy, because the collected data can be highly sensitive and easy to leak. To address this problem, local differential privacy (LDP) has been proposed so that untrusted data collectors can obtain statistical information without compromising user privacy. Most studies on LDP assume that all users fully cooperate in the data collection process, so that the collected dataset is complete. In practice, however, and especially when the user population is large, this assumption seldom holds because of communication loss, user unresponsiveness or unwillingness, and incomplete user-side data. Unfortunately, state-of-the-art LDP-based data collection schemes, such as generalized randomized response (GRR), optimized unary encoding (OUE) and optimized local hashing (OLH), cannot handle partial data collection effectively. In this paper, we propose collaborative sampling to address partial data collection in a multi-dimensional setting. Using a two-phase mechanism, we derive the optimal sampling rate for each dimension and prove its optimality with respect to the variance of the estimated frequencies. Moreover, collaborative sampling is general and can be applied to GRR, OUE and OLH with minimal adaptation. Experimental results show that collaborative sampling outperforms existing mainstream data collection schemes in partial multi-dimensional data collection.
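To make the baseline the abstract refers to more concrete, here is a minimal sketch of GRR for a single dimension: each user perturbs a value from a domain of size d locally, and the collector builds unbiased frequency estimates from the noisy reports. This is the textbook GRR mechanism, not the paper's collaborative sampling; the function names, the 0.7 response rate and the example distribution are hypothetical choices for illustration only. Non-responding users are modelled by simple uniform dropout, which is an assumption and not the per-dimension optimal sampling the paper derives.

```python
import math
import random
from collections import Counter


def grr_params(epsilon: float, d: int) -> tuple[float, float]:
    """Return (p, q): prob. of reporting the true value, and of reporting one specific other value."""
    e = math.exp(epsilon)
    p = e / (e + d - 1)
    q = 1.0 / (e + d - 1)
    return p, q


def grr_perturb(value: int, epsilon: float, d: int, rng: random.Random) -> int:
    """Client side: keep the true value with prob. p, else report a uniformly random other value."""
    p, _ = grr_params(epsilon, d)
    if rng.random() < p:
        return value
    other = rng.randrange(d - 1)  # index among the d - 1 values different from `value`
    return other if other < value else other + 1


def grr_estimate(reports: list[int], epsilon: float, d: int) -> list[float]:
    """Collector side: unbiased estimate f_hat(v) = (C(v)/n - q) / (p - q) for every v."""
    p, q = grr_params(epsilon, d)
    n = len(reports)
    counts = Counter(reports)
    return [(counts[v] / n - q) / (p - q) for v in range(d)]


def grr_variance(n: int, epsilon: float, d: int) -> float:
    """Variance q(1-q) / (n (p-q)^2) of f_hat(v); exact when f(v) = 0, a common approximation otherwise."""
    p, q = grr_params(epsilon, d)
    return q * (1 - q) / (n * (p - q) ** 2)


# Usage: 20,000 users, domain {0, 1, 2, 3}, privacy budget epsilon = 1.
rng = random.Random(0)
d, eps = 4, 1.0
true_values = [0] * 10000 + [1] * 6000 + [2] * 3000 + [3] * 1000
# Hypothetical partial collection: each user responds independently with prob. 0.7.
responders = [v for v in true_values if rng.random() < 0.7]
reports = [grr_perturb(v, eps, d, rng) for v in responders]
est = grr_estimate(reports, eps, d)
print("responders:", len(reports))
print("estimated frequencies:", [round(f, 3) for f in est])
print("approx. std per value:", round(math.sqrt(grr_variance(len(reports), eps, d)), 4))
```

Because the variance scales with 1/n, every user who fails to report inflates the estimation error, which is the effect that motivates choosing sampling rates deliberately rather than accepting whatever subset of users happens to respond.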
Original language | English |
---|---|
Article number | 10160038 |
Pages (from-to) | 3948-3961 |
Number of pages | 14 |
Journal | IEEE Transactions on Information Forensics and Security |
Volume | 18 |
DOIs | |
Publication status | Published - Jun 2023 |
Keywords
- collaborative sampling
- Local differential privacy
- multi-dimensional data
- privacy-preserving data collection
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Computer Networks and Communications