TY - GEN
T1 - PrivKV
T2 - 40th IEEE Symposium on Security and Privacy, SP 2019
AU - Ye, Qingqing
AU - Hu, Haibo
AU - Meng, Xiaofeng
AU - Zheng, Huadi
PY - 2019/5/19
Y1 - 2019/5/19
N2 - Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same 'perturbation-calibration' paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.
AB - Local differential privacy (LDP), where each user perturbs her data locally before sending to an untrusted data collector, is a new and promising technique for privacy-preserving distributed data collection. The advantage of LDP is to enable the collector to obtain accurate statistical estimation on sensitive user data (e.g., location and app usage) without accessing them. However, existing work on LDP is limited to simple data types, such as categorical, numerical, and set-valued data. To the best of our knowledge, there is no existing LDP work on key-value data, which is an extremely popular NoSQL data model and the generalized form of set-valued and numerical data. In this paper, we study this problem of frequency and mean estimation on key-value data by first designing a baseline approach PrivKV within the same 'perturbation-calibration' paradigm as existing LDP techniques. To address the poor estimation accuracy due to the clueless perturbation of users, we then propose two iterative solutions PrivKVM and PrivKVM+ that can gradually improve the estimation results through a series of iterations. An optimization strategy is also presented to reduce network latency and increase estimation accuracy by introducing virtual iterations in the collector side without user involvement. We verify the correctness and effectiveness of these solutions through theoretical analysis and extensive experimental results.
KW - Data-perturbation
KW - Key-value-data
KW - Local-differential-privacy
UR - http://www.scopus.com/inward/record.url?scp=85072910706&partnerID=8YFLogxK
U2 - 10.1109/SP.2019.00018
DO - 10.1109/SP.2019.00018
M3 - Conference article published in proceeding or book
AN - SCOPUS:85072910706
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 317
EP - 331
BT - Proceedings - 2019 IEEE Symposium on Security and Privacy, SP 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 May 2019 through 23 May 2019
ER -