TY - GEN
T1 - Collecting High-Dimensional and Correlation-Constrained Data with Local Differential Privacy
AU - Du, Rong
AU - Ye, Qingqing
AU - Fu, Yue
AU - Hu, Haibo
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China (Grant No. 62072390) and the Research Grants Council, Hong Kong SAR, China (Grant Nos. 15238116, 15222118, 15218919, 15203120, and C1008-16G).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/6
Y1 - 2021/7/6
AB - Local differential privacy (LDP) is a promising privacy model for distributed data collection. It has been widely deployed in real-world systems (e.g., Chrome, iOS, and macOS). In LDP-based mechanisms, an aggregator collects private values perturbed by each user and then analyses these values to estimate their statistics, such as frequency and mean. Most existing works focus on simple scalar value types, such as boolean and categorical values. However, with the emergence of smart sensors and the Internet of Things, high-dimensional data are gaining increasing popularity. In many cases, correlations exist between various attributes of such data, e.g., temperature and luminance. To ensure LDP for high-dimensional data, existing solutions either partition the privacy budget ϵ among these correlated attributes or adopt sampling, both of which dilute the density of useful information and thus result in poor data utility. In this paper, we propose a relaxed LDP model, namely, univariate dominance local differential privacy (UDLDP), for high-dimensional data. We quantify the correlations between attributes and present a correlation-bounded perturbation (CBP) mechanism that optimizes the partitioning of the privacy budget across correlated attributes. Furthermore, we extend CBP to support sampling, which is a common bandwidth reduction technique in sensor networks and the Internet of Things. We derive the best allocation strategy of sampling probabilities among attributes in terms of data utility, which leads to the correlation-bounded perturbation mechanism with sampling (CBPS). The performance of both mechanisms is evaluated and compared with state-of-the-art LDP mechanisms on real-world and synthetic datasets.
UR - http://www.scopus.com/inward/record.url?scp=85111754989&partnerID=8YFLogxK
U2 - 10.1109/SECON52354.2021.9491591
DO - 10.1109/SECON52354.2021.9491591
M3 - Conference article published in proceeding or book
AN - SCOPUS:85111754989
T3 - Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks workshops
SP - 1
EP - 9
BT - 2021 18th IEEE International Conference on Sensing, Communication and Networking, SECON 2021
PB - IEEE Computer Society
T2 - 18th IEEE International Conference on Sensing, Communication and Networking, SECON 2021
Y2 - 6 July 2021 through 9 July 2021
ER -