TY - JOUR
T1 - Preserving User Privacy For Machine Learning: Local Differential Privacy or Federated Machine Learning?
AU - Zheng, Huadi
AU - Hu, Haibo
AU - Han, Ziyang
N1 - Funding Information:
This work was supported in part by National Natural Science Foundation of China (Grant No: U1636205, 61572413) and in part by the Research Grants Council, Hong Kong SAR, China (Grant No: 15238116, 15222118, C1008-16G, and 15218919).
Publisher Copyright:
© 2001-2011 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - The growing number of mobile and IoT devices has nourished many intelligent applications. In order to produce high-quality machine learning models, they constantly access and collect rich personal data such as photos, browsing history, and text messages. However, direct access to personal data has raised increasing public concerns about privacy risks and security breaches. To address these concerns, there are two emerging solutions to privacy-preserving machine learning, namely local differential privacy and federated machine learning. The former is a distributed data collection strategy where each client perturbs data locally before submitting to the server, whereas the latter is a distributed machine learning strategy to train models on mobile devices locally and merge their output (e.g., parameter updates of a model) through a control protocol. In this article, we conduct a comparative study on the efficiency and privacy of both solutions. Our results show that in a standard population and domain setting, both can achieve an optimal misclassification rate lower than 20% and federated machine learning generally performs better at the cost of higher client CPU usage. Nonetheless, local differential privacy can benefit more from a larger client population (> 1k). As for privacy guarantee, local differential privacy also has flexible control over the data leakage.
KW - Data models
KW - Distributed databases
KW - Federated Machine Learning
KW - Local Differential Privacy
KW - Machine learning
KW - Privacy
KW - Servers
UR - http://www.scopus.com/inward/record.url?scp=85089291888&partnerID=8YFLogxK
U2 - 10.1109/MIS.2020.3010335
DO - 10.1109/MIS.2020.3010335
M3 - Journal article
AN - SCOPUS:85089291888
SN - 1541-1672
VL - 35
SP - 5
EP - 14
JO - IEEE Intelligent Systems
JF - IEEE Intelligent Systems
IS - 4
M1 - 9144394
ER -