TY - GEN
T1 - Learning privately: Privacy-preserving canonical correlation analysis for cross-media retrieval
AU - Wang, Qian
AU - Hu, Shengshan
AU - Du, Minxin
AU - Wang, Jingjun
AU - Ren, Kui
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/10/2
Y1 - 2017/10/2
AB - A massive explosion of various types of data has been triggered in the 'Big Data' era. In big data systems, machine learning plays an important role due to its effectiveness in discovering hidden information and valuable knowledge. Data privacy, however, becomes an unavoidable concern since big data usually involve multiple organizations, e.g., different healthcare systems and hospitals, that are not in the same trust domain and may be reluctant to share their data publicly. Applying traditional cryptographic tools is a straightforward approach to protecting sensitive information, but it often renders learning algorithms useless. In this work, we propose, for the first time, a privacy-preserving scheme for canonical correlation analysis (CCA), a well-known learning technique that has been widely used in cross-media retrieval systems. We first develop a library of building blocks to support various arithmetic operations over encrypted real numbers by leveraging additively homomorphic encryption and garbled circuits. Then we encrypt private data by randomly splitting the numerical data, formalize the CCA problem, and reduce it to a symmetric eigenvalue problem by designing new protocols for privacy-preserving QR decomposition. Finally, we solve for all the eigenvalues and the corresponding eigenvectors by running the Newton-Raphson method and the inverse power method over the ciphertext domain. We carefully analyze the security and extensively evaluate the effectiveness of our design. The results show that our scheme is practically secure, incurs negligible errors compared with performing CCA in the clear, and performs comparably in cross-media retrieval systems.
UR - http://www.scopus.com/inward/record.url?scp=85034033105&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2017.8056955
DO - 10.1109/INFOCOM.2017.8056955
M3 - Conference article published in proceeding or book
AN - SCOPUS:85034033105
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2017 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE Conference on Computer Communications, INFOCOM 2017
Y2 - 1 May 2017 through 4 May 2017
ER -