TY - GEN
T1 - MER-Inspector: Assessing Model Extraction Risks from An Attack-Agnostic Perspective
AU - Zhang, Xinwei
AU - Hu, Haibo
AU - Ye, Qingqing
AU - Bai, Li
AU - Zheng, Huadi
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/4
Y1 - 2025/4
N2 - Information leakage issues in machine learning-based Web applications have attracted increasing attention. While the risk of data privacy leakage has been rigorously analyzed, the theory of model function leakage, known as Model Extraction Attacks (MEAs), has not been well studied. In this paper, we are the first to understand MEAs theoretically from an attack-agnostic perspective and to propose analytical metrics for evaluating model extraction risks. By using the Neural Tangent Kernel (NTK) theory, we formulate the linearized MEA as a regularized kernel classification problem and then derive the fidelity gap and generalization error bounds of the attack performance. Based on these theoretical analyses, we propose a new theoretical metric called Model Recovery Complexity (MRC), which measures the distance of weight changes between the victim and surrogate models to quantify risk. Additionally, we find that victim model accuracy, which shows a strong positive correlation with model extraction risk, can serve as an empirical metric. By integrating these two metrics, we propose a framework, namely Model Extraction Risk Inspector (MER-Inspector), to compare the extraction risks of models under different model architectures by utilizing relative metric values. We conduct extensive experiments on 16 model architectures and 5 datasets. The experimental results demonstrate that the proposed metrics have a high correlation with model extraction risks, and MER-Inspector can accurately compare the extraction risks of any two models with up to 89.58%.
AB - Information leakage issues in machine learning-based Web applications have attracted increasing attention. While the risk of data privacy leakage has been rigorously analyzed, the theory of model function leakage, known as Model Extraction Attacks (MEAs), has not been well studied. In this paper, we are the first to understand MEAs theoretically from an attack-agnostic perspective and to propose analytical metrics for evaluating model extraction risks. By using the Neural Tangent Kernel (NTK) theory, we formulate the linearized MEA as a regularized kernel classification problem and then derive the fidelity gap and generalization error bounds of the attack performance. Based on these theoretical analyses, we propose a new theoretical metric called Model Recovery Complexity (MRC), which measures the distance of weight changes between the victim and surrogate models to quantify risk. Additionally, we find that victim model accuracy, which shows a strong positive correlation with model extraction risk, can serve as an empirical metric. By integrating these two metrics, we propose a framework, namely Model Extraction Risk Inspector (MER-Inspector), to compare the extraction risks of models under different model architectures by utilizing relative metric values. We conduct extensive experiments on 16 model architectures and 5 datasets. The experimental results demonstrate that the proposed metrics have a high correlation with model extraction risks, and MER-Inspector can accurately compare the extraction risks of any two models with up to 89.58%.
KW - Model extraction attacks
KW - model extraction risk
KW - neural tangent kernel
KW - risk measure
UR - https://www.scopus.com/pages/publications/105005137677
U2 - 10.1145/3696410.3714894
DO - 10.1145/3696410.3714894
M3 - Conference article published in proceeding or book
AN - SCOPUS:105005137677
T3 - WWW 2025 - Proceedings of the ACM Web Conference
SP - 4300
EP - 4315
BT - WWW 2025 - Proceedings of the ACM Web Conference
PB - Association for Computing Machinery, Inc
T2 - 34th ACM Web Conference, WWW 2025
Y2 - 28 April 2025 through 2 May 2025
ER -