TY - JOUR
T1 - Tracking phishing on Ethereum: Transaction network embedding approach for accounts representation learning
AU - Lin, Zhutian
AU - Xiao, Xi
AU - Hu, Guangwu
AU - Li, Qing
AU - Zhang, Bin
AU - Luo, Xiapu
PY - 2023/9
Y1 - 2023/9
N2 - The transaction volume of Ethereum has been witnessing a year-on-year increase, which has unfortunately been accompanied by significant losses due to phishing scams. To enhance the ability of downstream classifiers to distinguish phishing accounts more effectively, we produce dense representations of Ethereum accounts in latent space, leveraging the transaction network topology and associated statistical features. However, the task of learning representations from sparse yet voluminous transaction records presents a significant challenge. To address this, we introduce the Temporal-based Sequences Generator (TSG) and the Heterogeneous-based Sequences Generator (HSG). These generators create sequences from the transaction network, optimizing the use of transaction temporal constraints, diverse account types, and transaction amounts. Our method aims to capture latent higher-order information and generate dense vectors using a network embedding technique. Furthermore, we propose a novel Statistics-Based Sampling (SBS) method to mitigate label leakage. We validate our approach through experiments with various classic downstream classifiers, demonstrating that Phish2vec surpasses other comparative methods in performance and exhibits robustness and stability.
AB - The transaction volume of Ethereum has been witnessing a year-on-year increase, which has unfortunately been accompanied by significant losses due to phishing scams. To enhance the ability of downstream classifiers to distinguish phishing accounts more effectively, we produce dense representations of Ethereum accounts in latent space, leveraging the transaction network topology and associated statistical features. However, the task of learning representations from sparse yet voluminous transaction records presents a significant challenge. To address this, we introduce the Temporal-based Sequences Generator (TSG) and the Heterogeneous-based Sequences Generator (HSG). These generators create sequences from the transaction network, optimizing the use of transaction temporal constraints, diverse account types, and transaction amounts. Our method aims to capture latent higher-order information and generate dense vectors using a network embedding technique. Furthermore, we propose a novel Statistics-Based Sampling (SBS) method to mitigate label leakage. We validate our approach through experiments with various classic downstream classifiers, demonstrating that Phish2vec surpasses other comparative methods in performance and exhibits robustness and stability.
M3 - Journal article
SN - 0167-4048
VL - 135
SP - 1
EP - 14
JO - Computers and Security
JF - Computers and Security
C7 - 103479
ER -