TY - JOUR
T1 - Effective stabilized self-training on few-labeled graph data
AU - Zhou, Ziang
AU - Shi, Jieming
AU - Zhang, Shengzhong
AU - Huang, Zengfeng
AU - Li, Qing
N1 - Funding Information:
This work is supported by Hong Kong RGC ECS (No. 25201221), RGC GRF (No. 15200021), and PolyU Start-up Fund (P0033898). Zengfeng Huang is supported by National Natural Science Foundation of China No. 62276066, No. U2241212. This work is also supported by funding from Tencent Technology Co., Ltd (P0039546).
Publisher Copyright:
© 2023 Elsevier Inc.
PY - 2023/6
Y1 - 2023/6
N2 - Graph neural networks (GNNs) are designed for semi-supervised node classification on graphs where only a subset of nodes have class labels. However, in extreme cases where very few labels are available (e.g., 1 labeled node per class), GNNs suffer severe performance degradation. Specifically, we observe that existing GNNs exhibit an unstable training process on few-labeled graphs, resulting in inferior node classification performance. We therefore propose an effective framework, Stabilized Self-Training (SST), which can be applied to existing GNNs to handle the scarcity of labeled data and, consequently, boost classification accuracy. We conduct thorough empirical and theoretical analyses to support our findings and motivate the algorithmic designs in SST. We apply SST to two popular GNN models, GCN and DAGNN, to obtain the SSTGCN and SSTDA methods, respectively, and evaluate the two methods against 10 competitors on 5 benchmark datasets. Extensive experiments show that the proposed SST framework is highly effective, especially when few labeled data are available. Our methods achieve superior performance under almost all settings over all datasets. For instance, on the Cora dataset with only 1 labeled node per class, the accuracy of SSTGCN is 62.5%, 17.9% higher than GCN, and the accuracy of SSTDA is 66.4%, which outperforms DAGNN by 6.6%.
AB - Graph neural networks (GNNs) are designed for semi-supervised node classification on graphs where only a subset of nodes have class labels. However, in extreme cases where very few labels are available (e.g., 1 labeled node per class), GNNs suffer severe performance degradation. Specifically, we observe that existing GNNs exhibit an unstable training process on few-labeled graphs, resulting in inferior node classification performance. We therefore propose an effective framework, Stabilized Self-Training (SST), which can be applied to existing GNNs to handle the scarcity of labeled data and, consequently, boost classification accuracy. We conduct thorough empirical and theoretical analyses to support our findings and motivate the algorithmic designs in SST. We apply SST to two popular GNN models, GCN and DAGNN, to obtain the SSTGCN and SSTDA methods, respectively, and evaluate the two methods against 10 competitors on 5 benchmark datasets. Extensive experiments show that the proposed SST framework is highly effective, especially when few labeled data are available. Our methods achieve superior performance under almost all settings over all datasets. For instance, on the Cora dataset with only 1 labeled node per class, the accuracy of SSTGCN is 62.5%, 17.9% higher than GCN, and the accuracy of SSTDA is 66.4%, which outperforms DAGNN by 6.6%.
KW - Few-labeled graphs
KW - Graph neural networks
KW - Node classification
KW - Self-training
UR - http://www.scopus.com/inward/record.url?scp=85149437884&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.02.032
DO - 10.1016/j.ins.2023.02.032
M3 - Journal article
AN - SCOPUS:85149437884
SN - 0020-0255
VL - 631
SP - 369
EP - 384
JO - Information Sciences
JF - Information Sciences
ER -