TY - GEN
T1 - Is noise always harmful? Visual learning from weakly-related data
AU - Zhong, Sheng Hua
AU - Liu, Yan
AU - Hua, Kien A.
AU - Wu, Songtao
PY - 2016/6/22
Y1 - 2016/6/22
N2 - Noise exists universally in multimedia data, especially in the Internet era. For example, tags from web users are often incomplete, arbitrary, and only weakly relevant to the visual information. Intuitively, noise in a dataset is harmful to learning tasks, which implies that the huge volume of image tags from social media cannot be used directly. To collect reliable training datasets, labor-intensive manual labeling and various learning-based outlier detection techniques are widely used. This paper examines whether such preprocessing is always needed. We focus on a common case in image classification: the available dataset includes a large number of images that are only weakly related to any of the target classes. Using deep models as the platform, we design a series of experiments that compare semi-supervised learning performance with and without weakly related unlabeled data. Encouragingly, we find that weakly related data is not always harmful, which is a promising finding for research on web image learning.
AB - Noise exists universally in multimedia data, especially in the Internet era. For example, tags from web users are often incomplete, arbitrary, and only weakly relevant to the visual information. Intuitively, noise in a dataset is harmful to learning tasks, which implies that the huge volume of image tags from social media cannot be used directly. To collect reliable training datasets, labor-intensive manual labeling and various learning-based outlier detection techniques are widely used. This paper examines whether such preprocessing is always needed. We focus on a common case in image classification: the available dataset includes a large number of images that are only weakly related to any of the target classes. Using deep models as the platform, we design a series of experiments that compare semi-supervised learning performance with and without weakly related unlabeled data. Encouragingly, we find that weakly related data is not always harmful, which is a promising finding for research on web image learning.
KW - deep learning
KW - semi-supervised learning
KW - Weakly-related data
UR - http://www.scopus.com/inward/record.url?scp=84980349667&partnerID=8YFLogxK
U2 - 10.1109/ICOT.2015.7498518
DO - 10.1109/ICOT.2015.7498518
M3 - Conference article published in proceeding or book
AN - SCOPUS:84980349667
T3 - Proceedings of 2015 International Conference on Orange Technologies, ICOT 2015
SP - 181
EP - 184
BT - Proceedings of 2015 International Conference on Orange Technologies, ICOT 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Orange Technologies, ICOT 2015
Y2 - 19 December 2015 through 22 December 2015
ER -