TY - JOUR
T1 - Boosting RGB-D saliency detection by leveraging unlabeled RGB images
AU - Wang, Xiaoqiang
AU - Zhu, Lei
AU - Tang, Siliang
AU - Fu, Huazhu
AU - Li, Ping
AU - Wu, Fei
AU - Yang, Yi
AU - Zhuang, Yueting
N1 - Funding Information:
This work was supported in part by the National Key Research and Development Program of China under Grant 2018AAA0101900; in part by the National Science Foundation (NSF) of Zhejiang under Grant LR21F020004; in part by the Chinese Knowledge Center of Engineering Science and Technology (CKCEST), Hikvision-Zhejiang University Joint Research Center, National Natural Science Foundation of China, under Grant 61902275; and in part by The Hong Kong Polytechnic University under Grant P0030419, Grant P0030929, and Grant P0035358.
Publisher Copyright:
© 2022 IEEE.
PY - 2022/1
Y1 - 2022/1
N2 - Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images. However, RGB-D data is not easily acquired, which limits the development of RGB-D SOD techniques. To alleviate this issue, we present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection. We first devise a depth decoupling convolutional neural network (DDCNN), which contains a depth estimation branch and a saliency detection branch. The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data. The saliency detection branch is used to fuse the RGB feature and depth feature to predict the RGB-D saliency. Then, the whole DDCNN is assigned as the backbone in a teacher-student framework for semi-supervised learning. Moreover, we also introduce a consistency loss on the intermediate attention and saliency maps for the unlabeled data, as well as a supervised depth and saliency loss for labeled data. Experimental results on seven widely-used benchmark datasets demonstrate that our DDCNN outperforms state-of-the-art methods both quantitatively and qualitatively. We also demonstrate that our semi-supervised DS-Net can further improve the performance, even when using an RGB image with the pseudo depth map.
KW - Attention consistency
KW - Depth estimation
KW - RGB-D salient object detection
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85122853814&partnerID=8YFLogxK
U2 - 10.1109/TIP.2021.3139232
DO - 10.1109/TIP.2021.3139232
M3 - Journal article
C2 - 34990359
AN - SCOPUS:85122853814
SN - 1057-7149
VL - 31
SP - 1107
EP - 1119
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -