TY - JOUR
T1 - Joint learning of frequency and spatial domains for dense image prediction
AU - Jia, Shaocheng
AU - Yao, Wei
N1 - Funding Information:
The work described in this paper was supported by the National Natural Science Foundation of China (Project No. 42171361). This work is also funded by the research project (Project Number: 2021.A6.184.21D) of the Public Policy Research Funding Scheme of The Government of the Hong Kong Special Administrative Region. The authors would like to thank the anonymous reviewers for their careful reading of our manuscript and their insightful comments and suggestions.
Publisher Copyright:
© 2022 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS)
PY - 2023/1
Y1 - 2023/1
N2 - Current artificial neural networks mainly conduct the learning process in the spatial domain and neglect learning in the frequency domain, even though learning in the frequency domain can be more efficient than learning in the spatial domain. In this paper, we fully explore frequency-domain learning and propose a joint learning paradigm for the frequency and spatial domains. This paradigm takes full advantage of the combined strengths of frequency learning and spatial learning; specifically, frequency-domain and spatial-domain learning effectively capture intrinsic global and local information, respectively. To achieve this, an innovative and effective linear learning block is proposed to conduct the learning process directly in the frequency domain. Together with the prevailing spatial learning operation, i.e., convolution, a powerful and scalable joint learning framework is further proposed. Extensive experiments on diverse benchmark datasets (KITTI, Make3D, and Cityscapes) demonstrate the effectiveness and superiority of the proposed joint learning paradigm in dense image prediction tasks, including self-supervised depth estimation, ego-motion estimation, and semantic segmentation. In particular, the proposed model achieves performance competitive with state-of-the-art methods in all three tasks, even without pretraining. Moreover, the proposed model reduces the number of parameters by over 78% for self-supervised depth estimation on the KITTI dataset while keeping its time complexity on par with other state-of-the-art methods, which makes it well suited to real-world applications. We hope that the proposed method encourages further research in cross-domain learning. The code is publicly available at https://github.com/shaochengJia/FSLNet.
AB - Current artificial neural networks mainly conduct the learning process in the spatial domain and neglect learning in the frequency domain, even though learning in the frequency domain can be more efficient than learning in the spatial domain. In this paper, we fully explore frequency-domain learning and propose a joint learning paradigm for the frequency and spatial domains. This paradigm takes full advantage of the combined strengths of frequency learning and spatial learning; specifically, frequency-domain and spatial-domain learning effectively capture intrinsic global and local information, respectively. To achieve this, an innovative and effective linear learning block is proposed to conduct the learning process directly in the frequency domain. Together with the prevailing spatial learning operation, i.e., convolution, a powerful and scalable joint learning framework is further proposed. Extensive experiments on diverse benchmark datasets (KITTI, Make3D, and Cityscapes) demonstrate the effectiveness and superiority of the proposed joint learning paradigm in dense image prediction tasks, including self-supervised depth estimation, ego-motion estimation, and semantic segmentation. In particular, the proposed model achieves performance competitive with state-of-the-art methods in all three tasks, even without pretraining. Moreover, the proposed model reduces the number of parameters by over 78% for self-supervised depth estimation on the KITTI dataset while keeping its time complexity on par with other state-of-the-art methods, which makes it well suited to real-world applications. We hope that the proposed method encourages further research in cross-domain learning. The code is publicly available at https://github.com/shaochengJia/FSLNet.
KW - Depth estimation
KW - Frequency learning
KW - Joint learning
KW - Semantic image segmentation
KW - Spatial learning
UR - http://www.scopus.com/inward/record.url?scp=85141925703&partnerID=8YFLogxK
U2 - 10.1016/j.isprsjprs.2022.11.001
DO - 10.1016/j.isprsjprs.2022.11.001
M3 - Journal article
AN - SCOPUS:85141925703
SN - 0924-2716
VL - 195
SP - 14
EP - 28
JO - ISPRS Journal of Photogrammetry and Remote Sensing
JF - ISPRS Journal of Photogrammetry and Remote Sensing
ER -