TY - GEN
T1 - FaPN
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
AU - Huang, Shihua
AU - Lu, Zhichao
AU - Cheng, Ran
AU - He, Cheng
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Recent advancements in deep neural networks have made remarkable leap-forwards in dense image prediction. However, the issue of feature alignment remains as neglected by most existing approaches for simplicity. Direct pixel addition between upsampled and local features leads to feature maps with misaligned contexts that, in turn, translate to mis-classifications in prediction, especially on object boundaries. In this paper, we propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled higher-level features; and another feature selection module to emphasize the lower-level features with rich spatial details. We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN). Extensive experimental evaluations on four dense prediction tasks and four datasets have demonstrated the efficacy of FaPN, yielding an overall improvement of 1.2 - 2.6 points in AP/mIoU over FPN when paired with Faster/Mask R-CNN. In particular, our FaPN achieves the state-of-the-art of 56.7% mIoU on ADE20K when integrated within MaskFormer. The code is available from https://github.com/EMIGroup/FaPN.
AB - Recent advancements in deep neural networks have made remarkable leap-forwards in dense image prediction. However, the issue of feature alignment remains as neglected by most existing approaches for simplicity. Direct pixel addition between upsampled and local features leads to feature maps with misaligned contexts that, in turn, translate to mis-classifications in prediction, especially on object boundaries. In this paper, we propose a feature alignment module that learns transformation offsets of pixels to contextually align upsampled higher-level features; and another feature selection module to emphasize the lower-level features with rich spatial details. We then integrate these two modules in a top-down pyramidal architecture and present the Feature-aligned Pyramid Network (FaPN). Extensive experimental evaluations on four dense prediction tasks and four datasets have demonstrated the efficacy of FaPN, yielding an overall improvement of 1.2 - 2.6 points in AP/mIoU over FPN when paired with Faster/Mask R-CNN. In particular, our FaPN achieves the state-of-the-art of 56.7% mIoU on ADE20K when integrated within MaskFormer. The code is available from https://github.com/EMIGroup/FaPN.
UR - https://www.scopus.com/pages/publications/85121798784
U2 - 10.1109/ICCV48922.2021.00090
DO - 10.1109/ICCV48922.2021.00090
M3 - Conference article published in proceeding or book
AN - SCOPUS:85121798784
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 844
EP - 853
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 October 2021 through 17 October 2021
ER -