TY - GEN
T1 - Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization
AU - Cai, Sijia
AU - Zuo, Wangmeng
AU - Zhang, Lei
PY - 2017/12/22
Y1 - 2017/12/22
N2 - The success of fine-grained visual categorization (FGVC) extremely relies on the modeling of appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher-order. To address these issues, we propose an end-to-end framework based on higherorder integration of hierarchical convolutional activations for FGVC. By treating the convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts from different scales. A polynomial kernel based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interaction. To model inter-layer part interactions, we extend polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective for combining convolutional activations from multiple layers. While hypercolumns simply concatenate maps from different layers, and holistically-nested network uses weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields more discriminative representation and achieves competitive results on the widely used FGVC datasets.
AB - The success of fine-grained visual categorization (FGVC) extremely relies on the modeling of appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher-order. To address these issues, we propose an end-to-end framework based on higherorder integration of hierarchical convolutional activations for FGVC. By treating the convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts from different scales. A polynomial kernel based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interaction. To model inter-layer part interactions, we extend polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective for combining convolutional activations from multiple layers. While hypercolumns simply concatenate maps from different layers, and holistically-nested network uses weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields more discriminative representation and achieves competitive results on the widely used FGVC datasets.
UR - http://www.scopus.com/inward/record.url?scp=85041912145&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2017.63
DO - 10.1109/ICCV.2017.63
M3 - Conference article published in proceeding or book
AN - SCOPUS:85041912145
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 511
EP - 520
BT - Proceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Computer Vision, ICCV 2017
Y2 - 22 October 2017 through 29 October 2017
ER -