TY - GEN
T1 - Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
AU - Zhang, Yabin
AU - Zhu, Wenjie
AU - Tang, Hui
AU - Ma, Zhiyuan
AU - Zhou, Kaiyang
AU - Zhang, Lei
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - With the emergence of pre-trained vision-language models like CLIP, how to adapt them to various downstream classification tasks has garnered significant attention in recent research. The adaptation strategies can typically be categorized into three paradigms: zero-shot adaptation, few-shot adaptation, and the recently proposed training-free few-shot adaptation. Most existing approaches are tailored for a specific setting and can only cater to one or two of these paradigms. In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings. Specifically, we propose the dual memory networks that comprise dynamic and static memory components. The static memory caches training data knowledge, enabling training-free few-shot adaptation, while the dynamic memory preserves historical test features online during the testing process, allowing for the exploration of additional data insights beyond the training set. This novel capability enhances model performance in the few-shot setting and enables model usability in the absence of training data. The two memory networks employ the same flexible memory interactive strategy, which can operate in a training-free mode and can be further enhanced by incorporating learnable projection layers. Our approach is tested across 11 datasets under the three task settings. Remarkably, in the zero-shot scenario, it outperforms existing methods by over 3% and even shows superior results against methods utilizing external training data. Additionally, our method exhibits robust performance against natural distribution shifts. Code is available at https://github.com/YBZh/DMN.
KW - dual memory networks
KW - versatile adaptation
KW - vision-language models
UR - https://www.scopus.com/pages/publications/85193305632
U2 - 10.1109/CVPR52733.2024.02713
DO - 10.1109/CVPR52733.2024.02713
M3 - Conference article published in proceeding or book
AN - SCOPUS:85193305632
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 28718
EP - 28728
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -