TY - GEN
T1 - pFedPrompt: Learning Personalized Prompt for Vision-Language Models in Federated Learning
AU - Guo, Tao
AU - Guo, Song
AU - Wang, Junxiao
N1 - Funding Information:
This research was supported by fundings from the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101400003), Hong Kong RGC Research Impact Fund (No. R5060-19), Areas of Excellence Scheme (No. AoE/E-601/22-R), General Research Fund (No. 152203/20E, 152244/21E, 152169/22E), and Shenzhen Science and Technology Innovation Commission (No. JCYJ20200109142008673).
Publisher Copyright:
© 2023 ACM.
PY - 2023/4/30
Y1 - 2023/4/30
N2 - Pre-trained vision-language models like CLIP show great potential in learning representations that capture latent characteristics of users. A recently proposed method called Contextual Optimization (CoOp) introduces the concept of training prompt for adapting pre-trained vision-language models. Given the lightweight nature of this method, researchers have migrated the paradigm from centralized to decentralized system to innovate the collaborative training framework of Federated Learning (FL). However, current prompt training in FL mainly focuses on modeling user consensus and lacks the adaptation to user characteristics, leaving the personalization of prompt largely under-explored. Researches over the past few years have applied personalized FL (pFL) approaches to customizing models for heterogeneous users. Unfortunately, we find that with the variation of modality and training behavior, directly applying the pFL methods to prompt training leads to insufficient personalization and performance. To bridge the gap, we present pFedPrompt, which leverages the unique advantage of multimodality in vision-language models by learning user consensus from linguistic space and adapting to user characteristics in visual space in a non-parametric manner. Through this dual collaboration, the learned prompt will be fully personalized and aligned to the user's local characteristics. We conduct extensive experiments across various datasets under the FL setting with statistical heterogeneity. The results demonstrate the superiority of our pFedPrompt against the alternative approaches with robust performance.
AB - Pre-trained vision-language models like CLIP show great potential in learning representations that capture latent characteristics of users. A recently proposed method called Contextual Optimization (CoOp) introduces the concept of training prompt for adapting pre-trained vision-language models. Given the lightweight nature of this method, researchers have migrated the paradigm from centralized to decentralized system to innovate the collaborative training framework of Federated Learning (FL). However, current prompt training in FL mainly focuses on modeling user consensus and lacks the adaptation to user characteristics, leaving the personalization of prompt largely under-explored. Researches over the past few years have applied personalized FL (pFL) approaches to customizing models for heterogeneous users. Unfortunately, we find that with the variation of modality and training behavior, directly applying the pFL methods to prompt training leads to insufficient personalization and performance. To bridge the gap, we present pFedPrompt, which leverages the unique advantage of multimodality in vision-language models by learning user consensus from linguistic space and adapting to user characteristics in visual space in a non-parametric manner. Through this dual collaboration, the learned prompt will be fully personalized and aligned to the user's local characteristics. We conduct extensive experiments across various datasets under the FL setting with statistical heterogeneity. The results demonstrate the superiority of our pFedPrompt against the alternative approaches with robust performance.
KW - federated learning
KW - prompt learning
KW - user modeling and personalization
KW - vision-language models
UR - https://www.scopus.com/pages/publications/85159344495
U2 - 10.1145/3543507.3583518
DO - 10.1145/3543507.3583518
M3 - Conference article published in proceeding or book
AN - SCOPUS:85159344495
T3 - ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023
SP - 1364
EP - 1374
BT - ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023
PB - Association for Computing Machinery, Inc
T2 - 2023 World Wide Web Conference, WWW 2023
Y2 - 30 April 2023 through 4 May 2023
ER -