TY - GEN
T1 - Embedding-based Product Retrieval in Taobao Search
AU - Li, Sen
AU - Lv, Fuyu
AU - Jin, Taiwei
AU - Lin, Guli
AU - Yang, Keping
AU - Zeng, Xiaoyi
AU - Wu, Xiao Ming
AU - Ma, Qianli
N1 - Funding Information:
We would like to thank the anonymous reviewers for their helpful feedbacks. This research was supported by the Alibaba Innovative Research project 0034058 (ZGAL).
Publisher Copyright:
© 2021 ACM.
PY - 2021/8/14
Y1 - 2021/8/14
N2 - Nowadays, the product search service of e-commerce platforms has become a vital shopping channel in people's life. The retrieval phase of products determines the search system's quality and gradually attracts researchers' attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded due to its: (1) low relevance with a given query and (2) discrepancy between the training and inference phases. Therefore, we propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then use the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. Two efficient methods are further proposed to improve retrieval relevance, including smoothing noisy training data and generating relevance-improving hard negative samples without requiring extra knowledge and training procedures. We evaluate MGDSPR on Taobao Product Search with significant metrics gains observed in offline experiments and online A/B tests. MGDSPR has been successfully deployed to the existing multi-channel retrieval system in Taobao Search. We also introduce the online deployment scheme and share practical lessons of our retrieval system to contribute to the community.
AB - Nowadays, the product search service of e-commerce platforms has become a vital shopping channel in people's life. The retrieval phase of products determines the search system's quality and gradually attracts researchers' attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded due to its: (1) low relevance with a given query and (2) discrepancy between the training and inference phases. Therefore, we propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then use the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. Two efficient methods are further proposed to improve retrieval relevance, including smoothing noisy training data and generating relevance-improving hard negative samples without requiring extra knowledge and training procedures. We evaluate MGDSPR on Taobao Product Search with significant metrics gains observed in offline experiments and online A/B tests. MGDSPR has been successfully deployed to the existing multi-channel retrieval system in Taobao Search. We also introduce the online deployment scheme and share practical lessons of our retrieval system to contribute to the community.
KW - e-commerce search
KW - embedding-based retrieval system
UR - http://www.scopus.com/inward/record.url?scp=85114943302&partnerID=8YFLogxK
U2 - 10.1145/3447548.3467101
DO - 10.1145/3447548.3467101
M3 - Conference article published in proceeding or book
AN - SCOPUS:85114943302
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 3181
EP - 3189
BT - KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
Y2 - 14 August 2021 through 18 August 2021
ER -