TY - GEN
T1 - APT-Pipe: A Prompt-Tuning Tool for Social Data Annotation using ChatGPT
AU - Zhu, Yiming
AU - Yin, Zhizhuo
AU - Tyson, Gareth
AU - Haq, Ehsan Ul
AU - Lee, Lik Hang
AU - Hui, Pan
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/5/13
Y1 - 2024/5/13
N2 - Recent research has highlighted the potential of LLMs, like ChatGPT, for performing label annotation on social computing data. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning: techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine of the twelve datasets tested, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.
AB - Recent research has highlighted the potential of LLMs, like ChatGPT, for performing label annotation on social computing data. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning: techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine of the twelve datasets tested, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.
KW - human computation
KW - large language models
KW - prompt-tuning
UR - http://www.scopus.com/inward/record.url?scp=85194031552&partnerID=8YFLogxK
U2 - 10.1145/3589334.3645642
DO - 10.1145/3589334.3645642
M3 - Conference article published in proceeding or book
AN - SCOPUS:85194031552
T3 - WWW 2024 - Proceedings of the ACM Web Conference
SP - 245
EP - 255
BT - WWW 2024 - Proceedings of the ACM Web Conference
PB - Association for Computing Machinery, Inc
T2 - 33rd ACM Web Conference, WWW 2024
Y2 - 13 May 2024 through 17 May 2024
ER -