Generalizing over Long Tail Concepts for Medical Term Normalization

Beatrice Portelli, Simone Scaboro, Enrico Santus, Hooman Sedghamiz, Emmanuele Chersoni, Giuseppe Serra

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › Peer-review


Medical term normalization consists of mapping a piece of text to one of a large number of output classes. Given the small size of the annotated datasets and the extremely long-tail distribution of the concepts, it is of utmost importance to develop models that are capable of generalizing to scarce or unseen concepts. An important attribute of most target ontologies is their hierarchical structure. In this paper, we introduce a simple and effective learning strategy that leverages this information to enhance the generalizability of both discriminative and generative models. The evaluation shows that the proposed strategy produces state-of-the-art performance on seen concepts and consistent improvements on unseen ones, and also allows for efficient zero-shot knowledge transfer across text typologies and datasets.
Original language: English
Title of host publication: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Publisher: Association for Computational Linguistics (ACL)
Publication status: Published - Dec 2022
Event: Conference on Empirical Methods in Natural Language Processing - Abu Dhabi National Exhibition Centre, Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 to 11 Dec 2022


Conference: Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP
Country/Territory: United Arab Emirates
City: Abu Dhabi