Natural Language Processing for Digital Humanities: How to Analyze the Related Materials of Traditional Chinese Drama in the Early 20th Century (1900-1937) from the Perspective of Digital Humanities—Focusing on Newspaper Databases, Record Databases, and Script Collections

Chao-Lin Liu, Jen-Joe Hung, Wan-yi Wu, Su-bing Chang

    Research output: Unpublished conference presentation (presented paper, abstract, poster); Conference presentation (not published in journal/proceeding/book); Academic research; peer-review

    Abstract

    The research of digital humanities has flourished in the past decade internationally. In contrast,
    the participation of researchers of computational linguistics in domestic research projects
    remains less common than one may have anticipated. The goal of this panel is to introduce
    sample research projects of digital humanities to the community of computational linguistics,
    hoping to promote further cooperation between the two communities. The panel consists of
    four parts. Hung introduces the online repository of the Taishō Tripiṭaka that the Chinese
    Buddhist Electronic Text Association (often called CBETA) offers. Applications of language
    technology, including artificial intelligence and natural language processing, for the
    construction of CBETA will be discussed. Chang and her colleagues aim to build the Taiwan
    Biographical Database (TBDB), which, in the long term, will serve as part of the bedrock for
    historical studies about Taiwan. The research team's experience in extracting and
    integrating information from collections of local gazetteers to build the TBDB will be
    discussed. Dramas are an important part of Chinese arts. Relevant materials about dramas are
    available in several databases and in different forms. Wu will share her study
    on Chinese dramas and elaborate on the potential contributions of language technology to the
    studies of Chinese dramas. If time allows, Liu plans to outline his work on optical character
    recognition (OCR) for ancient Chinese documents, sentence segmentation for classical Chinese,
    word segmentation for classical Chinese poems, including the Tang and Song poems, and
    information extraction from historical documents in classical Chinese.
    The 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)
    Taipei, Taiwan, September 24–26, 2020. The Association for Computational Linguistics and Chinese Language Processing
    Original language: English
    Pages: 413-422
    Number of pages: 500
    Publication status: Published - 24 Sept 2020
    Event: The 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020), National Taipei University of Technology, Taipei, Taiwan
    Duration: 24 Sept 2020 to 26 Sept 2020
    https://sites.google.com/ntut.org.tw/rocling2020

    Conference

    Conference: The 32nd Conference on Computational Linguistics and Speech Processing (ROCLING 2020)
    Country/Territory: Taiwan
    City: Taipei
    Period: 24/09/20 to 26/09/20
