An innovative approach of determining the sample data size for machine learning models: a case study on health and safety management for infrastructure workers

Haoqing Wang, Wen Yi, Yannick Liu

Research output: Journal article publication › Journal article › Academic research › peer-review

2 Citations (Scopus)

Abstract

Numerical experiments are an essential part of academic studies in the field of transportation management. Using an appropriate sample size to conduct experiments can save both data collection cost and computing time. However, few studies have paid attention to how the sample size should be determined. In this research, we use four typical machine learning regression models and a dataset collected from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 balances model performance against the cost of data collection. Our study can serve as a reference when deciding in advance how many samples to collect.
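The learning-curve idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (y = 2x + 1 plus Gaussian noise), the model is a single closed-form least-squares line rather than the four regression models used in the study, and the helper names (`fit_ols`, `mse`, `learning_curve`, `make_data`) are hypothetical. It only shows the general procedure of training on progressively larger subsamples and watching where the test error flattens.

```python
import random
import statistics


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return statistics.fmean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)])


def learning_curve(train, test, sizes, repeats=30, seed=0):
    """Mean test MSE per training-set size, averaged over random subsamples."""
    rng = random.Random(seed)
    test_x, test_y = zip(*test)
    curve = []
    for n in sizes:
        errors = []
        for _ in range(repeats):
            xs, ys = zip(*rng.sample(train, n))
            errors.append(mse(fit_ols(xs, ys), test_x, test_y))
        curve.append((n, statistics.fmean(errors)))
    return curve


def make_data(n, rng):
    """Synthetic stand-in data (NOT the paper's dataset): y = 2x + 1 + noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        points.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)))
    return points


rng = random.Random(42)
train = make_data(1000, rng)
test = make_data(500, rng)
sizes = [5, 10, 25, 50, 100, 250, 500]  # candidate sample sizes (assumed)
curve = learning_curve(train, test, sizes)

for n, err in curve:
    print(f"n={n:4d}  mean test MSE={err:.3f}")
```

Reading the printed curve from small to large n, the point where additional samples stop reducing the test error noticeably is the kind of trade-off the abstract describes; the specific value of 250 reported there comes from the authors' own data and models, not from this toy example.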

Original language: English
Pages (from-to): 3452-3462
Number of pages: 11
Journal: Electronic Research Archive
Volume: 30
Issue number: 9
Publication status: Published - Jul 2022

Keywords

  • Health and safety management
  • Learning curve
  • Machine learning
  • Sample size
  • Transportation infrastructure

ASJC Scopus subject areas

  • Mathematics(all)
