How many samples are needed? An investigation of binary logistic regression for selective omission in a road network

Qi Zhou, Zhilin Li

Research output: Journal article publication > Journal article > Academic research > peer-review

2 Citations (Scopus)


Selective omission in a road network (or road selection) means retaining the more important roads, and it is a necessary operator for transforming a road network at a large scale into one at a smaller scale. This study discusses the use of a supervised learning approach to road selection and investigates how many samples are needed for road selection to perform well. More precisely, binary logistic regression is employed, and three road network datasets of different sizes and with different target scales are used for testing. Different percentages and numbers of strokes are randomly chosen to train a logistic regression model, which is then applied to the untrained strokes for validation. The performance obtained with the different sample sizes is evaluated mainly by an error rate estimate. Significance tests are also employed to investigate whether the different sample sizes yield statistically significant differences. The experimental results show that in most cases the error rate estimate is around 0.1–0.2; more importantly, only a small number (e.g., 50–100) of training samples is needed, which indicates the usability of binary logistic regression for road selection.
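The train-on-a-sample, validate-on-the-rest procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the stroke features, the synthetic labelling rule, the gradient-descent fitting routine, and all function names (`make_synthetic_strokes`, `fit_logistic`, `error_by_sample_size`) are assumptions made for the example, and the numbers it prints say nothing about the paper's results.

```python
import math
import random


def make_synthetic_strokes(n, rng):
    """Hypothetical data: two made-up features per stroke, label 1 = retained."""
    data = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        # Assumed rule for illustration: higher feature values are retained more often.
        p = 1.0 / (1.0 + math.exp(-(6.0 * x1 + 4.0 * x2 - 5.0)))
        data.append(((x1, x2), 1 if rng.random() < p else 0))
    return data


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(samples, epochs=500, lr=0.5):
    """Fit binary logistic regression by plain batch gradient descent."""
    n_features = len(samples[0][0])
    w, b, n = [0.0] * n_features, 0.0, len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    w, b = model
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def error_rate(model, samples):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(1 for x, y in samples if predict(model, x) != y)
    return wrong / len(samples)


def error_by_sample_size(strokes, sizes, rng):
    """Train on `size` randomly chosen strokes, validate on the untrained rest."""
    results = {}
    for size in sizes:
        shuffled = strokes[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:size], shuffled[size:]
        results[size] = error_rate(fit_logistic(train), test)
    return results


rng = random.Random(0)
strokes = make_synthetic_strokes(1000, rng)
results = error_by_sample_size(strokes, [25, 50, 100, 200], rng)
for size, err in results.items():
    print(size, round(err, 3))
```

Repeating the random draw several times per sample size and comparing the resulting error rates would be the natural next step before applying any significance test; the sketch performs a single draw per size to stay short.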
Original language: English
Pages (from-to): 405-416
Number of pages: 12
Journal: Cartography and Geographic Information Science
Issue number: 5
Publication status: Published - 19 Oct 2016


  • binary logistic regression
  • map generalization
  • road network
  • selective omission

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Geography, Planning and Development
  • Management of Technology and Innovation
