Resolution-based outlier factor: Detecting the top-n most outlying data points in engineering data

Hongqin Fan, Osmar R. Zaïane, Andrew Foss, Junfeng Wu

Research output: Journal article publication › Journal article › Academic research › peer-review

33 Citations (Scopus)

Abstract

Outlier detection, which aims to identify inconsistent records in large amounts of data, is a common task in engineering applications. Although outlier detection schemes from the data mining discipline are acknowledged as a more viable solution for efficiently identifying anomalies in such data repositories, current outlier mining algorithms require domain parameters as input. These parameters are often unknown, difficult to determine, and vary across datasets containing different cluster features. This paper presents a novel resolution-based outlier notion and a nonparametric outlier-mining algorithm that can efficiently identify and rank the top-listed outliers in a wide variety of datasets. The algorithm generates reasonable outlier results by taking both local and global features of a dataset into account. Experiments are conducted using both synthetic datasets and a real-life construction equipment dataset from a large road building contractor. Comparison with current outlier mining algorithms indicates that the proposed algorithm is more effective and can be integrated into a decision support system to serve as a universal detector of potentially inconsistent records.
Original language: English
Pages (from-to): 31-51
Number of pages: 21
Journal: Knowledge and Information Systems
Volume: 19
Issue number: 1
Publication status: Published - 1 Jan 2009
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence
