A Hierarchical Ensemble of ECOC for cancer classification based on multi-class microarray data

Kun Hong Liu, Zhi Hao Zeng, Vincent To Yee Ng

Research output: Journal article publication › Journal article › Academic research › peer-review

35 Citations (Scopus)

Abstract

The difficulty of cancer classification using multi-class microarray datasets is that each class contains only a few samples. To address this problem, we propose a hierarchical ensemble strategy named Hierarchical Ensemble of Error Correcting Output Codes (HE-ECOC). In this strategy, different feature subsets extracted from a dataset are used as inputs to three data-dependent ECOC algorithms, which produce different ECOC coding matrices. The mutual diversity among these coding matrices is then calculated using two schemes, named maximizing local diversity (MLD) and maximizing global diversity (MGD). Both schemes can select diverse coding matrices generated by the same or different ECOC algorithm(s), and an average fusion scheme is used to fuse the outputs of the base learners. In the experiments, both the MLD-based and MGD-based HE-ECOC strategies work stably and outperform the individual ECOC algorithms. Compared with some other ensemble systems, HE-ECOC yields a more robust ensemble and achieves better performance in most cases. In short, HE-ECOC is a promising solution for the multi-class problem. The MATLAB code is available upon request.
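The pipeline outlined in the abstract (several coding matrices, a diversity-based choice among them, then average fusion) can be illustrated with a small Python sketch. It is not the authors' MATLAB implementation, and the abstract does not define MLD or MGD, so everything specific here is an assumption made for illustration: the function names are hypothetical, diversity is taken to be the fraction of differing entries between two equally shaped matrices, selection is a brute-force search for the subset with the highest mean pairwise diversity, and fusion averages per-class scores after decoding. The matrices and base-learner outputs are toy values, not data from the paper.

```python
from itertools import combinations

def matrix_diversity(a, b):
    """Fraction of entries that differ between two coding matrices.
    Assumption: both matrices have the same shape (classes x base learners)."""
    total = sum(len(row) for row in a)
    diff = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return diff / total

def select_diverse(matrices, k):
    """Return the indices of the k matrices (k >= 2) with the largest mean
    pairwise diversity. A brute-force stand-in, not the paper's MLD or MGD."""
    def mean_diversity(idx):
        pairs = list(combinations(idx, 2))
        return sum(matrix_diversity(matrices[i], matrices[j])
                   for i, j in pairs) / len(pairs)
    return max(combinations(range(len(matrices)), k), key=mean_diversity)

def decode(matrix, outputs):
    """Score each class as the negative L1 distance between its codeword
    and the real-valued outputs of the base learners."""
    return [-sum(abs(c - o) for c, o in zip(row, outputs)) for row in matrix]

def average_fusion(score_lists):
    """Average the per-class scores produced by several ECOC members."""
    n = len(score_lists)
    return [sum(col) / n for col in zip(*score_lists)]

# Toy coding matrices for 3 classes and 3 base learners (entries are +1 / -1).
M0 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
M1 = [[1, 1, -1], [-1, 1, -1], [-1, -1, 1]]
M2 = [[1, 1, -1], [-1, 1, 1], [1, -1, 1]]
matrices = [M0, M1, M2]

# Choose the two most mutually diverse matrices.
chosen = select_diverse(matrices, 2)

# Made-up base-learner outputs for one test sample, keyed by matrix index.
outputs = {0: [0.9, -0.8, -0.7], 2: [0.6, 0.7, -0.2]}

# Decode with each chosen matrix, then fuse the class scores by averaging.
fused = average_fusion([decode(matrices[i], outputs[i]) for i in chosen])
predicted = max(range(len(fused)), key=fused.__getitem__)

print(chosen)     # (0, 2)
print(predicted)  # 0
```

With these toy values, M0 and M2 differ in three of nine entries, more than any other pair, so they are selected, and the fused scores favour class 0. Coding matrices produced by different ECOC algorithms may have different numbers of columns, in which case this entry-wise diversity measure would need to be replaced.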
Original language: English
Pages (from-to): 102-118
Number of pages: 17
Journal: Information Sciences
Volume: 349-350
Publication status: Published - 1 Jul 2016

Keywords

  • Cancer classification
  • Ensemble learning
  • Error Correcting Output Codes (ECOC)
  • Feature selection
  • Multi-class microarray data

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
