Abstract
Support vector machines (SVMs) have proven to be an effective tool for non-coding RNA (ncRNA) data classification. However, as the number of species and the size of ncRNA sequence data grow quickly, SVM training time becomes intolerable and even impractical for large-scale data. Although many fast SVM-based classification techniques have been developed, their applicability depends heavily on the formulations involved, and particularly on the computational reduction of the corresponding kernel matrix. In this paper, building on the latest advance in fast two-dimensional convex hull approximation with asymptotically linear time complexity, a fast convex-hull vector machine (CHVM) is developed to overcome this applicability limitation of SVM-based classification techniques and to provide more choices for large-scale ncRNA data classification tasks. By projecting a dataset onto all the corresponding two-dimensional projection combinations, CHVM first quickly extracts the boundary vectors of the whole training dataset in the kernel space, and then attempts to form the convex hull vectors of the whole kernelized training set by integrating all the obtained boundary vectors. Finally, the convex hull vectors are presented as the inputs to an SVM classifier, regardless of the adopted SVM's formulation. Experimental results on three large-scale ncRNA datasets indicate that CHVM outperforms five SVM-based classifiers, random forest (RF), and back-propagation neural networks (BP), especially in training time.
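The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: it uses an exact monotone-chain hull (O(n log n)) in place of the fast approximate linear-time hull, it works in the input space rather than the kernel space, and it reads "two-dimensional projection combinations" as all pairs of coordinate axes. The names `hull_indices_2d` and `boundary_indices` are hypothetical.

```python
from itertools import combinations


def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_indices_2d(points_2d):
    """Indices of convex hull vertices of 2D points (Andrew's monotone chain).

    Assumption: an exact hull stands in for the approximate hull of the abstract.
    """
    order = sorted(range(len(points_2d)), key=lambda i: points_2d[i])
    if len(order) <= 2:
        return set(order)

    def half_hull(indices):
        chain = []
        for i in indices:
            # Drop the last chain point while it does not make a strict left turn.
            while len(chain) >= 2 and _cross(
                points_2d[chain[-2]], points_2d[chain[-1]], points_2d[i]
            ) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half_hull(order)
    upper = half_hull(reversed(order))
    return set(lower) | set(upper)


def boundary_indices(points):
    """Union of hull vertices over every pair of coordinate axes.

    Assumption: projections are axis-aligned pairs in the input space.
    """
    n_dims = len(points[0])
    selected = set()
    for a, b in combinations(range(n_dims), 2):
        projected = [(p[a], p[b]) for p in points]
        selected |= hull_indices_2d(projected)
    return sorted(selected)


# Usage: keep only the boundary candidates, then train any SVM on that subset.
samples = [(0, 0, 0), (4, 0, 1), (0, 4, 2), (4, 4, 3), (2, 2, 1.5), (2, 2, 10)]
kept = boundary_indices(samples)
reduced_training_set = [samples[i] for i in kept]
```

In this toy set, the point `(2, 2, 1.5)` lies inside the hull of every projection and is discarded, while `(2, 2, 10)` is interior in the first projection but on the boundary in the others, so it is kept. The reduced set would then be handed to whichever SVM formulation is in use.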
Original language | English |
---|---|
Pages (from-to) | 149-164 |
Number of pages | 16 |
Journal | Knowledge-Based Systems |
Volume | 151 |
Publication status | Published - 1 Jul 2018 |
Keywords
- Fast convex hull approximation
- Kernelization
- Large scale ncRNA data classification
- Support vector machines
ASJC Scopus subject areas
- Software
- Management Information Systems
- Information Systems and Management
- Artificial Intelligence