Abstract
Traditional classification and regression trees (CARTs) use a top-down, greedy approach to split the feature space into sharply defined, axis-aligned sub-regions (leaves). Each leaf treats all of its samples identically at prediction time, which yields a constant predictor. Although this approach is well known for its interpretability and efficiency, it overlooks the complex local distributions within and across leaves. This limitation becomes more pronounced as the number of features increases, often resulting in a concentration of samples near leaf boundaries. Such clustering suggests that a sample's closest neighbors may lie in adjacent leaves, a possibility that is unexplored in the literature. Our study addresses this gap by introducing the k-Tree methodology, a novel method that extends the search for nearest neighbors beyond a single leaf to include adjacent leaves. The approach has two key innovations: (1) establishing an adjacency relationship between leaves across the tree space and (2) designing novel intra-leaf and inter-leaf distance metrics, through an optimization lens, that are tailored to local data distributions within the tree. We explore three implementations of the k-Tree methodology: (1) the Post-hoc k-Tree (Pk-Tree), which integrates the k-Tree methodology into already constructed decision trees, (2) the Advanced k-Tree, which incorporates the k-Tree methodology into the tree construction process itself, and (3) the Pk-random forest, which combines the Pk-Tree principles with the random forest framework. Empirical evaluations on a variety of real-world and synthetic datasets demonstrate that the k-Tree methods achieve higher prediction accuracy than traditional models. These results highlight the potential of the k-Tree methodology to enhance predictive analytics by providing deeper insight into the relationships between samples within the tree space.
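The idea of searching for neighbors beyond a single leaf can be illustrated with a minimal Python sketch. It is not the authors' implementation: the leaf boxes, the toy samples, and the function names (`leaf_of`, `are_adjacent`, `cart_predict`, `k_tree_predict`) are hypothetical, adjacency is assumed to mean that two leaf boxes share a face, and plain Euclidean distance stands in for the optimized intra-leaf and inter-leaf metrics mentioned in the abstract.

```python
import math

# Hypothetical leaves of an already-built tree: each leaf is an axis-aligned
# box given as one (low, high) interval per feature.
LEAVES = {
    "A": ((0.0, 2.0), (0.0, 2.0)),
    "B": ((2.0, 4.0), (0.0, 2.0)),
    "C": ((0.0, 2.0), (2.0, 4.0)),
    "D": ((2.0, 4.0), (2.0, 4.0)),
}

# Toy training samples as (features, target) pairs. Note the two samples
# sitting close to the A/B boundary at x = 2.
TRAIN = [
    ((0.5, 0.5), 1.0),
    ((1.9, 1.0), 3.0),
    ((2.1, 1.0), 3.2),
    ((3.5, 0.5), 6.0),
    ((1.0, 3.0), 5.0),
    ((3.0, 3.0), 9.0),
]


def leaf_of(x):
    """Return the name of the leaf whose half-open box contains x."""
    for name, box in LEAVES.items():
        if all(lo <= xi < hi for xi, (lo, hi) in zip(x, box)):
            return name
    raise ValueError(f"{x} lies outside every leaf")


def are_adjacent(box_a, box_b):
    """Assumed adjacency rule: boxes touch in exactly one dimension and
    overlap with positive length in every other dimension."""
    touching = 0
    for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b):
        if hi_a == lo_b or hi_b == lo_a:
            touching += 1
        elif min(hi_a, hi_b) <= max(lo_a, lo_b):
            return False  # separated by a gap in this dimension
    return touching == 1


def cart_predict(x):
    """Constant leaf predictor: mean target of the samples in x's leaf."""
    home = leaf_of(x)
    targets = [y for xi, y in TRAIN if leaf_of(xi) == home]
    return sum(targets) / len(targets)


def k_tree_predict(x, k):
    """Mean target of the k nearest samples pooled from x's leaf and the
    leaves adjacent to it (Euclidean distance as a stand-in metric)."""
    home = leaf_of(x)
    pool = {home} | {
        name for name, box in LEAVES.items()
        if name != home and are_adjacent(LEAVES[home], box)
    }
    candidates = [(math.dist(x, xi), y) for xi, y in TRAIN if leaf_of(xi) in pool]
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(y for _, y in nearest) / len(nearest)


query = (1.95, 1.0)  # falls in leaf A, right next to the boundary with B
print("leaf:", leaf_of(query))
print("leaf-mean prediction:", cart_predict(query))
print("pooled 2-NN prediction:", k_tree_predict(query, k=2))
```

In this toy setup the query's second-nearest sample lies just across the boundary in leaf B, so the pooled prediction draws on it while the leaf-mean prediction cannot. Leaves A and D touch only at a corner and are therefore not treated as adjacent under the assumed rule.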
| Original language | English |
|---|---|
| Pages (from-to) | 567-579 |
| Number of pages | 13 |
| Journal | European Journal of Operational Research |
| Volume | 324 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 16 Jul 2025 |
Keywords
- Adaptive distance metric
- Decision tree
- k-nearest neighbors
- Machine learning
ASJC Scopus subject areas
- General Computer Science
- Modelling and Simulation
- Management Science and Operations Research
- Information Systems and Management