k-Tree: Crossing sharp boundaries in regression trees to find neighbors

Research output: Journal article publication › Journal article › Academic research › peer-review

1 Citation (Scopus)

Abstract

Traditional classification and regression trees (CARTs) use a top-down, greedy approach to split the feature space into sharply defined, axis-aligned sub-regions (leaves). Each leaf treats all of the samples therein uniformly during the prediction process, leading to a constant predictor. Although this approach is well known for its interpretability and efficiency, it overlooks the complex local distributions within and across leaves. As the number of features increases, this limitation becomes more pronounced, often resulting in a concentration of samples near the boundaries of the leaves. Such clustering suggests that closer neighbors may be found in adjacent leaves, a possibility that is unexplored in the literature. Our study addresses this gap by introducing the k-Tree methodology, a novel method that extends the search for nearest neighbors beyond a single leaf to include adjacent leaves. This approach has two key innovations: (1) establishing an adjacency relationship between leaves across the tree space and (2) designing novel intra-leaf and inter-leaf distance metrics through an optimization lens, which are tailored to local data distributions within the tree. We explore three implementations of the k-Tree methodology: (1) the Post-hoc k-Tree (Pk-Tree), which integrates the k-Tree methodology into already-constructed decision trees, (2) the Advanced k-Tree, which incorporates the k-Tree methodology during the tree construction process, and (3) the Pk-random forest, which integrates the Pk-Tree principles with the random forest framework. Empirical evaluations on a variety of real-world and synthetic datasets demonstrate that the k-Tree methods achieve greater prediction accuracy than traditional models. These results highlight the potential of the k-Tree methodology to enhance predictive analytics by providing deeper insight into the relationships between samples within the tree space.
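The core idea in the abstract, looking for neighbors in adjacent leaves rather than only in the query's own leaf, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the three hand-written leaf boxes, the toy training data, the face-sharing definition of adjacency, the function names, and in particular the plain Euclidean distance, which stands in for the optimized intra-leaf and inter-leaf metrics the authors describe.

```python
from math import dist

# Hypothetical leaves of an already-fitted regression tree on [0, 3] x [0, 1].
# Each leaf is an axis-aligned box: (lower corner, upper corner).
LEAVES = {
    "A": ((0.0, 0.0), (1.0, 1.0)),
    "B": ((1.0, 0.0), (2.0, 1.0)),
    "C": ((2.0, 0.0), (3.0, 1.0)),
}

# Toy training samples as (features, target); values are made up.
TRAIN = [
    ((0.2, 0.5), 1.0),
    ((0.9, 0.4), 2.0),
    ((1.1, 0.4), 5.0),
    ((1.6, 0.5), 6.0),
    ((2.8, 0.5), 9.0),
]


def leaf_of(x):
    """Return the name of the first leaf whose (closed) box contains x."""
    for name, (lo, hi) in LEAVES.items():
        if all(l <= v <= h for l, v, h in zip(lo, x, hi)):
            return name
    raise ValueError(f"{x} lies outside every leaf")


def adjacent(a, b):
    """Assumed adjacency rule: two boxes share a face, i.e. they touch in
    one dimension and overlap with positive length in all the others."""
    (alo, ahi), (blo, bhi) = a, b
    dims = range(len(alo))
    for d in dims:
        touch = ahi[d] == blo[d] or bhi[d] == alo[d]
        overlap = all(
            min(ahi[e], bhi[e]) > max(alo[e], blo[e]) for e in dims if e != d
        )
        if touch and overlap:
            return True
    return False


def leaf_mean_predict(x):
    """Classic CART prediction: the mean target of the query's own leaf."""
    home = leaf_of(x)
    ys = [y for p, y in TRAIN if leaf_of(p) == home]
    return sum(ys) / len(ys)


def cross_leaf_knn_predict(x, k=2):
    """Average the k nearest training targets drawn from the query's leaf
    and its adjacent leaves. Euclidean distance is a stand-in metric."""
    home = leaf_of(x)
    pool = {home} | {
        name
        for name, box in LEAVES.items()
        if name != home and adjacent(LEAVES[home], box)
    }
    candidates = [(dist(p, x), y) for p, y in TRAIN if leaf_of(p) in pool]
    nearest = sorted(candidates)[:k]
    return sum(y for _, y in nearest) / len(nearest)


query = (0.95, 0.4)  # sits in leaf A, close to the A/B boundary
print(leaf_mean_predict(query))       # 1.5
print(cross_leaf_knn_predict(query))  # 3.5
```

For the query near the A/B boundary, the leaf-constant predictor averages only the two samples in leaf A, while the cross-leaf search picks one neighbor from A and a closer one from B. Leaf C is never searched because it shares no face with A.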

Original language: English
Pages (from-to): 567-579
Number of pages: 13
Journal: European Journal of Operational Research
Volume: 324
Issue number: 2
DOIs
Publication status: Published - 16 Jul 2025

Keywords

  • Adaptive distance metric
  • Decision tree
  • k-nearest neighbors
  • Machine learning

ASJC Scopus subject areas

  • General Computer Science
  • Modelling and Simulation
  • Management Science and Operations Research
  • Information Systems and Management
