Abstract
In this work, we study generalization in reinforcement learning for traveling salesman problem (TSP). While efforts have been made for designing deep reinforcement learning-based solvers to achieve near optimal results in small tasks, it is still an open problem to apply such solvers to larger-scale tasks by retaining performance. In this research, we learn the shared dynamics in TSP environments based on multi-task learning, which can be generalized to new tasks. To accurately estimate such dynamics, we consider leveraging the node visitation information. Besides designing RL-based models to attentively aggregate the visitation information during decision making, we propose a scheduled data utilization strategy to stabilize learning with various problem sizes. The experimental result shows that our model achieves improved generalizability for unseen larger TSPs in both zero-shot and few-shot settings.
Original language | English |
---|---|
Article number | 102005 |
Journal | Advanced Engineering Informatics |
Volume | 56 |
DOIs | |
Publication status | Published - Apr 2023 |
Keywords
- Combinatorial optimization
- Few-shot learning
- Multi-task learning
- Reinforcement learning
- Traveling salesman problem
- Zero-shot learning
ASJC Scopus subject areas
- Information Systems
- Artificial Intelligence