What Makes a Transformer Solve the TSP? A Component-Wise Analysis

The Traveling Salesman Problem (TSP) remains a central benchmark in combinatorial optimization, with applications in logistics, manufacturing, and network design. While exact solvers and classical heuristics offer strong performance, they rely on handcrafted design and offer limited adaptability. Recent advances in deep learning have introduced a new paradigm: learning heuristics directly from data, with Transformers standing out for their ability to capture global dependencies and to scale effectively through parallelism. This survey offers a component-wise analysis of Transformer-based TSP models, serving both as a structured review and as a tutorial for new researchers. We classify solution paradigms (constructive autoregressive and non-autoregressive models, local-search refinement, and hyper-heuristics) and examine state representations, architectural variants (pointer networks, efficient attention, hierarchical and dual-aspect designs), and inference strategies such as decoding heuristics and integration with classical refiners. We also highlight hybrid models that combine Transformers with CNNs, GNNs, or hierarchical decomposition, alongside training methods spanning supervised imitation and reinforcement learning. By organizing the literature around these building blocks, we clarify where Transformers excel, where classical heuristics remain essential, and how hybridization can bridge the gap. Our goal is to provide a critical roadmap and tutorial-style reference connecting classical optimization with modern Transformer-based methods.
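To make the constructive autoregressive paradigm concrete, the sketch below shows one greedy roll-out of pointer-style attention over city embeddings, with visited cities masked out at each step. It is a minimal illustration, not any specific surveyed model: the linear "encoder" and the projection matrices stand in for trained weights, and all function and variable names are hypothetical.

```python
import numpy as np

def greedy_pointer_decode(coords, d_model=16, seed=0):
    """One greedy roll-out of a constructive autoregressive decoder.

    coords : (n, 2) array of city coordinates in [0, 1]^2.
    The weights below are random stand-ins for a trained encoder/decoder,
    so the resulting tour illustrates the mechanism, not learned quality.
    """
    rng = np.random.default_rng(seed)
    n = coords.shape[0]

    # "Encoder": a linear embedding of each city. A trained model would
    # use stacked self-attention layers here instead.
    W_embed = rng.normal(scale=0.5, size=(2, d_model))
    node_emb = coords @ W_embed                      # (n, d_model)

    # Query/key projections for the pointer-attention head.
    W_q = rng.normal(scale=0.5, size=(d_model, d_model))
    W_k = rng.normal(scale=0.5, size=(d_model, d_model))
    keys = node_emb @ W_k                            # (n, d_model)

    visited = np.zeros(n, dtype=bool)
    tour = [0]                                       # start at city 0
    visited[0] = True

    for _ in range(n - 1):
        # Context = embedding of the current city (richer models also
        # fold in the first city and a graph-level summary).
        query = node_emb[tour[-1]] @ W_q             # (d_model,)

        # Pointer attention: scaled dot-product logits over all cities,
        # with already-visited cities masked before selection.
        logits = keys @ query / np.sqrt(d_model)     # (n,)
        logits[visited] = -np.inf
        nxt = int(np.argmax(logits))                 # greedy decoding

        tour.append(nxt)
        visited[nxt] = True
    return tour

def tour_length(coords, tour):
    """Total Euclidean length of the closed tour."""
    pts = coords[tour + tour[:1]]                    # close the loop
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    cities = rng.random((20, 2))
    tour = greedy_pointer_decode(cities)
    print("tour:", tour, "length:", round(tour_length(cities, tour), 3))
```

The same masked-logits step is where the inference strategies discussed above plug in: replacing the argmax with sampling or beam search, or handing the constructed tour to a classical refiner such as 2-opt, changes the resolution behavior without touching the architecture.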
