CAHT: A Constraint-Aware Heterogeneous Transformer for Real-Time Multi-Robot Task Allocation in Warehouse Environments
The NP-hard coordination of heterogeneous robots for time-windowed warehouse tasks remains challenging: metaheuristics are precise but slow, whereas neural methods cannot handle heterogeneous constraints, leading to infeasible allocations. This paper presents the Constraint-Aware Heterogeneous Transformer (CAHT), a lightweight encoder–decoder architecture that performs end-to-end task assignment and sequencing in a single forward pass. The central innovation is a dynamic feasibility masking mechanism that enforces capacity and energy constraints directly within the softmax computation, eliminating infeasible allocations at the architectural level. This is complemented by a spatial-bias Transformer encoder and a two-stage supervised–reinforcement learning training paradigm using ALNS-generated labels. Experiments across four problem scales (5–20 robots, 50–200 tasks) demonstrate that CAHT achieves objective values within 7–13% of the ALNS reference while being 29–91× faster (23–104 ms vs. 2–3 s). Constraint violation rates remain below 6% with time-window satisfaction above 94%. Ablation analysis identifies dynamic masking as the dominant contribution (+213% degradation upon removal), and cross-scale generalization reveals that the optimality gap decreases from 13.0% to 10.7% as problem scale grows. With only 0.82M parameters, CAHT occupies a previously vacant region on the speed–quality Pareto frontier, offering a practical path toward real-time autonomous warehouse coordination.