Adaptive Prototype Attention (APA): A Negative Result for Attention Design in Few-Shot Learning
We present Adaptive Prototype Attention (APA), a task-aware, prototype-guided, and multi-scale attention mechanism tailored for few-shot learning with Transformer-style architectures. APA (i) modulates attention weights with task context, (ii) injects prototype-conditioned signals to enhance within-class cohesion and between-class separation, and (iii) aggregates local and global dependencies across multiple scales. In controlled few-shot classification experiments (5-way, 5-shot, synthetic episodes), APA consistently underperforms strong baselines. Compared with standard attention, APA decreases accuracy from 0.425 to 0.208 and macro-F1 from 0.419 to 0.084; relative to prototype-only and multi-scale-only variants, APA shows accuracy drops of 0.205 and 0.232, respectively. APA requires ∼921.7 epochs to converge to a final loss of ≈ 0.0000, indicating slow optimization, and its attention visualizations exhibit non-compact, task-agnostic patterns (all experimental results are taken from the provided run logs). These findings suggest that the coupling of task-aware modulation with prototype guidance and multi-scale aggregation in the current APA design is ineffective for data-scarce regimes, and they provide a practical cautionary note for attention-mechanism design in few-shot learning.
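To make the three components concrete, the following is a minimal, hypothetical PyTorch sketch of an APA-style attention block. It assumes task context is taken as the episode-level mean of token features, prototypes are per-class support means, and multi-scale aggregation uses average-pooled key/value summaries; the module name `AdaptivePrototypeAttention`, the shapes, and the exact fusion choices are illustrative assumptions rather than the paper's implementation.

```python
# Hypothetical sketch of an APA-style block: (i) task-aware gating of queries,
# (ii) prototype-conditioned logit bias, (iii) multi-scale key/value aggregation.
# Not the authors' code; shapes and fusion choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptivePrototypeAttention(nn.Module):
    def __init__(self, dim: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # (i) task-aware modulation: per-dimension gate from a task-context vector.
        self.task_gate = nn.Linear(dim, dim)
        # (ii) prototype-conditioned projection used to bias attention logits.
        self.proto_proj = nn.Linear(dim, dim)
        # (iii) fuse the outputs from all scales back to the model dimension.
        self.out = nn.Linear(dim * len(scales), dim)

    def forward(self, x: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) token features of one episode; prototypes: (C, D) class means.
        B, N, D = x.shape
        task_ctx = x.mean(dim=1)                                   # crude task context
        q = self.q(x) * torch.sigmoid(self.task_gate(task_ctx)).unsqueeze(1)

        outputs = []
        for s in self.scales:                                      # local -> global scales
            if s == 1:
                kv = x
            else:
                kv = F.avg_pool1d(x.transpose(1, 2), kernel_size=s, stride=s).transpose(1, 2)
            k, v = self.k(kv), self.v(kv)
            logits = q @ k.transpose(1, 2) / D ** 0.5              # (B, N, N//s)

            # Prototype bias: each key's similarity to its closest class prototype,
            # shared across all queries at this scale.
            proto_sim = kv @ self.proto_proj(prototypes).t()       # (B, N//s, C)
            logits = logits + proto_sim.max(dim=-1).values.unsqueeze(1)

            attn = logits.softmax(dim=-1)
            outputs.append(attn @ v)                               # (B, N, D)

        return self.out(torch.cat(outputs, dim=-1))


if __name__ == "__main__":
    # Toy 5-way episode: prototypes would normally be support-set class means.
    apa = AdaptivePrototypeAttention(dim=64)
    tokens = torch.randn(2, 16, 64)
    prototypes = torch.randn(5, 64)
    print(apa(tokens, prototypes).shape)  # torch.Size([2, 16, 64])
```

In this sketch the three signals interact multiplicatively (gating), additively (prototype bias), and by concatenation (scales); the reported negative results concern exactly this kind of coupling, so the design choices here should be read as one plausible instantiation, not a recommended recipe.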