Adversarially Robust Long-Text Reasoning for Large Language Models with Self-Constructed Negative Samples

This study addresses the challenges that large language models face in complex reasoning tasks, including semantic drift, logical breaks, and adversarial vulnerability, and proposes an adversarially robust generation paradigm based on self-constructed negative samples. The method builds a unified framework composed of latent representation modeling, internal perturbation generation, contrastive consistency constraints, and semantic stability control, enabling the model to identify potential biases and generate structured negative samples during reasoning, thereby forming a continuous internal correction mechanism. The model first maps the input text into a latent semantic space and constructs negative samples of controllable difficulty to expose implicit conflicts and subtle errors in the reasoning chain. It then sharpens the structural boundary between the original and negative-sample representations through the contrastive consistency module, which stabilizes semantic associations during reasoning, while the semantic stability constraint suppresses perturbation-induced deviations and preserves the global structure of the semantic space. The experimental evaluation covers four core dimensions, namely consistency, robustness, deviation level, and adversarial sensitivity, and includes sensitivity studies on data scale, perturbation strength, and negative-sample difficulty to verify the robustness of the method. The results show that the proposed generation paradigm significantly improves internal consistency and adversarial stability in long-text reasoning tasks across multiple scenarios, reduces semantic drift and reasoning errors, and provides a systematic design pathway for building highly reliable generative models.
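The abstract does not give the objective in equations. As a rough illustration only, the combination of perturbation-based self-constructed negatives, a contrastive consistency term, and a semantic stability penalty might be sketched as follows, using an InfoNCE-style loss in a latent space; all function names, the Gaussian perturbation scheme, and the specific loss form are assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def make_negatives(z, strength=0.5, n_neg=8):
    """Self-constructed negatives: perturb the latent representation z.
    Smaller `strength` keeps negatives closer to the anchor, i.e. harder
    to distinguish, giving a crude knob for negative-sample difficulty."""
    noise = rng.normal(scale=strength, size=(n_neg, z.shape[-1]))
    return l2_normalize(z + noise)

def contrastive_consistency_loss(z_anchor, z_pos, z_negs, tau=0.1):
    """InfoNCE-style term: pull the anchor toward a consistent positive
    view and push it away from the perturbed negatives."""
    z_anchor, z_pos, z_negs = (l2_normalize(z_anchor),
                               l2_normalize(z_pos),
                               l2_normalize(z_negs))
    pos = np.exp(z_anchor @ z_pos / tau)
    neg = np.exp(z_negs @ z_anchor / tau).sum()
    return -np.log(pos / (pos + neg))

def stability_penalty(z, z_ref, lam=0.1):
    """Semantic stability term: penalize drift of the perturbed
    representation away from a reference embedding."""
    return lam * np.sum((z - z_ref) ** 2)

d = 64
z = l2_normalize(rng.normal(size=d))                        # latent input representation
z_view = l2_normalize(z + rng.normal(scale=0.05, size=d))   # lightly perturbed positive view
negs = make_negatives(z, strength=0.5)

loss = contrastive_consistency_loss(z, z_view, negs) + stability_penalty(z_view, z)
print(float(loss))
```

In a real training loop the two terms would be weighted and backpropagated through the encoder; here they are computed on fixed vectors purely to show how perturbation strength and negative count enter the objective.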
