Domain-Specific Temporal Dynamics of Context Sensitivity in Large Language Models

Large language models exhibit context sensitivity—the degree to which conversational history shapes their responses. However, whether this sensitivity varies systematically across conversation positions and task domains remains unexplored. We analyzed 12 models across 30 conversation positions in two domains: philosophical reasoning (open-goal, 4 models) and medical summarization (closed-goal, 8 models), totaling approximately 54,000 responses. We report three findings. First, position 30 (P30) task enablement: at the final summarization position, all 8 medical models showed extreme context-sensitivity spikes (Z > +2.7, mean Z = +3.47, SD = 0.51), while all 4 philosophy models showed no spike (mean Z = +0.25). This 8/8 versus 0/4 split (p = 0.002, Fisher’s exact test) suggests that closed-goal tasks require accumulated context for task execution, not merely for performance enhancement. Second, domain-specific temporal patterns: medical models exhibited U-shaped dynamics with a diagnostic trough in mid-conversation (positions 10–25), while philosophy models showed inverted-U patterns peaking mid-conversation. Third, disruption sensitivity: all 12 models showed negative disruption sensitivity (DS < 0; medical mean −0.115, philosophy mean −0.073), indicating that context presence provides more value than context ordering—an architecture-independent finding. These results demonstrate that task structure, not domain content, determines how context sensitivity unfolds over a conversation. We introduce the distinction between open-goal (Type 1) and closed-goal (Type 2) temporal dynamics, with implications for clinical AI deployment, where summarization reliability is critical.
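The reported p = 0.002 for the 8/8 versus 0/4 spike split can be checked directly. The sketch below is an illustrative, stdlib-only two-sided Fisher's exact test (the abstract does not specify the authors' implementation; an off-the-shelf routine such as `scipy.stats.fisher_exact` would give the same result):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose probability does not exceed that of the observed table.
    """
    r1, r2 = a + b, c + d      # row totals
    c1 = a + c                 # first column total
    n = r1 + r2                # grand total

    def prob(x):
        # Hypergeometric probability that cell (1,1) equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

# 8/8 medical models spiked at P30 vs 0/4 philosophy models
p = fisher_exact_two_sided(8, 0, 0, 4)
print(round(p, 4))  # → 0.002
```

With these counts only the observed table itself is at least as extreme, so the two-sided p-value equals the single hypergeometric term 1/495 ≈ 0.002, matching the abstract.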
