March 2026

Certainty robustness: Evaluating LLM stability under self-challenging prompts

digitado ⋅ March 5, 2026

arXiv:2603.03330v1 Announce Type: new Abstract: Large language models (LLMs) often present answers with high apparent confidence despite lacking an explicit mechanism for reasoning about certainty or truth. While existing benchmarks primarily evaluate single-turn accuracy, truthfulness or confidence calibration, they do not capture how models behave when their responses are challenged in interactive settings. We introduce the Certainty Robustness Benchmark, a two-turn evaluation framework that measures how LLMs balance stability and adaptability under self-challenging prompts such as uncertainty (“Are […]


technocracy

AutoHarness: improving LLM agents by automatically synthesizing a code harness

digitado ⋅ March 5, 2026

arXiv:2603.03329v1 Announce Type: new Abstract: Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited by the external environment. For example, in the recent Kaggle GameArena chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. Often people manually write “harnesses” around LLMs to prevent such failures. In this paper, we demonstrate […]


technocracy

StructLens: A Structural Lens for Language Models via Maximum Spanning Trees

digitado ⋅ March 5, 2026

arXiv:2603.03328v1 Announce Type: new Abstract: Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well. While interpretability research has investigated the components of language models, existing approaches focus on local inter-token relationships within layers or modules (e.g., Multi-Head Attention), leaving global inter-layer relationships largely overlooked. To address this gap, we introduce StructLens, an analytical framework designed to reveal how internal structures […]


technocracy

A benchmark for joint dialogue satisfaction, emotion recognition, and emotion state transition prediction

digitado ⋅ March 5, 2026

arXiv:2603.03327v1 Announce Type: new Abstract: User satisfaction matters to enterprises: it not only directly reflects users’ subjective evaluation of service quality or products, but also affects customer loyalty and long-term business revenue. Monitoring and understanding user emotions during interactions helps predict and improve satisfaction. However, relevant Chinese datasets are limited, and user emotions are dynamic; relying on single-turn dialogue cannot fully track emotional changes across multiple turns, which may affect satisfaction prediction. To address this, […]


technocracy

Controllable and explainable personality sliders for LLMs at inference time

digitado ⋅ March 5, 2026

arXiv:2603.03326v1 Announce Type: new Abstract: Aligning Large Language Models (LLMs) with specific personas typically relies on expensive and monolithic Supervised Fine-Tuning (SFT) or RLHF. While effective, these methods require training distinct models for every target personality profile. Inference-time activation steering offers a parameter-efficient alternative, yet naive approaches fail to control multiple traits simultaneously due to destructive vector interference. In this work, we propose a modular framework for continuous, multi-dimensional personality control. Our key innovation is Sequential Adaptive Steering […]


technocracy

IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference

digitado ⋅ March 5, 2026

arXiv:2603.03325v1 Announce Type: new Abstract: Large language models (LLMs) have become integral to modern Human-AI collaboration workflows, where accurately understanding user intent serves as a crucial step for generating satisfactory responses. Context-aware intent understanding, which involves inferring user intentions from situational environments, is inherently challenging because it requires reasoning over both the immediate context and the user’s underlying motivations that drive their behavior. Moreover, existing approaches often treat intent understanding as a static recognition task, overlooking users’ accumulated […]


technocracy

Controlling Chat Style in Language Models via Single-Direction Editing

digitado ⋅ March 5, 2026

arXiv:2603.03324v1 Announce Type: new Abstract: Controlling stylistic attributes in large language models (LLMs) remains challenging, with existing approaches relying on either prompt engineering or post-training alignment. This paper investigates this challenge through the lens of representation engineering, testing the hypothesis that distinct stylistic attributes – from emotional tone to linguistic structure – are encoded as linear directions in the model’s activation space. We provide strong empirical evidence for this hypothesis across a wide range of styles and, based […]


technocracy

Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement

digitado ⋅ March 5, 2026

arXiv:2603.03323v1 Announce Type: new Abstract: Large language models (LLMs) aligned for safety often suffer from over-refusal, the tendency to reject seemingly toxic but benign prompts by misclassifying them as toxic. This behavior undermines models’ helpfulness and restricts usability in sensitive or nuanced contexts. While prior work has proposed mitigation strategies such as data augmentation and activation steering, these approaches often face a trade-off: reducing over-refusal typically degrades the model’s ability to reject genuinely harmful content. We argue that […]


technocracy

Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery

digitado ⋅ March 5, 2026

arXiv:2603.03322v1 Announce Type: new Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated remarkable potential in automatic knowledge discovery. However, rigorously evaluating an AI’s capacity for knowledge discovery remains a critical challenge. Existing benchmarks predominantly rely on static datasets, leading to inevitable data contamination where models have likely seen the evaluation knowledge during training. Furthermore, the rapid release cycles of modern LLMs render static benchmarks quickly outdated, failing to assess the ability to discover truly new […]


technocracy

DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

digitado ⋅ March 5, 2026

arXiv:2603.03321v1 Announce Type: new Abstract: Evaluating instruction following in Large Language Models requires decomposing instructions into verifiable requirements and assessing satisfaction, tasks currently dependent on manual annotation and uniform criteria that do not align with human judgment patterns. We present DIALEVAL, a type-theoretic framework using dual LLM agents to automate instruction decomposition into typed predicates and implement type-specific satisfaction semantics. The framework enforces formal atomicity and independence constraints during automated extraction, then applies differentiated evaluation criteria: semantic equivalence for content […]
