February 2026

A Geometric Taxonomy of Hallucinations in LLMs

digitado ⋅ February 17, 2026

arXiv:2602.13224v1. Abstract: The term “hallucination” in large language models conflates distinct phenomena with different geometric signatures in embedding space. We propose a taxonomy identifying three types: unfaithfulness (failure to engage with provided context), confabulation (invention of semantically foreign content), and factual error (incorrect claims within correct conceptual frames). We observe a striking asymmetry. On standard benchmarks where hallucinations are LLM-generated, detection is domain-local: AUROC 0.76-0.99 within domains, but 0.50 (chance level) across domains. Discriminative directions […]

Computability of Agentic Systems

digitado ⋅ February 17, 2026

arXiv:2602.13222v1. Abstract: This paper introduces the Quest Graph, a formal framework for analyzing the capabilities of agentic systems with finite context. We define abstractions that model common reasoning techniques and establish their computational power: the base Quest Graph is equivalent to an unrestricted Turing machine; the forward-only Finite Quest Decision Process (FQDP), despite its wide use, is only equivalent to a pushdown automaton (context-free); and the Reference-Augmented QDP (RQDP) regains Turing completeness only when stateful […]

Scaling the Scaling Logic: Agentic Meta-Synthesis of Logic Reasoning

digitado ⋅ February 17, 2026

arXiv:2602.13218v1. Abstract: Scaling verifiable training signals remains a key bottleneck for Reinforcement Learning from Verifiable Rewards (RLVR). Logical reasoning is a natural substrate: constraints are formal and answers are programmatically checkable. However, prior synthesis pipelines either depend on expert-written code or operate within fixed templates/skeletons, which limits growth largely to instance-level perturbations. We propose SSLogic, an agentic meta-synthesis framework that scales at the task-family level by iteratively synthesizing and repairing executable Generator–Validator program pairs in […]

VeRA: Verified Reasoning Data Augmentation at Scale

digitado ⋅ February 17, 2026

arXiv:2602.13217v1. Abstract: The main issue with most evaluation schemes today is their “static” nature: the same problems are reused repeatedly, allowing for memorization, format exploitation, and eventual saturation. To measure genuine AI progress, we need evaluation that is robust by construction, not by post-hoc detection. In response, we propose VeRA (Verified Reasoning Data Augmentation), a framework that converts benchmark problems into executable specifications, comprising (i) a natural language template with placeholder slots, (ii) a coherent […]

Network-Adaptive Cloud Preprocessing for Visual Neuroprostheses

digitado ⋅ February 17, 2026

arXiv:2602.13216v1. Abstract: Cloud-based machine learning is increasingly explored as a preprocessing strategy for next-generation visual neuroprostheses, where advanced scene understanding may exceed the computational and energy constraints of battery-powered visual processing units (VPUs). Offloading computation to remote servers enables the use of state-of-the-art vision models, but also introduces sensitivity to network latency, jitter, and packet loss, which can disrupt the temporal consistency of the delivered neural stimulus. In this work, we examine the feasibility of […]

When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching

digitado ⋅ February 17, 2026

arXiv:2602.13215v1. Abstract: Transformers allocate uniform computation to every position, regardless of difficulty. State Space Models (SSMs) offer efficient alternatives but struggle with precise information retrieval over a long horizon. Inspired by dual-process theories of cognition (Kahneman, 2011), we propose AMOR (Adaptive Metacognitive Output Router), a hybrid architecture that dynamically engages sparse attention only when an SSM backbone is “uncertain”, as measured by prediction entropy. Compared to standard transformers, AMOR gains efficiency by projecting keys and values […]

BotzoneBench: Scalable LLM Evaluation via Graded AI Anchors

digitado ⋅ February 17, 2026

arXiv:2602.13214v1. Abstract: Large Language Models (LLMs) are increasingly deployed in interactive environments requiring strategic decision-making, yet systematic evaluation of these capabilities remains challenging. Existing benchmarks for LLMs primarily assess static reasoning through isolated tasks and fail to capture dynamic strategic abilities. Recent game-based evaluations employ LLM-vs-LLM tournaments that produce relative rankings dependent on transient model pools, incurring quadratic computational costs and lacking stable performance anchors for longitudinal tracking. The central challenge is establishing a scalable […]

Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique

digitado ⋅ February 17, 2026

arXiv:2602.13213v1. Abstract: Commercial insurance underwriting is a labor-intensive process that requires manual review of extensive documentation to assess risk and determine policy pricing. While AI offers substantial efficiency improvements, existing solutions lack comprehensive reasoning capabilities and internal mechanisms to ensure reliability within regulated, high-stakes environments. Full automation remains impractical and inadvisable in scenarios where human judgment and accountability are critical. This study presents a decision-negative, human-in-the-loop agentic system that incorporates an adversarial self-critique mechanism as […]

UAVGENT: A Language-Guided Distributed Control Framework

digitado ⋅ February 17, 2026

arXiv:2602.13212v1. Abstract: We study language-in-the-loop control for multi-drone systems that execute evolving, high-level missions while retaining formal robustness guarantees at the physical layer. We propose a three-layer architecture in which (i) a human operator issues natural-language instructions, (ii) an LLM-based supervisor periodically interprets, verifies, and corrects the commanded task in the context of the latest state and target estimates, and (iii) a distributed inner-loop controller tracks the resulting reference using only local relative information. We […]

An Overlay Multicast Routing Method Based on Network Situational Awareness and Hierarchical Multi-Agent Reinforcement Learning

digitado ⋅ February 17, 2026

arXiv:2602.13211v1. Abstract: Compared with IP multicast, Overlay Multicast (OM) offers better compatibility and flexible deployment in heterogeneous, cross-domain networks. However, traditional OM struggles to adapt to dynamic traffic due to unawareness of physical resource states, and existing reinforcement learning methods fail to decouple OM’s tightly coupled multi-objective nature, leading to high complexity, slow convergence, and instability. To address this, we propose MA-DHRL-OM, a multi-agent deep hierarchical reinforcement learning approach. Using SDN’s global view, it builds […]
