February 2026

The Vulnerability of LLM Rankers to Prompt Injection Attacks

digitado ⋅ 20 de February de 2026

arXiv:2602.16752v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as powerful re-rankers. Recent research has however showed that simple prompt injections embedded within a candidate document (i.e., jailbreak prompt attacks) can significantly alter an LLM’s ranking decisions. While this poses serious security risks to LLM-based ranking pipelines, the extent to which this vulnerability persists across diverse LLM families, architectures, and settings remains largely under-explored. In this paper, we present a comprehensive empirical study of jailbreak prompt […]

Ver mais

Like 0

Liked Liked

technocracy

A Construction-Phase Digital Twin Framework for Quality Assurance and Decision Support in Civil Infrastructure Projects

digitado ⋅ 20 de February de 2026

arXiv:2602.16748v1 Announce Type: new Abstract: Quality assurance (QA) during construction often relies on inspection records and laboratory test results that become available days or weeks after work is completed. On large highway and bridge projects, this delay limits early intervention and increases the risk of rework, schedule impacts, and fragmented documentation. This study presents a construction-phase digital twin framework designed to support element-level QA and readiness-based decision making during active construction. The framework links inspection records, material production […]

Ver mais

Like 0

Liked Liked

technocracy

LiveClin: A Live Clinical Benchmark without Leakage

digitado ⋅ 20 de February de 2026

arXiv:2602.16747v1 Announce Type: new Abstract: The reliability of medical LLM evaluation is critically undermined by data contamination and knowledge obsolescence, leading to inflated scores on static benchmarks. To address these challenges, we introduce LiveClin, a live benchmark designed for approximating real-world clinical practice. Built from contemporary, peer-reviewed case reports and updated biannually, LiveClin ensures clinical currency and resists data contamination. Using a verified AI-human workflow involving 239 physicians, we transform authentic patient cases into complex, multimodal evaluation scenarios […]

Ver mais

Like 0

Liked Liked

technocracy

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

digitado ⋅ 20 de February de 2026

arXiv:2602.16746v1 Announce Type: new Abstract: Grokking — the delayed transition from memorization to generalization in small algorithmic tasks — remains poorly understood. We present a geometric analysis of optimization dynamics in transformers trained on modular arithmetic. PCA of attention weight trajectories reveals that training evolves predominantly within a low-dimensional execution subspace, with a single principal component capturing 68-83% of trajectory variance. To probe loss-landscape geometry, we measure commutator defects — the non-commutativity of successive gradient steps — and […]

Ver mais

Like 0

Liked Liked

technocracy

PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

digitado ⋅ 20 de February de 2026

arXiv:2602.16745v1 Announce Type: new Abstract: Test-time scaling can improve model performance by aggregating stochastic reasoning trajectories. However, achieving sample-efficient test-time self-consistency under a limited budget remains an open challenge. We introduce PETS (Principled and Efficient Test-TimeSelf-Consistency), which initiates a principled study of trajectory allocation through an optimization framework. Central to our approach is the self-consistency rate, a new measure defined as agreement with the infinite-budget majority vote. This formulation makes sample-efficient test-time allocation theoretically grounded and amenable to […]

Ver mais

Like 0

Liked Liked

technocracy

ICP-Based Pallet Tracking for Unloading on Inclined Surfaces by Autonomous Forklifts

digitado ⋅ 20 de February de 2026

arXiv:2602.16744v1 Announce Type: new Abstract: This paper proposes a control method for autonomous forklifts to unload pallets on inclined surfaces, enabling the fork to be withdrawn without dragging the pallets. The proposed method applies the Iterative Closest Point (ICP) algorithm to point clouds measured from the upper region of the pallet and thereby tracks the relative position and attitude angle difference between the pallet and the fork during the unloading operation in real-time. According to the tracking result, […]

Ver mais

Like 0

Liked Liked

technocracy

DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning

digitado ⋅ 20 de February de 2026

arXiv:2602.16742v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has been shown effective in enhancing the visual reflection and reasoning capabilities of Large Multimodal Models (LMMs). However, existing datasets are predominantly derived from either small-scale manual construction or recombination of prior resources, which limits data diversity and coverage, thereby constraining further gains in model performance. To this end, we introduce textbf{DeepVision-103K}, a comprehensive dataset for RLVR training that covers diverse K12 mathematical topics, extensive knowledge points, […]

Ver mais

Like 0

Liked Liked

technocracy

Can Adversarial Code Comments Fool AI Security Reviewers — Large-Scale Empirical Study of Comment-Based Attacks and Defenses Against LLM Code Analysis

digitado ⋅ 20 de February de 2026

arXiv:2602.16741v1 Announce Type: new Abstract: AI-assisted code review is widely used to detect vulnerabilities before production release. Prior work shows that adversarial prompt manipulation can degrade large language model (LLM) performance in code generation. We test whether similar comment-based manipulation misleads LLMs during vulnerability detection. We build a 100-sample benchmark across Python, JavaScript, and Java, each paired with eight comment variants ranging from no comments to adversarial strategies such as authority spoofing and technical deception. Eight frontier models, […]

Ver mais

Like 0

Liked Liked

technocracy

Quantifying LLM Attention-Head Stability: Implications for Circuit Universality

digitado ⋅ 20 de February de 2026

arXiv:2602.16740v1 Announce Type: new Abstract: In mechanistic interpretability, recent work scrutinizes transformer “circuits” – sparse, mono or multi layer sub computations, that may reflect human understandable functions. Yet, these network circuits are rarely acid-tested for their stability across different instances of the same deep learning architecture. Without this, it remains unclear whether reported circuits emerge universally across labs or turn out to be idiosyncratic to a particular estimation instance, potentially limiting confidence in safety-critical settings. Here, we systematically […]

Ver mais

Like 0

Liked Liked

technocracy

Real-time Secondary Crash Likelihood Prediction Excluding Post Primary Crash Features

digitado ⋅ 20 de February de 2026

arXiv:2602.16739v1 Announce Type: new Abstract: Secondary crash likelihood prediction is a critical component of an active traffic management system to mitigate congestion and adverse impacts caused by secondary crashes. However, existing approaches mainly rely on post-crash features (e.g., crash type and severity) that are rarely available in real time, limiting their practical applicability. To address this limitation, we propose a hybrid secondary crash likelihood prediction framework that does not depend on post-crash features. A dynamic spatiotemporal window is […]

Ver mais

Like 0

Liked Liked