February 2026

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

digitado ⋅ February 13, 2026

arXiv:2602.11210v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a key paradigm for training software engineering (SWE) agents, but existing pipelines typically rely on per-task containers for isolation. At scale, pre-built container images incur substantial storage overhead, slow environment setup, and require container-management privileges. We propose SWE-MiniSandbox, a lightweight, container-free method that enables scalable RL training of SWE agents without sacrificing isolation. Instead of relying on per-instance containers, SWE-MiniSandbox executes each task in an isolated workspace backed […]


technocracy

SAFuzz: Semantic-Guided Adaptive Fuzzing for LLM-Generated Code

digitado ⋅ February 13, 2026

arXiv:2602.11209v1 Announce Type: new Abstract: While AI-coding assistants accelerate software development, current testing frameworks struggle to keep pace with the resulting volume of AI-generated code. Traditional fuzzing techniques often allocate resources uniformly and lack semantic awareness of algorithmic vulnerability patterns, leading to inefficient resource usage and missed vulnerabilities. To address these limitations, we present a hybrid testing framework that leverages LLM-guided adaptive fuzzing to detect algorithmic vulnerabilities efficiently. Our system SAFuzz integrates prompt-based behavioral diversification, harness generation with […]


technocracy

Adaptive Physics Transformer with Fused Global-Local Attention for Subsurface Energy Systems

digitado ⋅ February 13, 2026

arXiv:2602.11208v1 Announce Type: new Abstract: The Earth’s subsurface is a cornerstone of modern society, providing essential energy resources like hydrocarbons, geothermal, and minerals while serving as the primary reservoir for $CO_2$ sequestration. However, full physics numerical simulations of these systems are notoriously computationally expensive due to geological heterogeneity, high resolution requirements, and the tight coupling of physical processes with distinct propagation time scales. Here we propose the \textbf{Adaptive Physics Transformer} (APT), a geometry-, mesh-, and physics-agnostic neural operator […]


technocracy

UltraLIF: Fully Differentiable Spiking Neural Networks via Ultradiscretization and Max-Plus Algebra

digitado ⋅ February 13, 2026

arXiv:2602.11206v1 Announce Type: new Abstract: Spiking Neural Networks (SNNs) offer energy-efficient, biologically plausible computation but suffer from non-differentiable spike generation, necessitating reliance on heuristic surrogate gradients. This paper introduces UltraLIF, a principled framework that replaces surrogate gradients with ultradiscretization, a mathematical formalism from tropical geometry providing continuous relaxations of discrete dynamics. The central insight is that the max-plus semiring underlying ultradiscretization naturally models neural threshold dynamics: the log-sum-exp function serves as a differentiable soft-maximum that converges to hard […]
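
The soft-maximum property mentioned in the abstract can be illustrated with a minimal sketch. It shows only the standard identity that a scaled log-sum-exp approaches the hard maximum as its sharpness grows; it is not the paper's UltraLIF formulation. The function name `soft_max` and the sharpness parameter `beta` are assumptions introduced here for illustration.

```python
import math

def soft_max(values, beta):
    """Scaled log-sum-exp: (1 / beta) * log(sum(exp(beta * v))).

    `beta` is a hypothetical sharpness parameter (not taken from the paper).
    Larger beta brings the result closer to max(values).
    """
    m = max(values)
    # Shift by the maximum before exponentiating to avoid overflow.
    shifted = sum(math.exp(beta * (v - m)) for v in values)
    return m + math.log(shifted) / beta

values = [0.2, 1.0, 0.7]
for beta in (1.0, 10.0, 100.0):
    # The gap to the hard maximum is bounded by log(len(values)) / beta.
    print(beta, soft_max(values, beta), max(values))
```

The result always lies between `max(values)` and `max(values) + log(n) / beta`, so the approximation error shrinks in proportion to `1 / beta` while the function stays differentiable for every finite `beta`.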


technocracy

Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders

digitado ⋅ February 13, 2026

arXiv:2602.11204v1 Announce Type: new Abstract: The widespread use of publicly available pre-trained encoders from self-supervised learning (SSL) has exposed a critical vulnerability: their susceptibility to downstream-agnostic adversarial examples (DAEs), which are crafted without knowledge of the downstream tasks but are capable of misleading downstream models. While several defense methods have been explored recently, they rely primarily on task-specific adversarial fine-tuning, which inevitably limits generalizability, causes catastrophic forgetting, and deteriorates benign performance. Unlike previous works, we propose a […]


technocracy

Compositionality of Systems and Partially Ordered Runs

digitado ⋅ February 13, 2026

arXiv:2602.11203v1 Announce Type: new Abstract: In the late 1970s, C.A. Petri introduced partially ordered event occurrences (runs), then called \emph{processes}, as the appropriate model to describe the individual evolutions of distributed systems. Here, we present a unified framework for handling Petri nets and their runs, specifically to compose and decompose them. It is shown that, for nets $M$ and $N$, the set of runs of the composed net $M \bullet N$ equals the composition of the runs of […]


technocracy

interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors

digitado ⋅ February 13, 2026

arXiv:2602.11202v1 Announce Type: new Abstract: We present a test-time verification framework, interwhen, that ensures that the output of a reasoning model is valid with respect to a given set of verifiers. Verified reasoning is an important goal in high-stakes scenarios such as deploying agents in the physical world or in domains such as law and finance. However, current techniques either rely on the generate-test paradigm that verifies only after the final answer is produced, or verify partial output through a […]


technocracy

Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning

digitado ⋅ February 13, 2026

arXiv:2602.11201v1 Announce Type: new Abstract: Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the model actually reaches its answer or are merely post-hoc justifications. We propose Normalized Logit Difference Decay (NLDD), a metric that measures whether individual reasoning steps are faithful to the model’s decision-making process. Our approach corrupts individual reasoning steps from the explanation and measures how much the model’s confidence in […]


technocracy

AM-FM: A Foundation Model for Ambient Intelligence Through WiFi

digitado ⋅ February 13, 2026

arXiv:2602.11200v1 Announce Type: new Abstract: Ambient intelligence, continuously understanding human presence, activity, and physiology in physical spaces, is fundamental to smart environments, health monitoring, and human-computer interaction. WiFi infrastructure provides a ubiquitous, always-on, privacy-preserving substrate for this capability across billions of IoT devices. Yet this potential remains largely untapped, as wireless sensing has typically relied on task-specific models that require substantial labeled data and limit practical deployment. We present AM-FM, the first foundation model for ambient intelligence and […]


technocracy

When and What to Ask: AskBench and Rubric-Guided RLVR for LLM Clarification

digitado ⋅ February 13, 2026

arXiv:2602.11199v1 Announce Type: new Abstract: Large language models (LLMs) often respond even when prompts omit critical details or include misleading information, leading to hallucinations or reinforced misconceptions. We study how to evaluate and improve LLMs’ ability to decide when and what to ask for clarification without sacrificing task performance. We introduce AskBench, an interactive benchmark that converts standard QA pairs into multi-turn interactions with explicit checkpoints. A unified judge loop evaluates final answers and simulates user responses as […]
