January 2026

Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed

digitado ⋅ January 30, 2026

arXiv:2601.21094v1 ⋅ Announce Type: new

Abstract: Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively […]

technocracy

MapPFN: Learning Causal Perturbation Maps in Context

digitado ⋅ January 30, 2026

arXiv:2601.21092v1 ⋅ Announce Type: new

Abstract: Planning effective interventions in biological systems requires treatment-effect models that adapt to unseen biological contexts by identifying their specific underlying mechanisms. Yet single-cell perturbation datasets span only a handful of biological contexts, and existing methods cannot leverage new interventional evidence at inference time to adapt beyond their training data. To meta-learn a perturbation effect estimator, we present MapPFN, a prior-data fitted network (PFN) pretrained on synthetic data generated from a prior over causal […]

technocracy

Deep Reinforcement Learning for Fault-Adaptive Routing in Eisenstein-Jacobi Interconnection Topologies

digitado ⋅ January 30, 2026

arXiv:2601.21090v1 ⋅ Announce Type: new

Abstract: The increasing density of many-core architectures necessitates interconnection networks that are both high-performance and fault-resilient. Eisenstein-Jacobi (EJ) networks, with their symmetric 6-regular topology, offer superior topological properties but challenge traditional routing heuristics under fault conditions. This paper evaluates three routing paradigms in faulty EJ environments: deterministic Greedy Adaptive Routing, theoretically optimal Dijkstra’s algorithm, and a reinforcement learning (RL)-based approach. Using a multi-objective reward function to penalize fault proximity and reward path efficiency, the […]

technocracy

Position-invariant Fine-tuning of Speech Enhancement Models with Self-supervised Speech Representations

digitado ⋅ January 30, 2026

arXiv:2601.21084v1 ⋅ Announce Type: new

Abstract: Integrating front-end speech enhancement (SE) models with self-supervised learning (SSL)-based speech models is effective for downstream tasks in noisy conditions. SE models are commonly fine-tuned using SSL representations with mean squared error (MSE) loss between enhanced and clean speech. However, MSE is prone to exploiting positional embeddings in SSL models, allowing the objective to be minimised through positional correlations instead of content-related information. This work frames the problem as a general limitation of […]

technocracy

OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence

digitado ⋅ January 30, 2026

arXiv:2601.21083v1 ⋅ Announce Type: new

Abstract: As large language models improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute (Heelan, 2026). Defensive incident response (IR) agents must keep pace, but existing benchmarks conflate action execution with correct execution, hiding calibration failures when agents process adversarial evidence. We introduce OpenSec, a dual-control reinforcement learning environment that evaluates IR agents under realistic prompt injection scenarios. Unlike static capability benchmarks, OpenSec scores world-state-changing containment […]

technocracy

LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection

digitado ⋅ January 30, 2026

arXiv:2601.21082v1 ⋅ Announce Type: new

Abstract: The rapidly growing ecosystem of Large Language Models (LLMs) makes it increasingly challenging to manage and utilize the vast and dynamic pool of models effectively. We propose LOCUS, a method that produces low-dimensional vector embeddings that compactly represent a language model’s capabilities across queries. LOCUS is an attention-based approach that generates embeddings by a deterministic forward pass over query encodings and evaluation scores via an encoder model, enabling seamless incorporation of new models […]

technocracy

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

digitado ⋅ January 30, 2026

arXiv:2601.21081v1 ⋅ Announce Type: new

Abstract: Multimodal models for text-to-image generation have achieved strong visual fidelity, yet they remain brittle under compositional structural constraints, notably generative numeracy, attribute binding, and part-level relations. To address these challenges, we propose Shape-of-Thought (SoT), a visual CoT framework that enables progressive shape assembly via coherent 2D projections without external engines at inference time. SoT trains a unified multimodal autoregressive model to generate interleaved textual plans and rendered intermediate states, helping the model capture shape-assembly […]

technocracy

Parametric Hyperbolic Conservation Laws: A Unified Framework for Conservation, Entropy Stability, and Hyperbolicity

digitado ⋅ January 30, 2026

arXiv:2601.21080v1 ⋅ Announce Type: new

Abstract: We propose a parametric hyperbolic conservation law (SymCLaw) for learning hyperbolic systems directly from data while ensuring conservation, entropy stability, and hyperbolicity by design. Unlike existing approaches that typically enforce only conservation or rely on prior knowledge of the governing equations, our method parameterizes the flux functions in a form that guarantees real eigenvalues and complete eigenvectors of the flux Jacobian, thereby preserving hyperbolicity. At the same time, we embed entropy-stable design principles […]

technocracy

Towards Mitigating Modality Bias in Vision-Language Models for Temporal Action Localization

digitado ⋅ January 30, 2026

arXiv:2601.21078v1 ⋅ Announce Type: new

Abstract: Temporal Action Localization (TAL) requires identifying both the boundaries and categories of actions in untrimmed videos. While vision-language models (VLMs) offer rich semantics to complement visual evidence, existing approaches tend to overemphasize linguistic priors at the expense of visual performance, leading to a pronounced modality bias. We propose ActionVLM, a vision-language aggregation framework that systematically mitigates modality bias in TAL. Our key insight is to preserve vision as the dominant signal while adaptively […]

technocracy

Multi-modal Imputation for Alzheimer’s Disease Classification

digitado ⋅ January 30, 2026

arXiv:2601.21076v1 ⋅ Announce Type: new

Abstract: Deep learning has been successful in predicting neurodegenerative disorders, such as Alzheimer’s disease, from magnetic resonance imaging (MRI). Combining multiple imaging modalities, such as T1-weighted (T1) and diffusion-weighted imaging (DWI) scans, can increase diagnostic performance. However, complete multimodal datasets are not always available. We use a conditional denoising diffusion probabilistic model to impute missing DWI scans from T1 scans. We perform extensive experiments to evaluate whether such imputation improves the accuracy of uni-modal […]
