LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection
arXiv:2601.21082v1 Announce Type: new
Abstract: The rapidly growing ecosystem of Large Language Models (LLMs) makes it increasingly challenging to manage and utilize the vast and dynamic pool of models effectively. We propose LOCUS, a method that produces low-dimensional vector embeddings that compactly represent a language model's capabilities across queries. LOCUS is an attention-based approach that generates embeddings through a single deterministic forward pass of an encoder model over query encodings and evaluation scores, enabling seamless incorporation of new models […]
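To make the idea concrete, here is a minimal sketch of how an attention-based encoder could pool (query encoding, evaluation score) pairs into a fixed-dimensional model embedding in one deterministic forward pass. All function names, shapes, and the single-head pooling design below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def model_embedding(query_enc, scores, W_k, W_v, pool_query):
    """Attention-pool a model's per-query results into one embedding.

    query_enc:  (n, d) encodings of n evaluation queries (hypothetical)
    scores:     (n,)   the model's evaluation scores on those queries
    W_k, W_v:   (d+1, h) learned key/value projections (hypothetical)
    pool_query: (h,)   learned pooling query vector (hypothetical)
    Returns an (h,) embedding; the pass is deterministic, so a new
    model can be embedded from its scores without any retraining.
    """
    # Pair each query encoding with the model's score on that query.
    feats = np.concatenate([query_enc, scores[:, None]], axis=1)  # (n, d+1)
    keys = feats @ W_k                      # (n, h)
    values = feats @ W_v                    # (n, h)
    attn = softmax(pool_query @ keys.T)     # (n,) attention weights
    return attn @ values                    # (h,) model embedding
```

Because the embedding is a pure function of the query encodings and scores, two calls with identical inputs return identical vectors, which is what allows new models to be added to the pool on the fly.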