January 2026

Flatter Tokens are More Valuable for Speculative Draft Model Training

digitado ⋅ January 28, 2026

arXiv:2601.18902v1 Announce Type: new Abstract: Speculative Decoding (SD) is a key technique for accelerating Large Language Model (LLM) inference, but it typically requires training a draft model on a large dataset. We approach this problem from a data-centric perspective, finding that not all training samples contribute equally to the SD acceptance rate. Specifically, our theoretical analysis and empirical validation reveal that tokens inducing flatter predictive distributions from the target model are more valuable than those yielding sharply peaked […]
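
The abstract links a token's training value to how flat the target model's predictive distribution is at that position. The sketch below is a minimal illustration under stated assumptions: it uses Shannon entropy as the flatness measure (the excerpt does not say which measure the paper uses), and `select_flat_tokens` is a hypothetical filter, not the authors' selection procedure. For reference it also computes the standard speculative-sampling acceptance probability of one drafted token, sum over x of min(p(x), q(x)).

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution; higher means flatter.
    Entropy as the flatness proxy is an assumption of this sketch."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def acceptance_prob(target: Sequence[float], draft: Sequence[float]) -> float:
    """Probability that one drafted token is accepted under standard speculative
    sampling, with target distribution p and draft distribution q: sum_x min(p(x), q(x))."""
    return sum(min(p, q) for p, q in zip(target, draft))


def select_flat_tokens(target_dists: List[Sequence[float]], keep_frac: float) -> List[int]:
    """Hypothetical data filter: indices of the flattest (highest-entropy)
    target distributions, keeping roughly `keep_frac` of the positions."""
    k = max(1, int(len(target_dists) * keep_frac))
    ranked = sorted(range(len(target_dists)),
                    key=lambda i: entropy(target_dists[i]), reverse=True)
    return sorted(ranked[:k])
```

For example, given one sharply peaked distribution `[0.97, 0.01, 0.01, 0.01]` and one uniform distribution over four tokens, `select_flat_tokens(..., keep_frac=0.5)` keeps only the uniform position.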

technocracy

Self-Aware Knowledge Probing: Evaluating Language Models’ Relational Knowledge through Confidence Calibration

digitado ⋅ January 28, 2026

arXiv:2601.18901v1 Announce Type: new Abstract: Knowledge probing quantifies how much relational knowledge a language model (LM) has acquired during pre-training. Existing knowledge probes evaluate model capabilities through metrics like prediction accuracy and precision. Such evaluations fail to account for the model’s reliability, reflected in the calibration of its confidence scores. In this paper, we propose a novel calibration probing framework for relational knowledge, covering three modalities of model confidence: (1) intrinsic confidence, (2) structural consistency and (3) semantic […]
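
The abstract contrasts accuracy-style probes with calibration, meaning how well a model's confidence tracks its correctness. One widely used calibration measure is expected calibration error (ECE) over equal-width confidence bins. The sketch below implements that generic metric; it is an assumption that something like it applies here, since the excerpt does not say how the paper scores its three confidence modalities, and the function name is ours.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Generic ECE (not necessarily the paper's metric): weighted mean of
    |accuracy - mean confidence| over equal-width confidence bins."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 goes into the last bin instead of falling out of range.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(ok for _, ok in b) / len(b)
        avg_conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / total * abs(acc - avg_conf)
    return ece
```

A probe that reports 0.9 confidence on four facts but gets only two right has an ECE of 0.4. Two probes with identical accuracy can therefore differ sharply in reliability.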

technocracy

Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries

digitado ⋅ January 28, 2026

arXiv:2601.18899v1 Announce Type: new Abstract: Large Language Model (LLM)-powered Automatic Speech Recognition (ASR) systems achieve strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. Prior work trains a separate connector per language, overlooking linguistic relatedness. We propose an efficient and novel connector-sharing strategy based on linguistic family membership, enabling one connector per family, and empirically validate its effectiveness across two multilingual LLMs and two real-world corpora spanning curated […]
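
The core idea is to share one connector per language family instead of training one per language. The routing can be sketched as follows, assuming a hypothetical language-to-family table and a placeholder `Connector` class. The excerpt does not list the paper's languages, families, or connector architecture, so every name below is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical language-to-family table, for illustration only.
LANGUAGE_FAMILY: Dict[str, str] = {
    "es": "Romance", "pt": "Romance", "it": "Romance",
    "de": "Germanic", "nl": "Germanic",
    "ru": "Slavic", "pl": "Slavic",
}


@dataclass
class Connector:
    """Placeholder for the lightweight projector between the frozen speech
    encoder and the LLM; the real architecture is not given in the excerpt."""
    name: str
    trained_on: List[str] = field(default_factory=list)


def build_family_connectors(languages: List[str]) -> Dict[str, Connector]:
    """Create one connector per family and record which languages share it."""
    connectors: Dict[str, Connector] = {}
    for lang in languages:
        family = LANGUAGE_FAMILY[lang]
        connectors.setdefault(family, Connector(name=family)).trained_on.append(lang)
    return connectors


def route(lang: str, connectors: Dict[str, Connector]) -> Connector:
    """At inference time, pick the connector of the utterance's language family."""
    return connectors[LANGUAGE_FAMILY[lang]]
```

With the seven example languages, this yields three connectors instead of seven, and each connector sees pooled training data from related languages.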

technocracy

Explainable Uncertainty Quantification for Wastewater Treatment Energy Prediction via Interval Type-2 Neuro-Fuzzy System

digitado ⋅ January 28, 2026

arXiv:2601.18897v1 Announce Type: new Abstract: Wastewater treatment plants consume 1-3% of global electricity, making accurate energy forecasting critical for operational optimization and sustainability. While machine learning models provide point predictions, they lack explainable uncertainty quantification essential for risk-aware decision-making in safety-critical infrastructure. This study develops an Interval Type-2 Adaptive Neuro-Fuzzy Inference System (IT2-ANFIS) that generates interpretable prediction intervals through fuzzy rule structures. Unlike black-box probabilistic methods, the proposed framework decomposes uncertainty across three levels: feature-level, footprint of uncertainty […]
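
In an interval type-2 fuzzy system, each membership value is an interval rather than a single number, and the band between the upper and lower membership functions is the footprint of uncertainty. The sketch below assumes a common textbook choice: a Gaussian with uncertain width in [sigma_lo, sigma_hi], combined across antecedents with the product t-norm. The excerpt does not say which membership shape, t-norm, or type-reduction method IT2-ANFIS uses, and type reduction (for example Karnik-Mendel) is omitted.

```python
import math
from typing import Sequence, Tuple


def it2_gaussian(x: float, mean: float, sigma_lo: float, sigma_hi: float) -> Tuple[float, float]:
    """Interval type-2 Gaussian membership with uncertain width (an assumed
    membership shape). Returns (lower, upper); their gap is the FOU at x."""
    if not 0 < sigma_lo <= sigma_hi:
        raise ValueError("require 0 < sigma_lo <= sigma_hi")
    d2 = (x - mean) ** 2
    lower = math.exp(-d2 / (2 * sigma_lo ** 2))  # narrower Gaussian lies below
    upper = math.exp(-d2 / (2 * sigma_hi ** 2))  # wider Gaussian lies above
    return lower, upper


def rule_firing_interval(xs: Sequence[float],
                         params: Sequence[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Firing interval of one rule, using the product t-norm over its antecedents.
    params[i] = (mean, sigma_lo, sigma_hi) for input i."""
    lo, hi = 1.0, 1.0
    for x, (m, s_lo, s_hi) in zip(xs, params):
        l, u = it2_gaussian(x, m, s_lo, s_hi)
        lo *= l
        hi *= u
    return lo, hi
```

The width of each rule's firing interval makes that rule's contribution to the prediction interval directly inspectable. This is the kind of rule-level explainability the abstract contrasts with black-box probabilistic methods.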

technocracy

Weakly supervised framework for wildlife detection and counting in challenging Arctic environments: a case study on caribou (Rangifer tarandus)

digitado ⋅ January 28, 2026

arXiv:2601.18891v1 Announce Type: new Abstract: Caribou across the Arctic have declined in recent decades, motivating scalable and accurate monitoring approaches to guide evidence-based conservation actions and policy decisions. Manual interpretation of this imagery is labor-intensive and error-prone, underscoring the need for automatic and reliable detection across varying scenes. Yet, such automatic detection is challenging due to severe background heterogeneity, dominant empty terrain (class imbalance), small or occluded targets, and wide variation in density and scale. To make the […]

technocracy

XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation

digitado ⋅ January 28, 2026

arXiv:2601.18886v1 Announce Type: new Abstract: This paper introduces XProvence, a multilingual zero-cost context pruning model for retrieval-augmented generation (RAG), trained on 16 languages and supporting 100+ languages through effective cross-lingual transfer. Motivated by the growing use of RAG systems across diverse languages, we explore several strategies to generalize the Provence framework (which first integrated efficient zero-cost context pruning directly into the re-ranking model) beyond English. Across four multilingual question answering benchmarks, we show how XProvence can prune RAG contexts with […]
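
"Zero-cost" pruning here means that irrelevant context is dropped inside the reranking pass the RAG pipeline already runs, so no extra model call is needed. The sketch below shows only the final pruning step. It assumes the reranker already emits a relevance probability for each context token, averages those per sentence, and keeps sentences at or above a threshold. The function name, the mean aggregation, and the 0.5 default are our assumptions, not the XProvence or Provence implementation.

```python
from typing import List, Sequence, Tuple


def prune_context(sentences: Sequence[str],
                  token_scores: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> Tuple[str, List[int]]:
    """Keep sentences whose mean token relevance is >= threshold.

    token_scores[i] holds relevance probabilities for the tokens of sentence i,
    assumed (hypothetically) to come from the reranker's pruning head."""
    if len(sentences) != len(token_scores):
        raise ValueError("one score list per sentence is required")
    kept = [
        i for i, scores in enumerate(token_scores)
        if scores and sum(scores) / len(scores) >= threshold
    ]
    return " ".join(sentences[i] for i in kept), kept
```

Because the rule operates on scores rather than on words, it is language-agnostic once sentences are segmented. This is one reason a single model can serve many languages.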

technocracy

Representational Homomorphism Predicts and Improves Compositional Generalization In Transformer Language Model

digitado ⋅ January 28, 2026

arXiv:2601.18858v1 Announce Type: new Abstract: Compositional generalization (the ability to interpret novel combinations of familiar components) remains a persistent challenge for neural networks. Behavioral evaluations reveal when models fail but offer limited insight into why failures arise at the representational level. We introduce Homomorphism Error (HE), a structural metric that quantifies deviations from approximate homomorphisms between the expression algebra and a model's hidden-state space. We instantiate HE for two compositional operators in SCAN-style tasks: modifier HE for unary composition and […]
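
To make "deviation from an approximate homomorphism" concrete, the sketch below tests whether a unary modifier such as "twice" acts on hidden states as one consistent map, so that h(m(x)) is predictable from h(x) for every primitive x. It assumes the simplest possible map, a single shared translation vector fitted as the mean offset, and reports the mean squared residual. The paper's exact definition of modifier HE is truncated in the excerpt and may use a richer fitted map, so this is an illustration, not the authors' metric.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vs: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vs)
    return [sum(col) / n for col in zip(*vs)]


def modifier_homomorphism_error(h_base: Dict[str, Vector], h_mod: Dict[str, Vector]) -> float:
    """Illustrative unary-operator HE, assuming the operator should act as one
    shared translation v in hidden space: h(m(x)) ~ h(x) + v.
    v is fitted as the mean offset; HE is the mean squared residual norm."""
    keys = sorted(h_base.keys() & h_mod.keys())
    if not keys:
        raise ValueError("no expressions present in both dictionaries")
    offsets = [[a - b for a, b in zip(h_mod[k], h_base[k])] for k in keys]
    v = mean_vector(offsets)
    residuals = [sum((o - vi) ** 2 for o, vi in zip(off, v)) for off in offsets]
    return sum(residuals) / len(residuals)
```

An HE of zero means the modifier shifts every primitive's representation identically. Larger values signal that the model encodes "jump twice" and "walk twice" in unrelated ways.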

technocracy

SelfieAvatar: Real-time Head Avatar reenactment from a Selfie Video

digitado ⋅ January 28, 2026

arXiv:2601.18851v1 Announce Type: new Abstract: Head avatar reenactment focuses on creating animatable personal avatars from monocular videos, serving as a foundational element for applications like social signal understanding, gaming, human-machine interaction, and computer vision. Recent advances in 3D Morphable Model (3DMM)-based facial reconstruction methods have achieved remarkably high-fidelity face estimation. However, on the one hand, they struggle to capture the entire head, including non-facial regions and background details in real time, which is an essential aspect for producing […]

technocracy

Towards Safety-Compliant Transformer Architectures for Automotive Systems

digitado ⋅ January 28, 2026

arXiv:2601.18850v1 Announce Type: new Abstract: Transformer-based architectures have shown remarkable performance in vision and language tasks but pose unique challenges for safety-critical applications. This paper presents a conceptual framework for integrating Transformers into automotive systems from a safety perspective. We outline how multimodal Foundation Models can leverage sensor diversity and redundancy to improve fault tolerance and robustness. Our proposed architecture combines multiple independent modality-specific encoders that fuse their representations into a shared latent space, supporting fail-operational behavior if […]
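
The proposed architecture fuses independent, modality-specific encoders into a shared latent space so that the system can keep operating when one modality fails. A conceptual sketch of that fail-operational fusion step follows. It assumes each encoder already outputs a vector of the same size, treats a missing or non-finite output as a failed modality, and uses a hypothetical `min_healthy` policy for when to stop and hand over to a safe fallback. Mean fusion is our simplification, not the paper's fusion mechanism.

```python
import math
from typing import Dict, List, Optional


def fuse_latents(embeddings: Dict[str, Optional[List[float]]],
                 min_healthy: int = 2) -> List[float]:
    """Average the shared-space embeddings of modalities that are still healthy.

    A modality counts as failed if its encoder returned None or any non-finite
    value. With fewer than `min_healthy` (a hypothetical policy) modalities
    left, no latent is produced, so the caller can switch to a safe fallback."""
    healthy = [
        e for e in embeddings.values()
        if e is not None and all(math.isfinite(x) for x in e)
    ]
    if len(healthy) < min_healthy:
        raise RuntimeError(f"only {len(healthy)} healthy modalities; need {min_healthy}")
    dim = len(healthy[0])
    if any(len(e) != dim for e in healthy):
        raise ValueError("all modalities must map into the same latent size")
    return [sum(col) / len(healthy) for col in zip(*healthy)]
```

Raising instead of silently fusing a single remaining sensor keeps the degraded-mode decision explicit. A safety case needs exactly this kind of explicit transition to argue about fault tolerance.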

technocracy

Audio-Driven Talking Face Generation with Blink Embedding and Hash Grid Landmarks Encoding

digitado ⋅ January 28, 2026

arXiv:2601.18849v1 Announce Type: new Abstract: Dynamic Neural Radiance Fields (NeRF) have demonstrated considerable success in generating high-fidelity 3D models of talking portraits. Despite significant advancements in rendering speed and generation quality, challenges persist in accurately and efficiently capturing mouth movements in talking portraits. To tackle this challenge, in this study we propose an automatic method based on blink embedding and hash grid landmarks encoding, which can substantially enhance the fidelity of talking faces. Specifically, we leverage facial […]
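
"Hash grid" encoding usually refers to the multiresolution hash encoding popularized by Instant-NGP. Each level hashes the corners of the grid cell containing a coordinate into a small learnable feature table, interpolates those features, and concatenates the results across levels. The sketch below applies that generic scheme to a single normalized 2D landmark coordinate. How the paper actually feeds landmarks into the encoding is not stated in the excerpt, and the level count, table size, and growth factor are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

# Per-dimension primes of the Instant-NGP spatial hash (2D case).
PRIMES = (1, 2654435761)


def spatial_hash(cell: Sequence[int], table_size: int) -> int:
    """XOR of coordinate * prime, modulo the table size. With a power-of-two
    table size this matches the low bits of the usual uint32 arithmetic."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size


class HashGridEncoder:
    """Illustrative 2D multiresolution hash encoding (hyperparameters assumed)."""

    def __init__(self, levels: int = 4, table_size: int = 2 ** 10, features: int = 2,
                 base_res: int = 4, growth: float = 1.5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table_size = table_size
        self.resolutions = [int(math.floor(base_res * growth ** l)) for l in range(levels)]
        # One learnable feature table per level, with a small uniform init.
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(features)] for _ in range(table_size)]
            for _ in range(levels)
        ]

    def encode(self, xy: Sequence[float]) -> List[float]:
        """Bilinearly interpolate hashed corner features per level, then concatenate."""
        out: List[float] = []
        for res, table in zip(self.resolutions, self.tables):
            gx, gy = xy[0] * res, xy[1] * res
            x0, y0 = math.floor(gx), math.floor(gy)
            fx, fy = gx - x0, gy - y0
            feat = [0.0] * len(table[0])
            for dx, dy, w in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                              (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
                corner = table[spatial_hash((x0 + dx, y0 + dy), self.table_size)]
                for i, v in enumerate(corner):
                    feat[i] += w * v
            out.extend(feat)
        return out
```

In training, the table entries would be optimized together with the NeRF MLP. Coarse levels capture overall face layout, while fine levels can resolve small landmark displacements such as those around the mouth.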
