January 2026

Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

digitado ⋅ January 28, 2026

arXiv:2601.18908v1 Announce Type: new Abstract: Speech Emotion Recognition systems often use static features like Mel-Frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), and Root Mean Square Energy (RMSE). Because of this, they can misclassify emotions when there is acoustic noise in vocal signals. To address this, we add dynamic spectral features (deltas and delta-deltas) together with the Kalman smoothing algorithm. This approach reduces noise and improves emotion classification. Since emotion changes over time, the Kalman […]
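The abstract is cut off before it describes the method, so the following is only an illustrative sketch of the two ingredients it names, not the authors' pipeline. It assumes a single per-frame feature track (for example one MFCC coefficient), uses the common two-frame regression delta, and swaps in a simple forward scalar Kalman filter with a random-walk state model in place of whatever smoother the paper actually uses. The function names and the variance defaults are hypothetical.

```python
from typing import List, Tuple


def delta(track: List[float]) -> List[float]:
    """First-order delta over a +/-1 frame window, with edge frames repeated."""
    n = len(track)
    out = []
    for t in range(n):
        prev = track[max(t - 1, 0)]
        nxt = track[min(t + 1, n - 1)]
        out.append((nxt - prev) / 2.0)
    return out


def kalman_filter_1d(track: List[float],
                     process_var: float = 1e-3,
                     meas_var: float = 1e-1) -> List[float]:
    """Forward Kalman filter for a random-walk state observed in noise.

    process_var and meas_var are assumed values, not taken from the paper.
    """
    if not track:
        return []
    est = track[0]   # initial state estimate
    err = 1.0        # initial estimate variance
    out = [est]
    for z in track[1:]:
        err += process_var               # predict: uncertainty grows
        gain = err / (err + meas_var)    # Kalman gain
        est += gain * (z - est)          # update with the new measurement
        err *= (1.0 - gain)              # posterior variance
        out.append(est)
    return out


def dynamic_features(track: List[float]) -> Tuple[List[float], List[float], List[float]]:
    """Smooth the track, then compute deltas and delta-deltas on the result."""
    smoothed = kalman_filter_1d(track)
    d1 = delta(smoothed)
    d2 = delta(d1)
    return smoothed, d1, d2
```

In this sketch the deltas are computed after filtering, so frame-to-frame noise is damped before it is differentiated; whether the paper orders the steps this way is not stated in the visible text.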

technocracy

SICL-AT: Another way to adapt Auditory LLM to low-resource task

digitado ⋅ January 28, 2026

arXiv:2601.18904v1 Announce Type: new Abstract: Auditory Large Language Models (LLMs) have demonstrated strong performance across a wide range of speech and audio understanding tasks. Nevertheless, they often struggle when applied to low-resource or unfamiliar tasks. When labeled in-domain data is scarce or mismatched to the true test distribution, direct fine-tuning can be brittle. In-Context Learning (ICL) provides a training-free, inference-time solution by adapting auditory LLMs through conditioning on a few in-domain demonstrations. In this work, we […]

technocracy

Flatter Tokens are More Valuable for Speculative Draft Model Training

digitado ⋅ January 28, 2026

arXiv:2601.18902v1 Announce Type: new Abstract: Speculative Decoding (SD) is a key technique for accelerating Large Language Model (LLM) inference, but it typically requires training a draft model on a large dataset. We approach this problem from a data-centric perspective, finding that not all training samples contribute equally to the SD acceptance rate. Specifically, our theoretical analysis and empirical validation reveal that tokens inducing flatter predictive distributions from the target model are more valuable than those yielding sharply peaked […]
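To make "flatter predictive distribution" concrete, here is a rough sketch that scores each token position by the Shannon entropy of the target model's softmax output and ranks positions from flattest to most peaked. Entropy is an assumed proxy here; the truncated abstract does not say which flatness measure or selection rule the paper uses, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rank_by_flatness(per_token_logits: List[List[float]]) -> List[int]:
    """Return token positions ordered from flattest to most peaked."""
    scores = [entropy(softmax(logits)) for logits in per_token_logits]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A data-selection step could then keep only the top-ranked positions when building a draft-model training set, though the abstract does not confirm that this is how the authors filter data.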

technocracy

Self-Aware Knowledge Probing: Evaluating Language Models’ Relational Knowledge through Confidence Calibration

digitado ⋅ January 28, 2026

arXiv:2601.18901v1 Announce Type: new Abstract: Knowledge probing quantifies how much relational knowledge a language model (LM) has acquired during pre-training. Existing knowledge probes evaluate model capabilities through metrics like prediction accuracy and precision. Such evaluations fail to account for the model’s reliability, reflected in the calibration of its confidence scores. In this paper, we propose a novel calibration probing framework for relational knowledge, covering three modalities of model confidence: (1) intrinsic confidence, (2) structural consistency and (3) semantic […]
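As background on what "calibration of confidence scores" measures, the sketch below computes the widely used Expected Calibration Error (ECE): predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The visible part of the abstract does not name the paper's metric, so ECE is an assumption, and the function name and the default of 10 bins are illustrative.

```python
from typing import List


def expected_calibration_error(confidences: List[float],
                               correct: List[bool],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins over [0, 1]; 0.0 means perfectly calibrated."""
    n = len(confidences)
    if n == 0:
        return 0.0
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

For example, four answers each given with confidence 0.9, of which three are right, yield an ECE of about 0.15: the model claims 90% but delivers 75%.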

technocracy

Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries

digitado ⋅ January 28, 2026

arXiv:2601.18899v1 Announce Type: new Abstract: Large Language Model (LLM)-powered Automatic Speech Recognition (ASR) systems achieve strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. Prior work trains a separate connector per language, overlooking linguistic relatedness. We propose an efficient and novel connector-sharing strategy based on linguistic family membership, enabling one connector per family, and empirically validate its effectiveness across two multilingual LLMs and two real-world corpora spanning curated […]

technocracy

Explainable Uncertainty Quantification for Wastewater Treatment Energy Prediction via Interval Type-2 Neuro-Fuzzy System

digitado ⋅ January 28, 2026

arXiv:2601.18897v1 Announce Type: new Abstract: Wastewater treatment plants consume 1-3% of global electricity, making accurate energy forecasting critical for operational optimization and sustainability. While machine learning models provide point predictions, they lack explainable uncertainty quantification essential for risk-aware decision-making in safety-critical infrastructure. This study develops an Interval Type-2 Adaptive Neuro-Fuzzy Inference System (IT2-ANFIS) that generates interpretable prediction intervals through fuzzy rule structures. Unlike black-box probabilistic methods, the proposed framework decomposes uncertainty across three levels: feature-level, footprint of uncertainty […]

technocracy

Weakly supervised framework for wildlife detection and counting in challenging Arctic environments: a case study on caribou (Rangifer tarandus)

digitado ⋅ January 28, 2026

arXiv:2601.18891v1 Announce Type: new Abstract: Caribou populations across the Arctic have declined in recent decades, motivating scalable and accurate monitoring approaches to guide evidence-based conservation actions and policy decisions. Manual interpretation of survey imagery is labor-intensive and error-prone, underscoring the need for automatic and reliable detection across varying scenes. Yet such automatic detection is challenging due to severe background heterogeneity, dominant empty terrain (class imbalance), small or occluded targets, and wide variation in density and scale. To make the […]

technocracy

XProvence: Zero-Cost Multilingual Context Pruning for Retrieval-Augmented Generation

digitado ⋅ January 28, 2026

arXiv:2601.18886v1 Announce Type: new Abstract: This paper introduces XProvence, a multilingual zero-cost context pruning model for retrieval-augmented generation (RAG), trained on 16 languages and supporting 100+ languages through effective cross-lingual transfer. Motivated by the growing use of RAG systems across diverse languages, we explore several strategies to generalize the Provence framework, which first integrated efficient zero-cost context pruning directly into the re-ranking model, beyond English. Across four multilingual question answering benchmarks, we show how XProvence can prune RAG contexts with […]

technocracy

Representational Homomorphism Predicts and Improves Compositional Generalization In Transformer Language Model

digitado ⋅ January 28, 2026

arXiv:2601.18858v1 Announce Type: new Abstract: Compositional generalization, the ability to interpret novel combinations of familiar components, remains a persistent challenge for neural networks. Behavioral evaluations reveal when models fail but offer limited insight into why failures arise at the representational level. We introduce Homomorphism Error (HE), a structural metric that quantifies deviations from approximate homomorphisms between the expression algebra and a model’s hidden-state space. We instantiate HE for two compositional operators in SCAN-style tasks: modifier HE for unary composition and […]

technocracy

SelfieAvatar: Real-time Head Avatar reenactment from a Selfie Video

digitado ⋅ January 28, 2026

arXiv:2601.18851v1 Announce Type: new Abstract: Head avatar reenactment focuses on creating animatable personal avatars from monocular videos, serving as a foundational element for applications like social signal understanding, gaming, human-machine interaction, and computer vision. Recent advances in 3D Morphable Model (3DMM)-based facial reconstruction methods have achieved remarkably high-fidelity face estimation. However, on the one hand, they struggle to capture the entire head, including non-facial regions and background details in real time, which is an essential aspect for producing […]
