March 2026

Whisper-MLA: Reducing GPU Memory Consumption of ASR Models based on MHA2MLA Conversion

digitado ⋅ March 3, 2026

arXiv:2603.00563v1 Announce Type: new Abstract: The Transformer-based Whisper model has achieved state-of-the-art performance in Automatic Speech Recognition (ASR). However, its Multi-Head Attention (MHA) mechanism results in significant GPU memory consumption due to the linearly growing Key-Value (KV) cache, which is problematic for many applications, especially those involving long-form audio. To address this, we introduce Whisper-MLA, a novel architecture that incorporates Multi-Head Latent Attention (MLA) into the Whisper model. Specifically, we adapt MLA for Whisper’s absolute positional embeddings and […]
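
The linear growth described above can be made concrete with back-of-the-envelope arithmetic. The Python sketch below is an illustration under stated assumptions, not the paper's method or measurements: it assumes standard MHA caches one key and one value vector of width `d_model` per decoder layer per token, while an MLA-style cache stores a single compressed latent of width `latent_dim` from which keys and values are re-projected. The layer count, widths, and fp16 (2-byte) storage are hypothetical, and only the growing self-attention cache is counted.

```python
def mha_kv_cache_bytes(num_layers: int, seq_len: int, d_model: int,
                       batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Self-attention KV cache size under standard multi-head attention.

    Every layer keeps a key and a value vector of width d_model
    (num_heads * head_dim) for each cached token, hence the factor of 2.
    """
    return 2 * num_layers * batch * seq_len * d_model * bytes_per_elem


def mla_kv_cache_bytes(num_layers: int, seq_len: int, latent_dim: int,
                       batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Cache size when each layer keeps one shared latent vector per token.

    Assumption: keys and values are recomputed from the latent by
    up-projections, and no separate positional key is cached.
    """
    return num_layers * batch * seq_len * latent_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical decoder shape, chosen only for illustration.
    layers, d_model, latent_dim = 32, 1280, 256
    for tokens in (448, 4096, 16384):
        mha = mha_kv_cache_bytes(layers, tokens, d_model)
        mla = mla_kv_cache_bytes(layers, tokens, latent_dim)
        print(f"{tokens:>6} tokens: MHA {mha / 2**20:8.1f} MiB, "
              f"MLA {mla / 2**20:8.1f} MiB ({mha / mla:.0f}x smaller)")
```

Both sizes still grow linearly with the number of cached tokens; what the latent changes is the slope, by a factor of `2 * d_model / latent_dim` (10x with these hypothetical numbers).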

technocracy

Geometry OR Tracker: Universal Geometric Operating Room Tracking

digitado ⋅ March 3, 2026

arXiv:2603.00560v1 Announce Type: new Abstract: In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition, where physically meaningful quantities such as distances and motion statistics must be measured in meters. However, real clinical deployments rarely satisfy the geometric prerequisites for stable multi-view fusion and tracking: camera calibration and RGB-D registration are always unreliable, leading to cross-view geometric inconsistency that produces “ghosting” during fusion and degrades 3D trajectories in a shared OR coordinate […]
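
A little rigid-body geometry shows why small calibration errors become visible "ghosting" measured in meters: if one camera's extrinsic rotation is off by an angle θ, a point it sees at distance r lands about 2r·sin(θ/2) ≈ rθ away from where a correctly calibrated view places it, so fusion produces two copies of the same point. The NumPy sketch below uses hypothetical values (a 1° yaw error, a point 4 m from the camera) and illustrates only this failure mode; it is not the paper's tracking method.

```python
import numpy as np


def rot_z(deg: float) -> np.ndarray:
    """Rotation matrix about the z-axis by `deg` degrees."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def camera_to_world(p_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map a point from camera to world coordinates: p_world = R @ p_cam + t."""
    return R @ p_cam + t


# Hypothetical camera position and observation (meters).
t_cam = np.array([2.0, 0.0, 1.0])
p_cam = np.array([4.0, 0.0, 0.0])  # point 4 m from the camera center

true_point = camera_to_world(p_cam, rot_z(30.0), t_cam)   # correct extrinsics
ghost_point = camera_to_world(p_cam, rot_z(31.0), t_cam)  # 1 degree yaw error

offset_m = float(np.linalg.norm(ghost_point - true_point))
print(f"ghost offset: {offset_m:.3f} m")  # about 7 cm for this setup
```

The offset scales with distance, so the same 1° error that costs a few centimeters near the camera grows proportionally for people standing farther away in the room.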

Planning Method for Skill-Based Control of Robots Using a PLC as Skill Trigger

digitado ⋅ March 3, 2026

arXiv:2603.00555v1 Announce Type: new Abstract: Skill-based programming of robots provides a flexible approach for automation. Existing solutions neglect the optimization of motion sequences, leading to inefficiencies in execution. This work introduces a planning method that enhances skill-based robot programming by integrating motion sequence optimization. This optimization leads to a new MoveContinuousSkill. The software for executing the MoveContinuousSkill is implemented on a Programmable Logic Controller and applied across multiple robotic systems. Experimental results demonstrate a significant improvement in execution […]

EMPA: Evaluating Persona-Aligned Empathy as a Process

digitado ⋅ March 3, 2026

arXiv:2603.00552v1 Announce Type: new Abstract: Evaluating persona-aligned empathy in LLM-based dialogue agents remains challenging. User states are latent, feedback is sparse and difficult to verify in situ, and seemingly supportive turns can still accumulate into trajectories that drift from persona-specific needs. We introduce EMPA, a process-oriented framework that evaluates persona-aligned support as sustained intervention rather than isolated replies. EMPA distills real interactions into controllable, psychologically grounded scenarios, couples them with an open-ended multi-agent sandbox that exposes strategic adaptation […]

GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning

digitado ⋅ March 3, 2026

arXiv:2603.00551v1 Announce Type: new Abstract: GPU architectural simulation is orders of magnitude slower than native execution, necessitating workload sampling for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive sampling with high errors or conservative sampling with constrained speedups. To address these issues, we propose GCL-Sampler, a sampling framework that leverages Relational Graph Convolutional Networks with contrastive learning to automatically discover high-dimensional kernel similarities from trace graphs. By encoding instruction sequences and data […]
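
The general recipe behind sampled simulation, independent of how kernel similarity is learned, is to group kernels that behave alike, simulate one representative per group, and scale each representative's result by its group size. The Python sketch below shows only that generic step: the 2-D embeddings, the cosine-similarity threshold, the greedy grouping, and the cycle counts are all hypothetical stand-ins for GCL-Sampler's learned R-GCN embeddings and its actual selection procedure.

```python
import numpy as np


def greedy_groups(emb: np.ndarray, threshold: float = 0.95):
    """Group kernels by cosine similarity to an existing representative.

    Returns the indices of the representative kernels and, for every
    kernel, the index of the group it was assigned to.
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    reps: list[int] = []
    assign: list[int] = []
    for i, v in enumerate(unit):
        for g, r in enumerate(reps):
            if float(v @ unit[r]) >= threshold:
                assign.append(g)
                break
        else:  # no representative is similar enough: start a new group
            reps.append(i)
            assign.append(len(reps) - 1)
    return reps, np.array(assign)


def estimate_total_cycles(reps, assign, simulate) -> int:
    """Simulate only the representatives and weight each by its group size."""
    counts = np.bincount(assign, minlength=len(reps))
    return sum(int(c) * simulate(r) for r, c in zip(reps, counts))


# Hypothetical workload: six kernels forming two behavioral clusters.
emb = np.array([[1.0, 0.0], [0.99, 0.05], [0.98, 0.1],
                [0.0, 1.0], [0.05, 0.99], [0.1, 0.98]])
true_cycles = [1000, 1020, 980, 5000, 5100, 4900]

reps, assign = greedy_groups(emb)
estimate = estimate_total_cycles(reps, assign, lambda i: true_cycles[i])
print(reps, assign.tolist(), estimate, sum(true_cycles))
```

Only two of the six kernels are "simulated" here, and the estimate matches the full total only because each cluster's cycle counts average out to its representative's; looser thresholds merge more kernels and trade accuracy for speed, which is the aggressive-versus-conservative tension the abstract describes.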

Weakly Supervised Video Anomaly Detection with Anomaly-Connected Components and Intention Reasoning

digitado ⋅ March 3, 2026

arXiv:2603.00550v1 Announce Type: new Abstract: Weakly supervised video anomaly detection (WS-VAD) involves identifying the temporal intervals that contain anomalous events in untrimmed videos, where only video-level annotations are provided as supervisory signals. However, a key limitation persists in WS-VAD, as dense frame-level annotations are absent, which often leaves existing methods struggling to learn anomaly semantics effectively. To address this issue, we propose a novel framework named LAS-VAD, short for Learning Anomaly Semantics for WS-VAD, which integrates anomaly-connected component […]
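
The weak-supervision setup itself is commonly formulated as multiple-instance learning (MIL): a model scores every snippet of a video, the video-level score is pooled from the highest-scoring snippets, and only that pooled score is compared with the video label, so no frame-level labels enter the loss. The NumPy sketch below shows this generic top-k MIL baseline with a hypothetical `k` and made-up scores; it does not implement LAS-VAD's anomaly-connected-component or intention-reasoning components.

```python
import numpy as np


def video_score_topk(snippet_scores, k: int) -> float:
    """Video-level anomaly score: mean of the k highest snippet scores in [0, 1]."""
    scores = np.asarray(snippet_scores, dtype=float)
    k = min(k, scores.size)
    return float(np.sort(scores)[-k:].mean())


def mil_bce_loss(snippet_scores, video_label: int, k: int = 3, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the pooled video score and the video-level label."""
    p = float(np.clip(video_score_topk(snippet_scores, k), eps, 1 - eps))
    return float(-(video_label * np.log(p) + (1 - video_label) * np.log(1 - p)))


# Hypothetical snippet scores from some scoring network.
anomalous_video = [0.1, 0.2, 0.9, 0.8, 0.95, 0.1]  # labeled 1 at video level only
normal_video = [0.1, 0.05, 0.2, 0.1]               # labeled 0

print(mil_bce_loss(anomalous_video, 1), mil_bce_loss(normal_video, 0))
```

Top-k pooling lets a few high-scoring snippets account for a positive label without anyone marking where the anomaly occurs; the model is never told which snippets are anomalous, which is the gap in learning anomaly semantics that the abstract points to.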

PM2Lat: Highly Accurate and Generalized Prediction of DNN Execution Latency on GPUs

digitado ⋅ March 3, 2026

arXiv:2603.00549v1 Announce Type: new Abstract: We present PM2Lat, a fast and generalized framework for accurately predicting the latency of deep neural network models on GPUs, with special focus on NVIDIA. Unlike prior methods that rely on deep learning models or handcrafted heuristics, PM2Lat leverages the Single-Instruction-Multiple-Thread architecture of GPUs to model execution time of DNN models. First, we dive into fine-grained GPU operation modeling by studying computational behavior and memory access patterns. After identifying these characteristics, we found […]
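
One structural effect that SIMT-aware latency models can exploit is wave quantization: a kernel's thread blocks are dispatched to the GPU's streaming multiprocessors (SMs) in waves, so execution time rises in steps rather than smoothly as the launch grows. The Python sketch below is a deliberately simplified analytical model, not PM2Lat's; the SM count, resident blocks per SM, per-wave time, and launch overhead are hypothetical inputs that a real predictor would have to derive from the hardware and the kernel.

```python
import math


def num_waves(num_blocks: int, num_sms: int, blocks_per_sm: int) -> int:
    """Number of scheduling waves needed to run every thread block."""
    return math.ceil(num_blocks / (num_sms * blocks_per_sm))


def predict_latency_us(num_blocks: int, num_sms: int, blocks_per_sm: int,
                       time_per_wave_us: float, launch_overhead_us: float = 5.0) -> float:
    """Toy latency estimate: fixed launch overhead plus a constant cost per wave."""
    waves = num_waves(num_blocks, num_sms, blocks_per_sm)
    return launch_overhead_us + waves * time_per_wave_us


# Hypothetical GPU with 108 SMs, each holding 2 resident blocks of this kernel.
for blocks in (216, 217, 432):
    print(blocks, predict_latency_us(blocks, num_sms=108, blocks_per_sm=2,
                                     time_per_wave_us=10.0))
```

With these numbers, going from 216 to 217 blocks adds a whole second wave (15 µs to 25 µs) even though the work grew by under 0.5%, a step that a smooth function of problem size would not capture.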

Advancing Multimodal Judge Models through a Capability-Oriented Benchmark and MCTS-Driven Data Generation

digitado ⋅ March 3, 2026

arXiv:2603.00546v1 Announce Type: new Abstract: Using Multimodal Large Language Models (MLLMs) as judges to achieve precise and consistent evaluations has gradually become an emerging paradigm across various domains. Evaluating the capability and reliability of MLLM-as-a-judge systems is therefore essential for ensuring trustworthy assessment. Existing judge benchmarks categorize samples by task types but fail to capture the fundamental judgment capabilities required for reliable evaluation. In this work, we introduce M-JudgeBench, a ten-dimensional capability-oriented benchmark designed to comprehensively assess the […]

Multiple Inputs and Mixed Data for Alzheimer’s Disease Classification Based on 3D Vision Transformer

digitado ⋅ March 3, 2026

arXiv:2603.00545v1 Announce Type: new Abstract: The current methods for diagnosing Alzheimer’s Disease using Magnetic Resonance Imaging (MRI) have significant limitations. Many previous studies used 2D Transformers to analyze individual brain slices independently, potentially losing critical 3D contextual information. Region-of-interest-based models often focus on only a few brain regions despite Alzheimer’s affecting multiple areas. Additionally, most classification models rely on a single test, whereas diagnosing Alzheimer’s requires a multifaceted approach integrating diverse data sources for a more […]
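
The contrast with 2D slice-wise models comes from how a 3D Vision Transformer tokenizes the scan: the MRI volume is cut into cubic patches, so each token already spans several adjacent slices and self-attention can relate regions across the whole brain. The NumPy sketch below shows only that patch-extraction step; the 96×96×96 volume and 16-voxel patch size are hypothetical, not the paper's configuration.

```python
import numpy as np


def volume_to_patches(vol: np.ndarray, p: int) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping p x p x p patches.

    Returns an array of shape (num_patches, p**3), with patches ordered
    depth-major, then height, then width.
    """
    D, H, W = vol.shape
    if D % p or H % p or W % p:
        raise ValueError("volume dimensions must be divisible by the patch size")
    x = vol.reshape(D // p, p, H // p, p, W // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (nD, nH, nW, p, p, p)
    return x.reshape(-1, p ** 3)


# Hypothetical 96^3 scan (voxel values are just their linear indices here).
vol = np.arange(96 ** 3).reshape(96, 96, 96)
patches = volume_to_patches(vol, 16)
print(patches.shape)  # (216, 4096): 6 * 6 * 6 tokens of 16^3 voxels each
```

In a standard ViT pipeline, each flattened patch would next be linearly projected and combined with a position embedding before entering the Transformer encoder.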

On Best-Possible One-Time Programs

digitado ⋅ March 3, 2026

arXiv:2603.00544v1 Announce Type: new Abstract: One-time programs (OTPs) aim to let a user evaluate a program on a single input while revealing nothing else. Classical OTPs require hardware assumptions, and even with quantum information, OTPs for deterministic functionalities remain impossible due to gentle-measurement attacks (Broadbent, Gutoski and Stebila, 2013). While recent works achieve positive results for certain randomized functionalities, the fundamental limits and the strongest achievable security notions remain poorly understood. In this paper, we ask for a […]
