March 2026

ActTail: Global Activation Sparsity in Large Language Models

digitado ⋅ March 16, 2026

arXiv:2603.12272v1. Activation sparsity is a promising approach for accelerating large language model (LLM) inference by reducing computation and memory movement. However, existing activation sparsity methods typically apply uniform sparsity across projections, ignoring the heterogeneous statistical properties of Transformer weights and thereby amplifying performance degradation. In this paper, we propose ActTail, a TopK magnitude-based activation sparsity method with global activation sparsity allocation grounded in Heavy-Tailed Self-Regularization (HT-SR) theory. Specifically, we capture this heterogeneity via the […]
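
As a rough illustration of the TopK magnitude-based sparsification the abstract mentions, the sketch below keeps only the largest-magnitude entries of one activation vector. The function name `topk_sparsify` and the `sparsity` parameter are hypothetical, and the per-projection sparsity level is simply passed in; how ActTail allocates those levels globally via HT-SR is not shown here and is not inferred from the truncated abstract.

```python
def topk_sparsify(activations, sparsity):
    """Zero all but the largest-magnitude entries of a 1-D activation vector.

    `sparsity` is the fraction of entries to drop (0.0 keeps everything,
    1.0 drops everything). This is a generic TopK sketch, not ActTail itself.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n = len(activations)
    keep = n - int(sparsity * n)  # number of entries that survive
    # Rank indices by descending magnitude; ties are broken by position.
    order = sorted(range(n), key=lambda i: (-abs(activations[i]), i))
    kept = set(order[:keep])
    return [a if i in kept else 0.0 for i, a in enumerate(activations)]


x = [0.1, -2.0, 0.5, 3.0, -0.05, 1.2]
sparse_x = topk_sparsify(x, sparsity=0.5)  # keeps the 3 largest magnitudes
```

With a uniform scheme every projection would receive the same `sparsity`; a non-uniform allocation would pass a different value per projection.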

Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models

digitado ⋅ March 16, 2026

arXiv:2603.12271v1. LLMs are widely used in knowledge-intensive tasks where the same fact may be revised multiple times within context. Unlike prior work focusing on one-shot updates or single conflicts, multi-update scenarios contain multiple historically valid versions that compete at retrieval, yet remain underexplored. This challenge resembles the AB-AC interference paradigm in cognitive psychology: when the same cue A is successively associated with B and C, the old and new associations compete during retrieval, leading […]

Task-Specific Knowledge Distillation via Intermediate Probes

digitado ⋅ March 16, 2026

arXiv:2603.12270v1. Knowledge distillation from large language models (LLMs) assumes that the teacher’s output distribution is a high-quality training signal. On reasoning tasks, this assumption is frequently violated. A model’s intermediate representations may encode the correct answer, yet this information is lost or distorted through the vocabulary projection, where prompt formatting and answer-token choices create brittle, noisy outputs. We introduce method{}, a distillation framework that bypasses this bottleneck by training lightweight probes on frozen teacher […]

DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs

digitado ⋅ March 16, 2026

arXiv:2603.12269v1. Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI accelerators in resource-constrained settings. Existing methods, however, rely on suboptimal exit policies, ignore input difficulty, and optimize thresholds independently. This paper introduces DART (Input-Difficulty-Aware Adaptive Threshold), a framework that overcomes these limitations. DART introduces three key innovations: (1) a lightweight difficulty estimation module that quantifies input complexity with minimal computational overhead, (2) a […]
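
The basic early-exit mechanism described in the first sentence of the abstract can be sketched as follows: each exit head produces logits, and inference stops at the first exit whose top-class probability reaches that exit's threshold. The names `early_exit_predict`, `exit_logits`, and `thresholds` are hypothetical, and the thresholds here are fixed constants; DART's difficulty estimation and adaptive threshold components are not reproduced, since the abstract is truncated before describing them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, thresholds):
    """Return (exit_index, predicted_class) for the first confident exit.

    exit_logits: one logit vector per exit head, ordered shallow to deep.
    thresholds:  one confidence threshold per exit head (assumed fixed here).
    The last exit always answers, whatever its confidence.
    """
    if not exit_logits or len(exit_logits) != len(thresholds):
        raise ValueError("need one threshold per exit and at least one exit")
    last = len(exit_logits) - 1
    for i, (logits, tau) in enumerate(zip(exit_logits, thresholds)):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= tau or i == last:
            return i, probs.index(confidence)


# In a real network the deeper exits would only be computed on demand;
# the logits are precomputed here purely for illustration.
logits_per_exit = [[1.0, 1.2, 0.9], [0.2, 4.0, 0.1], [0.0, 6.0, 0.0]]
result = early_exit_predict(logits_per_exit, thresholds=[0.9, 0.9, 0.9])
```

In this example the first exit is not confident enough, so the input leaves at the second exit.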

A Holistic Framework for Automated Configuration Recommendation for Cloud Service Monitoring

digitado ⋅ March 16, 2026

arXiv:2603.12268v1. Reliability of large-scale cloud services is critical for user satisfaction and business continuity. Despite significant investments in reliability engineering, production incidents remain inevitable, often leading to customer impact and operational overhead. In large cloud companies, multiple services are deployed across regions, necessitating robust health monitoring systems. However, the current monitor configuration process is manual, largely reactive, and ad hoc, resulting in gaps in coverage and redundant alerts. In this paper, we present a […]

HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

digitado ⋅ March 16, 2026

Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks like federated learning and split learning. While substituting BP with zeroth-order optimization can significantly reduce memory footprints, it typically suffers from prohibitively slow convergence. To resolve this dilemma, we propose Hybrid-Order Split Federated Learning (HO-SFL). By reformulating the split learning process within a Lagrangian framework, HO-SFL decouples the optimization landscape: the server performs precise first-order updates (i.e., BP), whereas clients […]
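
To make the contrast between BP and zeroth-order optimization concrete, here is a minimal sketch of a zeroth-order gradient estimate built only from forward evaluations of a loss. It uses plain coordinate-wise central differences, chosen for determinism; it is not HO-SFL's client update, which the truncated abstract does not specify. The names `zeroth_order_gradient`, `loss`, and the step size `h` are assumptions for illustration.

```python
def zeroth_order_gradient(f, x, h=1e-4):
    """Estimate the gradient of f at x using function evaluations only.

    No backward pass is needed, so no intermediate activations are stored.
    The price is 2 * len(x) forward evaluations per estimate.
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


def loss(w):
    # Toy objective with known gradient (2 * w[0], 3).
    return w[0] ** 2 + 3.0 * w[1]


g = zeroth_order_gradient(loss, [1.0, 2.0])  # approximately [2.0, 3.0]
```

Practical zeroth-order methods for large models usually perturb along a few random directions rather than every coordinate, which keeps the evaluation count low but adds variance to the estimate and tends to slow convergence.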

Understanding the geometry of deep learning with decision boundary volume

digitado ⋅ March 16, 2026

For classification tasks, the performance of a deep neural network is determined by the structure of its decision boundary, whose geometry directly affects essential properties of the model, including accuracy and robustness. Motivated by a classical tube formula due to Weyl, we introduce a method to measure the decision boundary of a neural network through local surface volumes, providing a theoretically justifiable and efficient measure enabling a geometric interpretation of the effectiveness of the model applicable to the […]

Online Learning for Supervisory Switching Control

digitado ⋅ March 16, 2026

We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy the best controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible with a control setting, such […]

Gauge-Equivariant Intrinsic Neural Operators for Geometry-Consistent Learning of Elliptic PDE Maps

digitado ⋅ March 16, 2026

Learning solution operators of partial differential equations (PDEs) from data has emerged as a promising route to fast surrogate models in multi-query scientific workflows. However, for geometric PDEs whose inputs and outputs transform under changes of local frame (gauge), many existing operator-learning architectures remain representation-dependent, brittle under metric perturbations, and sensitive to discretization changes. We propose Gauge-Equivariant Intrinsic Neural Operators (GINO), a class of neural operators that parameterize elliptic solution maps primarily through intrinsic spectral multipliers acting on […]

DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning

digitado ⋅ March 16, 2026

Next-generation IoT applications increasingly span autonomous administrative entities, necessitating silo-cooperative scheduling to leverage diverse computational resources while preserving data privacy. However, realizing efficient cooperation faces significant challenges arising from infrastructure heterogeneity, non-IID workload shifts, and the inherent risks of adversarial environments. Existing approaches, relying predominantly on centralized coordination or independent learning, fail to address the incompatibility of state-action spaces across heterogeneous silos and lack robustness against malicious attacks. This paper proposes DeFRiS, a Decentralized Federated Reinforcement Learning […]
