ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
arXiv:2603.10088v1 Announce Type: new

Abstract: Diffusion large language models (dLLMs) are emerging as a promising alternative to autoregressive models (ARMs) thanks to their ability to capture bidirectional context and their potential for parallel generation. Despite these advantages, dLLM inference remains computationally expensive because the full input context is processed at every iteration. In this work, we analyze the generation dynamics of dLLMs and find that intermediate representations, including key, value, and hidden states, change only subtly across successive […]
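The observation above — that key, value, and hidden states drift only slightly between successive diffusion iterations — suggests that recomputation can be skipped once the states have effectively converged. A minimal toy sketch of that idea (illustrative only; the function names, the L2 convergence criterion, and the threshold `tau` are assumptions, not the paper's actual algorithm):

```python
import math

def l2(x):
    # Euclidean norm of a flat list of floats.
    return math.sqrt(sum(v * v for v in x))

def relative_change(prev, curr):
    # Relative L2 difference between consecutive-iteration states.
    diff = l2([c - p for p, c in zip(prev, curr)])
    return diff / (l2(prev) + 1e-8)

def run_with_early_skip(compute_states, n_iters, tau=0.01):
    """Toy early-skipping loop (hypothetical sketch, not ES-dLLM itself):
    recompute the intermediate states each iteration only while they still
    change by more than `tau`; once converged, reuse the cached copy."""
    cached = compute_states(0)
    skipped = 0
    frozen = False
    for t in range(1, n_iters):
        if frozen:
            skipped += 1        # reuse `cached` instead of recomputing
            continue
        new = compute_states(t)
        if relative_change(cached, new) < tau:
            frozen = True       # states have effectively stopped changing
        cached = new
    return cached, skipped

# Synthetic states that converge geometrically toward [1, 1, 1, 1].
final, skipped = run_with_early_skip(lambda t: [1.0 + 0.5 ** t * 0.5] * 4, 20)
```

With the synthetic sequence above, the loop freezes after a handful of iterations and skips the remaining recomputations, mirroring the claimed source of savings: most late iterations add little new information to the cached representations.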