digitado

About digitado

https://www.digitado.com.br

Posts by :

Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values

digitado ⋅ March 4, 2026

arXiv:2603.00945v2 Announce Type: replace-cross Abstract: We study non-rectangular robust Markov decision processes under the average-reward criterion, where the ambiguity set couples transition probabilities across states and the adversary commits to a stationary kernel for the entire horizon. We show that any history-dependent policy achieving sublinear expected regret uniformly over the ambiguity set is robust-optimal, and that the robust value admits a minimax representation as the infimum over the ambiguity set of the classical optimal gains, without requiring any […]


technocracy

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

digitado ⋅ March 4, 2026

arXiv:2602.16340v2 Announce Type: replace-cross Abstract: We study the implicit bias of momentum-based optimizers on homogeneous models. We first extend existing results on the implicit bias of steepest descent in homogeneous models to normalized steepest descent with an optional learning rate schedule. We then show that for smooth homogeneous models, momentum steepest descent algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are approximate steepest descent trajectories under a decaying learning rate schedule, proving that these […]


technocracy

Auditing Information Disclosure During LLM-Scale Gradient Descent Using Gradient Uniqueness

digitado ⋅ March 4, 2026

arXiv:2510.10902v2 Announce Type: replace-cross Abstract: Disclosing information via the publication of a machine learning model poses significant privacy risks. However, auditing this disclosure across every datapoint during the training of Large Language Models (LLMs) is computationally prohibitive. In this paper, we present Gradient Uniqueness (GNQ), a principled, attack-agnostic metric derived from an information-theoretic upper bound on the amount of information embedded in a model about individual training points via gradient descent. While naively computing GNQ requires forming and […]


technocracy

Mitigating Over-Refusal in Aligned Large Language Models via Inference-Time Activation Energy

digitado ⋅ March 4, 2026

arXiv:2510.08646v2 Announce Type: replace-cross Abstract: Safety alignment of large language models currently faces a central challenge: existing alignment techniques often prioritize mitigating responses to harmful prompts at the expense of overcautious behavior, leading models to incorrectly refuse benign requests. A key goal of safe alignment is therefore to improve safety while simultaneously minimizing false refusals. In this work, we introduce Energy Landscape Steering (ELS), a novel, fine-tuning free framework designed to resolve this challenge through dynamic, inference-time intervention. […]


technocracy

Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions

digitado ⋅ March 4, 2026

arXiv:2510.08382v3 Announce Type: replace-cross Abstract: In this paper we give a characterization of the learnability of forgiving 0-1 loss functions in the multiclass setting with effectively finite cardinality of the output and label space. To do this, we create a new combinatorial dimension based on the Natarajan Dimension, and we show that a hypothesis class is learnable in our setting if and only if this Generalized Natarajan Dimension is finite. We also show how […]


technocracy

Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective

digitado ⋅ March 4, 2026

arXiv:2509.22613v2 Announce Type: replace-cross Abstract: Recent reinforcement learning (RL) methods have substantially enhanced the planning capabilities of Large Language Models (LLMs), yet the theoretical basis for their effectiveness remains elusive. In this work, we investigate RL’s benefits and limitations through a tractable graph-based abstraction, focusing on policy gradient (PG) and Q-learning methods. Our theoretical analyses reveal that supervised fine-tuning (SFT) may introduce co-occurrence-based spurious solutions, whereas RL achieves correct planning primarily through exploration, underscoring exploration’s role in enabling […]


technocracy

Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact

digitado ⋅ March 4, 2026

arXiv:2507.08965v2 Announce Type: replace-cross Abstract: Classifier-Free Guidance (CFG) is a widely used technique for conditional generation and improving sample quality in continuous diffusion models, and its extensions to discrete diffusion have recently started to be investigated. In order to improve the algorithms in a principled way, this paper starts by analyzing the exact effect of CFG in the context of a low-dimensional masked diffusion model, with a special emphasis on the guidance schedule. Our analysis shows that high […]


technocracy

RNE: plug-and-play diffusion inference-time control and energy-based training

digitado ⋅ March 4, 2026

arXiv:2506.05668v5 Announce Type: replace-cross Abstract: Diffusion models generate data by removing noise gradually, which corresponds to the time-reversal of a noising process. However, access to only the denoising kernels is often insufficient. In many applications, we need the knowledge of the marginal densities along the generation trajectory, which enables tasks such as inference-time control. To address this gap, in this paper, we introduce the Radon-Nikodym Estimator (RNE). Based on the concept of the \textit{density ratio} between path distributions, […]


technocracy

Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme

digitado ⋅ March 4, 2026

arXiv:2506.01502v3 Announce Type: replace-cross Abstract: Learning population dynamics involves recovering the underlying process that governs particle evolution, given evolutionary snapshots of samples at discrete time points. Recent methods frame this as an energy minimization problem in probability space and leverage the celebrated JKO scheme for efficient time discretization. In this work, we introduce $\texttt{iJKOnet}$, an approach that combines the JKO framework with inverse optimization techniques to learn population dynamics. Our method relies on a conventional $\textit{end-to-end}$ adversarial training […]


technocracy

Optimizing Data Augmentation through Bayesian Model Selection

digitado ⋅ March 4, 2026

arXiv:2505.21813v2 Announce Type: replace-cross Abstract: Data Augmentation (DA) has become an essential tool to improve robustness and generalization of modern machine learning. However, when deciding on DA strategies it is critical to choose parameters carefully, and this can be a daunting task which is traditionally left to trial-and-error or expensive optimization based on validation performance. In this paper, we counter these limitations by proposing a novel framework for optimizing DA. In particular, we take a probabilistic view of […]
