February 2026

Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits

digitado ⋅ February 27, 2026

arXiv:2502.06051v3 Announce Type: replace-cross Abstract: Many offline reinforcement learning algorithms are underpinned by $f$-divergence regularization, but their sample complexity *defined with respect to regularized objectives* still lacks tight analyses, especially in terms of concrete data coverage conditions. In this paper, we study the exact concentrability requirements to achieve the $\tilde{\Theta}(\epsilon^{-1})$ sample complexity for offline $f$-divergence-regularized contextual bandits. For reverse Kullback-Leibler (KL) divergence, arguably the most commonly used one, we achieve an $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability for […]

technocracy

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

digitado ⋅ February 27, 2026

arXiv:2602.21160v2 Announce Type: replace Abstract: In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model’s ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable […]
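As a rough illustration of the per-class quantity quoted in the abstract, the sketch below computes $C_k = \sigma_k^2/(2\mu_k)$ from a handful of posterior predictive samples. It is not the authors' code: the function name `per_class_contributions`, the toy probabilities, the use of the population variance, and the zero returned when $\mu_k = 0$ are all assumptions made for this example. The form of $C_k$ matches what a second-order expansion of $-p\log p$ around $\mu_k$ gives, since the second derivative of $-p\log p$ is $-1/p$.

```python
from statistics import fmean, pvariance


def per_class_contributions(samples):
    """Return [C_1, ..., C_K] with C_k = Var[p_k] / (2 * E[p_k]).

    `samples` is a list of S probability vectors of length K, one per
    posterior draw. Population variance is an assumption of this sketch.
    """
    num_classes = len(samples[0])
    contributions = []
    for k in range(num_classes):
        p_k = [draw[k] for draw in samples]
        mu_k = fmean(p_k)          # mean predicted probability for class k
        var_k = pvariance(p_k)     # spread of that probability across draws
        # Guard against division by zero for classes never predicted (assumption).
        contributions.append(var_k / (2.0 * mu_k) if mu_k > 0 else 0.0)
    return contributions


# Hypothetical posterior draws for a 3-class problem.
posterior_samples = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.3, 0.1],
]
c = per_class_contributions(posterior_samples)
print(c)
```

In this toy input the first two classes have the same variance across draws, but the second has the smaller mean, so the $1/\mu_k$ weighting gives it the larger contribution; the third class never varies and contributes nothing.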

Scaling Laws for Precision in High-Dimensional Linear Regression

digitado ⋅ February 27, 2026

arXiv:2602.19241v2 Announce Type: replace Abstract: Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear […]

One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow

digitado ⋅ February 27, 2026

arXiv:2512.05251v2 Announce Type: replace Abstract: Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high computational costs. We introduce one-step diffusion samplers which learn a step-conditioned ODE so that one large step reproduces the trajectory of many small ones via a state-space consistency loss. We further show that standard ELBO estimates in diffusion samplers degrade in the […]

On the Interpolation Error of Nonlinear Attention versus Linear Regression

digitado ⋅ February 27, 2026

arXiv:2506.18656v2 Announce Type: replace Abstract: Attention has become the core building block of modern machine learning (ML) by efficiently capturing the long-range dependencies among input tokens. Its inherently parallelizable structure allows for efficient performance scaling with the rapidly increasing size of both data and model parameters. Despite its central role, the theoretical understanding of Attention, especially in the nonlinear setting, is progressing at a more modest pace. This paper provides a precise characterization of the interpolation error for […]

Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

digitado ⋅ February 27, 2026

arXiv:2405.06727v2 Announce Type: replace Abstract: In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that […]

Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

digitado ⋅ February 27, 2026

arXiv:2602.23341v1 Announce Type: cross Abstract: Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse […]
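To make the notion of coarse data concrete, here is a one-dimensional toy sketch of the rounding case mentioned in the abstract: each Gaussian sample is hidden and only the grid cell containing it is revealed. The grid width, the seed, the mean value, and the helper name `coarsen` are assumptions for illustration; the paper's setting is $d$-dimensional with a general partition, and no estimator from the paper is reproduced here.

```python
import math
import random


def coarsen(x, width=1.0):
    """Return the half-open grid cell [lo, hi) of the given width that contains x."""
    lo = math.floor(x / width) * width
    return (lo, lo + width)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
true_mean = 0.3          # hypothetical unknown mean, unit variance

# The learner never sees `samples`, only the cells in `coarse`.
samples = [rng.gauss(true_mean, 1.0) for _ in range(5)]
coarse = [coarsen(x) for x in samples]

for cell in coarse:
    print(cell)
```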

Differentiable Zero-One Loss via Hypersimplex Projections

digitado ⋅ February 27, 2026

arXiv:2602.23336v1 Announce Type: cross Abstract: Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance, yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the n,k-dimensional hypersimplex through a constrained optimization framework, leading […]

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

digitado ⋅ February 27, 2026

arXiv:2602.23197v1 Announce Type: cross Abstract: Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how […]

Low-degree Lower bounds for clustering in moderate dimension

digitado ⋅ February 27, 2026

arXiv:2602.23023v1 Announce Type: cross Abstract: We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $\Delta$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $\Delta$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq […]
