February 2026

Scaling Laws for Precision in High-Dimensional Linear Regression

digitado ⋅ 27 February 2026

arXiv:2602.19241v2 Announce Type: replace Abstract: Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear […]

One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow

digitado ⋅ 27 February 2026

arXiv:2512.05251v2 Announce Type: replace Abstract: Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high computational costs. We introduce one-step diffusion samplers which learn a step-conditioned ODE so that one large step reproduces the trajectory of many small ones via a state-space consistency loss. We further show that standard ELBO estimates in diffusion samplers degrade in the […]

On the Interpolation Error of Nonlinear Attention versus Linear Regression

digitado ⋅ 27 February 2026

arXiv:2506.18656v2 Announce Type: replace Abstract: Attention has become the core building block of modern machine learning (ML) by efficiently capturing the long-range dependencies among input tokens. Its inherently parallelizable structure allows for efficient performance scaling with the rapidly increasing size of both data and model parameters. Despite its central role, the theoretical understanding of Attention, especially in the nonlinear setting, is progressing at a more modest pace. This paper provides a precise characterization of the interpolation error for […]

Representative, Informative, and De-Amplifying: Requirements for Robust Bayesian Active Learning under Model Misspecification

digitado ⋅ 27 February 2026

arXiv:2506.07805v2 Announce Type: replace Abstract: In many settings in science and industry, such as drug discovery and clinical trials, a central challenge is designing experiments under time and budget constraints. Bayesian Optimal Experimental Design (BOED) is a paradigm to pick maximally informative designs that has been increasingly applied to such problems. During training, BOED selects inputs according to a pre-determined acquisition criterion to target informativeness. During testing, the model learned during training encounters a naturally occurring distribution of […]

Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

digitado ⋅ 27 February 2026

arXiv:2405.06727v2 Announce Type: replace Abstract: In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that […]
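
In symbols, the bound described in this abstract has roughly the following shape. The names $f$ (target function), $f_\theta$ (the ReLU network), $W$ (width), $L$ (depth) and the constant $C$ are notation assumed here for illustration; the truncated abstract does not state the norm on the left-hand side or the value of the constant.

$$
\| f - f_\theta \| \;\le\; C \, \frac{\| f \|_{\infty}}{W L}
$$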

Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

digitado ⋅ 27 February 2026

arXiv:2602.23341v1 Announce Type: cross Abstract: Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse […]
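
The coarse-observation model in this abstract can be illustrated with a minimal sketch, assuming the partition is a grid of axis-aligned unit cells (the "measurement rounding" case). The function name `coarsen`, the cell width, and the example mean are hypothetical choices made here, and the midpoint average at the end is only a naive baseline, not the estimator studied in the paper.

```python
import numpy as np

def coarsen(samples, width=1.0):
    """Return the lower corner of the grid cell containing each sample.

    Assumption: the partition is a regular grid with side length `width`.
    """
    return np.floor(samples / width) * width

rng = np.random.default_rng(0)
n, d = 1000, 3
true_mean = np.array([0.3, -1.2, 2.0])  # hypothetical mean, for illustration only

# True samples: d-dimensional Gaussian with identity covariance.
x = rng.normal(loc=true_mean, scale=1.0, size=(n, d))

# The learner never sees x, only the cell [lower, upper) that contains it.
lower = coarsen(x, width=1.0)
upper = lower + 1.0

# Naive baseline: average the cell midpoints.
midpoint_estimate = ((lower + upper) / 2.0).mean(axis=0)
print(midpoint_estimate)
```

Each row of `lower` and `upper` describes the set revealed to the learner in place of the exact sample.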

Differentiable Zero-One Loss via Hypersimplex Projections

digitado ⋅ 27 February 2026

arXiv:2602.23336v1 Announce Type: cross Abstract: Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance, yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the n,k-dimensional hypersimplex through a constrained optimization framework, leading […]

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

digitado ⋅ 27 February 2026

arXiv:2602.23197v1 Announce Type: cross Abstract: Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how […]

Low-degree Lower bounds for clustering in moderate dimension

digitado ⋅ 27 February 2026

arXiv:2602.23023v1 Announce Type: cross Abstract: We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $\Delta$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $\Delta$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq […]

An automatic counting algorithm for the quantification and uncertainty analysis of the number of microglial cells trainable in small and heterogeneous datasets

digitado ⋅ 27 February 2026

arXiv:2602.22974v1 Announce Type: cross Abstract: Counting immunopositive cells on biological tissues generally requires either manual annotation or (when available) automatic rough systems, for scanning signal surface and intensity in whole slide imaging. In this work, we tackle the problem of counting microglial cells in lumbar spinal cord cross-sections of rats by omitting cell detection and focusing only on the counting task. Manual cell counting is, however, a time-consuming task and additionally entails extensive personnel training. The classic automatic […]
