February 2026

Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules

digitado ⋅ February 17, 2026

arXiv:2509.19189v4 Announce Type: replace-cross Abstract: Scaling laws have emerged as a unifying lens for understanding and guiding the training of large language models (LLMs). However, existing studies predominantly focus on the final-step loss, leaving open whether the entire loss dynamics obey similar laws and, crucially, how the learning rate schedule (LRS) shapes them. We address these gaps in a controlled theoretical setting by analyzing stochastic gradient descent (SGD) on a power-law kernel regression model. The key insight is […]

ART: Adaptive Resampling-based Training for Imbalanced Classification

digitado ⋅ February 17, 2026

arXiv:2509.00955v2 Announce Type: replace-cross Abstract: Traditional resampling methods for handling class imbalance typically use fixed distributions, undersampling the majority or oversampling the minority. These static strategies ignore changes in class-wise learning difficulty, which can limit the overall performance of the model. This paper proposes an Adaptive Resampling-based Training (ART) method that periodically updates the distribution of the training data based on the class-wise performance of the model. Specifically, ART uses class-wise macro F1 scores, computed at fixed intervals, […]
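
The excerpt above is cut off before it states how the F1 scores drive the resampling, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical rule in which a class is sampled in proportion to `1 - F1` (plus a small floor `eps`); the function names `per_class_f1`, `class_sampling_probs`, and `resample_indices` are illustrative and do not come from the source.

```python
import numpy as np


def per_class_f1(y_true, y_pred, num_classes):
    """Compute the one-vs-rest F1 score for each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1 = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1[c] = 2 * tp / denom if denom > 0 else 0.0
    return f1


def class_sampling_probs(f1, eps=1e-3):
    """ASSUMED rule (not from the paper): weight each class by (1 - F1) + eps."""
    weights = (1.0 - np.asarray(f1)) + eps
    return weights / weights.sum()


def resample_indices(y_train, class_probs, n_samples, rng):
    """Draw training indices so that classes follow class_probs."""
    y_train = np.asarray(y_train)
    counts = np.bincount(y_train, minlength=len(class_probs))
    # Spread each class's probability mass evenly over its examples.
    p = class_probs[y_train] / counts[y_train]
    p = p / p.sum()  # renormalize in case some class has no examples
    return rng.choice(len(y_train), size=n_samples, replace=True, p=p)
```

In a training loop, one would recompute `per_class_f1` on held-out predictions every fixed number of steps and redraw the training indices, so that classes the model currently handles poorly are seen more often.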

The Serial Scaling Hypothesis

digitado ⋅ February 17, 2026

arXiv:2507.12549v3 Announce Type: replace-cross Abstract: While machine learning has advanced through massive parallelization, we identify a critical blind spot: some problems are fundamentally sequential. These “inherently serial” problems, from mathematical reasoning to physical simulations to sequential decision-making, require sequentially dependent computational steps that cannot be efficiently parallelized. We formalize this distinction in complexity theory, and demonstrate that current parallel-centric architectures face fundamental limitations on such tasks. Then, we show for the first time that diffusion models, despite their sequential nature, are […]

wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models

digitado ⋅ February 17, 2026

arXiv:2507.08838v2 Announce Type: replace-cross Abstract: Improving the reasoning capabilities of diffusion-based large language models (dLLMs) through reinforcement learning (RL) remains an open problem. The intractability of the dLLMs' likelihood function necessitates approximating the current, old, and reference policy likelihoods at each policy optimization step. This reliance introduces additional computational overhead and can lead to large variance and estimation error in the RL objective, particularly in computing the policy ratio for importance sampling. To mitigate these issues, we introduce wd1, […]

Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs

digitado ⋅ February 17, 2026

arXiv:2506.13593v5 Announce Type: replace-cross Abstract: We introduce time-to-unsafe-sampling, a novel safety measure for generative models, defined as the number of generations required by a large language model (LLM) to trigger an unsafe (e.g., toxic) response. While providing a new dimension for prompt-adaptive safety evaluation, quantifying time-to-unsafe-sampling is challenging: unsafe outputs are often rare in well-aligned models and thus may not be observed under any feasible sampling budget. To address this challenge, we frame this estimation problem as one […]
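
To see why a sampling budget can miss unsafe outputs entirely, it may help to work a simplified case. The following assumes that generations for a given prompt are independent and that each one is unsafe with a fixed probability p; this i.i.d. assumption and the symbols T, p, and B are introduced here for illustration and are not taken from the excerpt.

```latex
% T = number of generations until the first unsafe response (assumed geometric)
\Pr(T = t) = (1 - p)^{t-1}\, p, \qquad t = 1, 2, \dots
\qquad\Longrightarrow\qquad
\mathbb{E}[T] = \frac{1}{p}, \qquad \Pr(T > B) = (1 - p)^{B}.

% Example: p = 10^{-4} and a budget of B = 1000 generations
\Pr(T > 1000) = (1 - 10^{-4})^{1000} \approx e^{-0.1} \approx 0.905 .
```

Under these assumptions, roughly nine times out of ten no unsafe response is observed within the budget, so T is right-censored for most prompts.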

Variational Transdimensional Inference

digitado ⋅ February 17, 2026

arXiv:2506.04749v3 Announce Type: replace-cross Abstract: The expressiveness of flow-based models combined with stochastic variational inference (SVI) has expanded the application of optimization-based Bayesian inference to highly complex problems. However, despite the importance of multi-model Bayesian inference for problems defined on a transdimensional joint model and parameter space, such as Bayesian structure learning and model selection, flow-based SVI has been limited to problems defined on a fixed-dimensional parameter space. We introduce CoSMIC, normalizing flows (COntextually-Specified Masking for Identity-mapped Components), […]

Are Statistical Methods Obsolete in the Era of Deep Learning? A Study of ODE Inverse Problems

digitado ⋅ February 17, 2026

arXiv:2505.21723v2 Announce Type: replace-cross Abstract: In the era of AI, neural networks have become increasingly popular for modeling, inference, and prediction, largely due to their potential for universal approximation. With the proliferation of such deep learning models, a question arises: are leaner statistical methods still relevant? To shed insight on this question, we employ the mechanistic nonlinear ordinary differential equation (ODE) inverse problem as a testbed, using the physics-informed neural network (PINN) as a representative of the deep […]

On the Relation between Rectified Flows and Optimal Transport

digitado ⋅ February 17, 2026

arXiv:2505.19712v3 Announce Type: replace-cross Abstract: This paper investigates the connections between rectified flows, flow matching, and optimal transport. Flow matching is a recent approach to learning generative models by estimating velocity fields that guide transformations from a source to a target distribution. Rectified flow matching aims to straighten the learned transport paths, yielding more direct flows between distributions. Our first contribution is a set of invariance properties of rectified flows and explicit velocity fields. In addition, we […]
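
As a rough illustration of what "estimating velocity fields" means in practice, the sketch below builds the regression targets commonly used for rectified flows: points on the straight line between a source sample `x0` and a target sample `x1`, paired with the constant velocity `x1 - x0`. The excerpt does not spell out this construction, so the linear interpolant, the independent pairing of samples, and the names `flow_matching_pairs` and `flow_matching_loss` are assumptions made for this example.

```python
import numpy as np


def flow_matching_pairs(x0, x1, t):
    """Return interpolated points x_t and their target velocities.

    x0, x1: arrays of shape (n, d) with source and target samples.
    t:      array of shape (n,) with times in [0, 1].
    """
    t_col = np.asarray(t)[:, None]
    x_t = (1.0 - t_col) * x0 + t_col * x1  # straight-line interpolant
    v_target = x1 - x0                     # constant velocity along the line
    return x_t, v_target


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between a candidate velocity field and the targets.

    velocity_fn(x_t, t) is any callable returning an (n, d) array.
    """
    x_t, v_target = flow_matching_pairs(x0, x1, t)
    v_pred = velocity_fn(x_t, np.asarray(t))
    return float(np.mean(np.sum((v_pred - v_target) ** 2, axis=1)))
```

A learned model would take the place of `velocity_fn` and be trained to minimize this loss over random draws of `x0`, `x1`, and `t`.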

Residual Feature Integration is Sufficient to Prevent Negative Transfer

digitado ⋅ February 17, 2026

arXiv:2505.11771v2 Announce Type: replace-cross Abstract: Transfer learning has become a central paradigm in modern machine learning, yet it suffers from the long-standing problem of negative transfer, where leveraging source representations can harm rather than help performance on the target task. Although empirical remedies have been proposed, there remains little theoretical understanding of how to reliably avoid negative transfer. In this paper, we investigate a simple yet remarkably effective strategy: augmenting frozen, pretrained source-side features with a trainable target-side […]

Rolled Gaussian process models for curves on manifolds

digitado ⋅ February 17, 2026

arXiv:2503.21980v2 Announce Type: replace-cross Abstract: Given a planar curve, imagine rolling a sphere along that curve without slipping or twisting, and by this means tracing out a curve on the sphere. It is well known that such a rolling operation induces a local isometry between the sphere and the plane so that the two curves uniquely determine each other, and moreover, the operation extends to a general class of manifolds in any dimension. We use rolling to construct […]
