February 2026

Minimax Rates for Learning Pairwise Interactions in Attention-Style Models

digitado ⋅ February 26, 2026

arXiv:2510.11789v2 Announce Type: replace Abstract: We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a nonlinear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$, where $M$ is the sample size and $\beta$ is the Hölder smoothness of the activation function. Importantly, this rate is independent of the embedding dimension $d$, the number of tokens $N$, and the rank $r$ of the weight matrix, provided […]
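
To get a feel for the rate quoted in the abstract, the short sketch below simply evaluates $M^{-2\beta/(2\beta+1)}$ for a few smoothness values. The function name `minimax_rate` and the chosen sample size are illustrative assumptions, not taken from the paper; the code only does the arithmetic of the stated formula and ignores constants.

```python
def minimax_rate(sample_size: int, beta: float) -> float:
    """Evaluate M ** (-2*beta / (2*beta + 1)) for sample size M and smoothness beta.

    Hypothetical helper for illustration: it reproduces only the exponent
    stated in the abstract, with no constants or conditions from the paper.
    """
    if sample_size <= 0 or beta <= 0:
        raise ValueError("sample_size and beta must be positive")
    exponent = -2.0 * beta / (2.0 * beta + 1.0)
    return sample_size ** exponent


# Smoother activations (larger beta) give a faster rate at the same M.
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: rate={minimax_rate(10_000, beta):.6f}")
```

As $\beta$ grows, the exponent approaches $-1$, so the rate approaches $M^{-1}$; note that $d$, $N$, and $r$ do not appear anywhere in the expression.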

Overparameterized Multiple Linear Regression as Hyper-Curve Fitting

digitado ⋅ February 26, 2026

arXiv:2404.07849v2 Announce Type: replace Abstract: This work demonstrates that applying a fixed-effect multiple linear regression (MLR) model to an overparameterized dataset is mathematically equivalent to fitting a hyper-curve parameterized by a single scalar. This reformulation shifts the focus from global coefficients to individual predictors, allowing each to be modeled as a function of a common parameter. We prove that this overparameterized linear framework can yield exact predictions even when the underlying data contains nonlinear dependencies that violate classical […]

Coarsening Bias from Variable Discretization in Causal Functionals

digitado ⋅ February 26, 2026

arXiv:2602.22083v1 Announce Type: cross Abstract: A class of causal effect functionals requires integration over conditional densities of continuous variables, as in mediation effects and nonparametric identification in causal graphical models. Estimating such densities and evaluating the resulting integrals can be statistically and computationally demanding. A common workaround is to discretize the variable and replace integrals with finite sums. Although convenient, discretization alters the population-level functional and can induce non-negligible approximation bias, even under correct identification. Under smoothness conditions, […]
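
The point that discretization changes the population-level functional can be seen in a toy case. The sketch below is an assumption-laden illustration, not the paper's functional or estimator: it takes $\mu(m) = m^2$, a reference density $p_0$ uniform on $[0, 1]$, and a target density $p_1(m) = 2m$, so the true value $\int \mu(m)\, p_1(m)\, dm$ is $0.5$. The discretized version replaces $\mu$ on each bin by its bin average under $p_0$ and weights it by the bin mass under $p_1$. All names are hypothetical.

```python
def discretized_functional(num_bins: int) -> float:
    """Finite-sum approximation of the integral of mu(m) * p1(m) over [0, 1].

    Toy setup (assumed for illustration): mu(m) = m**2, p0 = Uniform(0, 1),
    p1(m) = 2*m. Each bin contributes E_p0[mu | bin] * P_p1(bin), both of
    which are computed in closed form.
    """
    if num_bins <= 0:
        raise ValueError("num_bins must be positive")
    width = 1.0 / num_bins
    total = 0.0
    for k in range(num_bins):
        a, b = k * width, (k + 1) * width
        # Average of m**2 over [a, b] under the uniform density p0.
        mean_mu_under_p0 = (b**3 - a**3) / (3.0 * (b - a))
        # Probability of [a, b] under p1(m) = 2*m.
        mass_under_p1 = b**2 - a**2
        total += mean_mu_under_p0 * mass_under_p1
    return total


TRUE_VALUE = 0.5  # integral of m**2 * 2*m over [0, 1]

for num_bins in (1, 2, 4, 8, 16):
    bias = discretized_functional(num_bins) - TRUE_VALUE
    print(f"bins={num_bins:2d}  bias={bias:+.6f}")
```

In this toy case the bias is nonzero for every finite number of bins even though nothing is estimated from data, and it shrinks roughly fourfold each time the number of bins doubles.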

Neural solver for Wasserstein Geodesics and optimal transport dynamics

digitado ⋅ February 26, 2026

arXiv:2602.22003v1 Announce Type: cross Abstract: In recent years, the machine learning community has increasingly embraced the optimal transport (OT) framework for modeling distributional relationships. In this work, we introduce a sample-based neural solver for computing the Wasserstein geodesic between a source and target distribution, along with the associated velocity field. Building on the dynamical formulation of the optimal transport (OT) problem, we recast the constrained optimization as a minimax problem, using deep neural networks to approximate the relevant […]

Bayesian Generative Adversarial Networks via Gaussian Approximation for Tabular Data Synthesis

digitado ⋅ February 26, 2026

arXiv:2602.21948v1 Announce Type: cross Abstract: Generative Adversarial Networks (GAN) have been used in many studies to synthesise mixed tabular data. Conditional tabular GAN (CTGAN) have been the most popular variant but struggle to effectively navigate the risk-utility trade-off. Bayesian GAN have received less attention for tabular data, but have been explored with unstructured data such as images and text. The most used technique employed in Bayesian GAN is Markov Chain Monte Carlo (MCMC), but it is computationally intensive, […]

Learning Unknown Interdependencies for Decentralized Root Cause Analysis in Nonlinear Dynamical Systems

digitado ⋅ February 26, 2026

arXiv:2602.21928v1 Announce Type: cross Abstract: Root cause analysis (RCA) in networked industrial systems, such as supply chains and power networks, is notoriously difficult due to unknown and dynamically evolving interdependencies among geographically distributed clients. These clients represent heterogeneous physical processes and industrial assets equipped with sensors that generate large volumes of nonlinear, high-dimensional, and heterogeneous IoT data. Classical RCA methods require partial or full knowledge of the system’s dependency graph, which is rarely available in these complex networks. […]

Generalisation of RLHF under Reward Shift and Clipped KL Regularisation

digitado ⋅ February 26, 2026

arXiv:2602.21765v1 Announce Type: cross Abstract: Alignment and adaptation in large language models heavily rely on reinforcement learning from human feedback (RLHF); yet, theoretical understanding of its generalisability remains premature, especially when the learned reward could shift, and the KL control is estimated and clipped. To address this issue, we develop generalisation theory for RLHF that explicitly accounts for (1) reward shift: reward models are trained on preference data from earlier or mixed behaviour policies while RLHF optimises the […]

Learning Complex Physical Regimes via Coverage-oriented Uncertainty Quantification: An application to the Critical Heat Flux

digitado ⋅ February 26, 2026

arXiv:2602.21701v1 Announce Type: cross Abstract: A central challenge in scientific machine learning (ML) is the correct representation of physical systems governed by multi-regime behaviours. In these scenarios, standard data analysis techniques often fail to capture the nature of the data, as the system’s response varies significantly across the state space due to its stochasticity and the different physical regimes. Uncertainty quantification (UQ) should thus not be viewed merely as a safety assessment, but as a support to the […]

How many asymmetric communities are there in multi-layer directed networks?

digitado ⋅ February 26, 2026

arXiv:2602.21569v1 Announce Type: cross Abstract: Estimating the asymmetric numbers of communities in multi-layer directed networks is a challenging problem due to the multi-layer structures and inherent directional asymmetry, leading to possibly different numbers of sender and receiver communities. This work addresses this issue under the multi-layer stochastic co-block model, a model for multi-layer directed networks with distinct community structures in sending and receiving sides, by proposing a novel goodness-of-fit test. The test statistic relies on the deviation of […]

Effects of Training Data Quality on Classifier Performance

digitado ⋅ February 26, 2026

arXiv:2602.21462v1 Announce Type: cross Abstract: We describe extensive numerical experiments assessing and quantifying how classifier performance depends on the quality of the training data, a frequently neglected component of the analysis of classifiers. More specifically, in the scientific context of metagenomic assembly of short DNA reads into “contigs,” we examine the effects of degrading the quality of the training data by multiple mechanisms, and for four classifiers — Bayes classifiers, neural nets, partition models and random forests. We […]
