YuriiFormer: A Suite of Nesterov-Accelerated Transformers
arXiv:2601.23236v2

Abstract: We propose a variational framework that interprets transformer layers as iterations of an optimization algorithm acting on token embeddings. In this view, self-attention implements a gradient step on an interaction energy, while MLP layers correspond to gradient updates of a potential energy. Standard GPT-style transformers emerge as vanilla gradient descent on the resulting composite objective, implemented via Lie–Trotter splitting between the two energy functionals. This perspective enables principled architectural design using classical optimization […]