February 2026

Transfer Learning in Infinite Width Feature Learning Networks

digitado ⋅ February 25, 2026

arXiv:2507.04448v2 Announce Type: replace-cross Abstract: We develop a theory of transfer learning in infinitely wide neural networks under gradient flow that quantifies when pretraining on a source task improves generalization on a target task. We analyze both (i) fine-tuning, when the downstream predictor is trained on top of source-induced features and (ii) a jointly rich setting, where both pretraining and downstream tasks can operate in a feature learning regime, but the downstream model is initialized with the features […]

technocracy

Regularity and Stability Properties of Selective SSMs with Discontinuous Gating

digitado ⋅ February 25, 2026

arXiv:2505.11602v2 Announce Type: replace-cross Abstract: Deep selective State-Space Models (SSMs), whose state-space parameters are modulated online by a selection signal, offer significant expressive power but pose challenges for stability analysis, especially under discontinuous gating. We study continuous-time selective SSMs through the lenses of passivity and Input-to-State Stability (ISS), explicitly distinguishing the selection schedule $x(\cdot)$ from the driving (port) input $u(\cdot)$. First, we show that state-strict dissipativity ($\beta>0$) together with quadratic bounds on a storage functional implies exponential decay […]

CONTINA: Confidence Interval for Traffic Demand Prediction with Coverage Guarantee

digitado ⋅ February 25, 2026

arXiv:2504.13961v2 Announce Type: replace-cross Abstract: Accurate short-term traffic demand prediction is critical for the operation of traffic systems. Besides point estimation, the confidence interval of the prediction is also of great importance. Many models for traffic operations, such as shared bike rebalancing and taxi dispatching, take into account the uncertainty of future demand and require confidence intervals as the input. However, existing methods for confidence interval modeling rely on strict assumptions, such as unchanging traffic patterns and correct […]
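
For orientation, the kind of interval with a coverage guarantee that the abstract refers to can be sketched with standard split conformal prediction. This is the textbook baseline, which relies on exchangeable data (roughly, the "unchanging patterns" assumption the abstract criticizes); it is not the CONTINA method, whose details are cut off above. The function names and the toy residuals are hypothetical.

```python
import math

def conformal_halfwidth(residuals, alpha):
    """Split-conformal half-width from held-out calibration residuals.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest absolute residual,
    which gives at least 1 - alpha coverage when calibration and test
    points are exchangeable (an assumption, not something the source states).
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage level.
        return math.inf
    return sorted(abs(r) for r in residuals)[k - 1]

def interval(prediction, halfwidth):
    """Symmetric interval around a point forecast."""
    return (prediction - halfwidth, prediction + halfwidth)

# Toy calibration residuals (observed demand minus predicted demand).
calibration_residuals = [1, -2, 3, -4, 5, -6, 7, -8, 9]
q = conformal_halfwidth(calibration_residuals, alpha=0.5)
low, high = interval(10.0, q)
```

When traffic patterns drift, the exchangeability assumption fails and the half-width has to be adapted over time, which is the gap the abstract says it addresses.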

Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

digitado ⋅ February 25, 2026

arXiv:2503.00229v4 Announce Type: replace-cross Abstract: Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant L and adapts to the “local” smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS (GD-LS) can result in constant factor improvements over GD with a 1/L step-size (denoted as GD(1/L)). We strengthen these results and show that if the […]
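
A minimal sketch of gradient descent with a backtracking Armijo line-search, to make the step-size rule concrete. It is a generic textbook variant, not necessarily the exact procedure analyzed in the paper: the parameter names (`eta_max`, `c`, `shrink`) and the quadratic test function are assumptions chosen for illustration.

```python
def armijo_gd(f, grad, x0, eta_max=1.0, c=0.5, shrink=0.5, iters=100):
    """Gradient descent where each step-size is found by backtracking.

    A step eta is accepted once the Armijo condition holds:
        f(x - eta * g) <= f(x) - c * eta * ||g||^2
    No global smoothness constant L is needed; eta adapts to the
    local curvature along the current gradient direction.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:
            break  # stationary point
        fx = f(x)
        eta = eta_max
        # Shrink eta until sufficient decrease is achieved.
        while eta > 1e-16:
            candidate = [xi - eta * gi for xi, gi in zip(x, g)]
            if f(candidate) <= fx - c * eta * g_sq:
                break
            eta *= shrink
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical ill-conditioned quadratic with smoothness constant L = 10.
def quad(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def quad_grad(x):
    return [x[0], 10.0 * x[1]]

x_final = armijo_gd(quad, quad_grad, [1.0, 1.0], iters=200)
```

On this quadratic the accepted step is never smaller than `shrink` times 1/L, and it can be larger when the gradient points along the low-curvature coordinate, which is the "local" adaptivity the abstract describes.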

Enjoying Non-linearity in Multinomial Logistic Bandits: A Minimax-Optimal Algorithm

digitado ⋅ February 25, 2026

arXiv:2507.05306v3 Announce Type: replace Abstract: We consider the multinomial logistic bandit problem in which a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille et al., 2021). They introduced a problem-dependent constant $\kappa_* \geq 1$ that may be exponentially large in some problem […]

A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

digitado ⋅ February 25, 2026

arXiv:2505.22554v5 Announce Type: replace Abstract: Effective feature selection is critical for robust and interpretable predictive modeling in medicine, especially when risk factors matter most in extreme patient strata. Many standard selectors emphasize average associations and can miss predictors whose relevance is concentrated in the distribution tails. We propose a computationally efficient supervised filter based on a Gumbel-copula implied upper-tail concordance score ($\lambda_U$), defined as a monotone transformation of Kendall’s tau, to rank features by their tendency to […]
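
The monotone transformation mentioned above can be made concrete with the standard Gumbel-copula identities: Kendall's tau satisfies $\tau = 1 - 1/\theta$ and the upper-tail dependence coefficient is $2 - 2^{1/\theta}$, so the score reduces to $2 - 2^{1-\tau}$. The sketch below ranks features by that score. The tau-a estimator, the clipping of negative tau to zero, and all function names are assumptions for illustration; the truncated abstract does not confirm the paper's exact estimator or tie handling.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs.

    O(n^2) for clarity; tied pairs contribute zero.
    """
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)

def gumbel_upper_tail(tau):
    """Gumbel-implied upper-tail score 2 - 2**(1 - tau).

    The Gumbel family only covers tau in [0, 1], so negative tau is
    clipped to 0 here (an assumption), giving a score of 0.
    """
    tau = max(tau, 0.0)
    return 2.0 - 2.0 ** (1.0 - tau)

def rank_features(features, target):
    """Return (name, score) pairs sorted by descending upper-tail score.

    `features` maps a feature name to its list of values.
    """
    scores = {
        name: gumbel_upper_tail(kendall_tau(values, target))
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: "a" moves with the target, "b" moves against it.
toy_target = [1, 2, 3, 4]
toy_features = {"a": [10, 20, 30, 40], "b": [4, 3, 2, 1]}
ranking = rank_features(toy_features, toy_target)
```

Because the map from tau to the score is monotone, the ranking matches a ranking by non-negative tau; the transformation changes the scale of the score rather than the order.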

Tightening Optimality gap with confidence through conformal prediction

digitado ⋅ February 25, 2026

arXiv:2503.04071v4 Announce Type: replace Abstract: Decision makers routinely use constrained optimization technology to plan and operate complex systems like global supply chains or power grids. In this context, practitioners must assess how close a computed solution is to optimality in order to make operational decisions, such as whether the current solution is sufficient or whether additional computation is warranted. A common practice is to evaluate solution quality using dual bounds returned by optimization solvers. While these dual bounds […]

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

digitado ⋅ February 25, 2026

arXiv:2410.16106v5 Announce Type: replace Abstract: We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the optimal linear approximation to the value function. Assuming independent samples, we make three theoretical contributions that improve upon the current state-of-the-art results: (i) we establish refined high-dimensional Berry-Esseen bounds over the class of convex sets, achieving faster rates than the […]
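
As a reference point for the algorithm under study, here is a minimal sketch of TD(0) with linear function approximation and Polyak-Ruppert averaging, using independent samples as the abstract assumes. The two-state reward process, the step-size schedule `alpha0 / t**power`, and all names are hypothetical choices for illustration, not the paper's setup.

```python
import random

def td_polyak(sample, phi, dim, gamma, n_steps, alpha0=0.5, power=0.6):
    """TD(0) with linear features; returns the running average of iterates.

    sample() must return one transition (s, r, s_next); phi(s) returns a
    feature vector of length `dim`.
    """
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for t in range(1, n_steps + 1):
        s, r, s_next = sample()
        f, f_next = phi(s), phi(s_next)
        v = sum(a * b for a, b in zip(theta, f))
        v_next = sum(a * b for a, b in zip(theta, f_next))
        delta = r + gamma * v_next - v          # TD error
        alpha = alpha0 / t ** power             # decaying step-size
        theta = [th + alpha * delta * fi for th, fi in zip(theta, f)]
        # Polyak-Ruppert: incremental mean of theta_1, ..., theta_t.
        theta_bar = [tb + (th - tb) / t for tb, th in zip(theta_bar, theta)]
    return theta_bar

# Hypothetical two-state Markov reward process: stay with probability 0.9,
# reward 1 in state 0 and 0 in state 1, uniform stationary distribution.
rng = random.Random(0)

def sample_transition():
    s = rng.randrange(2)                         # i.i.d. draw from stationary law
    s_next = s if rng.random() < 0.9 else 1 - s
    return s, (1.0 if s == 0 else 0.0), s_next

def one_hot(s):
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]

theta_hat = td_polyak(sample_transition, one_hot, dim=2, gamma=0.5, n_steps=20000)
```

With one-hot features and discount 0.5 the exact value function of this toy process is (11/6, 1/6), so the averaged iterate can be compared against it directly.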

Statistical Query Lower Bounds for Smoothed Agnostic Learning

digitado ⋅ February 25, 2026

arXiv:2602.21191v1 Announce Type: cross Abstract: We study the complexity of smoothed agnostic learning, recently introduced by \cite{CKKMS24}, in which the learner competes with the best classifier in a target class under slight Gaussian perturbations of the inputs. Specifically, we focus on the prototypical task of agnostically learning halfspaces under subgaussian distributions in the smoothed model. The best known upper bound for this problem relies on $L_1$-polynomial regression and has complexity $d^{\tilde{O}(1/\sigma^2) \log(1/\epsilon)}$, where $\sigma$ is the smoothing parameter and […]

SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models

digitado ⋅ February 25, 2026

arXiv:2602.21133v1 Announce Type: cross Abstract: Vector-quantized representations enable powerful discrete generative models but lack semantic structure in token space, limiting interpretable human control. We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. Unlike standard VQ-VAE, SOM-VQ uses topology-aware updates that preserve neighborhood structure: nearby tokens on a learned grid correspond to semantically similar states, enabling direct geometric manipulation of the latent space. We demonstrate that SOM-VQ […]
