January 2026

How Memory in Optimization Algorithms Implicitly Modifies the Loss

digitado ⋅ January 14, 2026

arXiv:2502.02132v3 Announce Type: replace-cross Abstract: In modern optimization methods used in deep learning, each update depends on the history of previous iterations, often referred to as memory, and this dependence decays fast as the iterates go further into the past. For example, gradient descent with momentum has exponentially decaying memory through exponentially averaged past gradients. We introduce a general technique for identifying a memoryless algorithm that approximates an optimization algorithm with memory. It is obtained by replacing all […]
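
To make the "exponentially decaying memory" in the abstract concrete, here is a minimal sketch, not taken from the paper. It assumes the exponential-moving-average form of momentum, m_t = beta * m_{t-1} + (1 - beta) * g_t with a zero initial buffer, and checks numerically that the recursive buffer equals an explicit weighted sum over the gradient history, where the gradient from k steps back carries weight (1 - beta) * beta^k. The function names and the random stand-in gradients are hypothetical, chosen only for illustration.

```python
import numpy as np

def ema_momentum(grads, beta):
    """Recursive form: m_t = beta * m_{t-1} + (1 - beta) * g_t, starting from m = 0."""
    m = np.zeros_like(grads[0])
    for g in grads:
        m = beta * m + (1.0 - beta) * g
    return m

def unrolled_momentum(grads, beta):
    """Explicit form: the same buffer written as a weighted sum of all past gradients."""
    t = len(grads) - 1
    # The gradient from step k gets weight (1 - beta) * beta^(t - k): older means smaller.
    weights = np.array([(1.0 - beta) * beta ** (t - k) for k in range(len(grads))])
    return np.tensordot(weights, np.stack(grads), axes=1)

# Random vectors stand in for gradients (an assumption for illustration only).
rng = np.random.default_rng(0)
grads = [rng.normal(size=3) for _ in range(50)]
beta = 0.9

recursive = ema_momentum(grads, beta)
explicit = unrolled_momentum(grads, beta)
print(np.allclose(recursive, explicit))
```

The weights shrink geometrically with the age of the gradient, which is the sense in which the dependence on earlier iterates fades quickly.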

technocracy

Applying the maximum entropy principle to neural networks enhances multi-species distribution models

digitado ⋅ January 14, 2026

arXiv:2412.19217v4 Announce Type: replace-cross Abstract: The rapid expansion of citizen science initiatives has led to a significant growth of biodiversity databases, and particularly presence-only (PO) observations. PO data are invaluable for understanding species distributions and their dynamics, but their use in a Species Distribution Model (SDM) is curtailed by sampling biases and the lack of information on absences. Poisson point processes are widely used for SDMs, with Maxent being one of the most popular methods. Maxent maximises the […]

Gradient flow in parameter space is equivalent to linear interpolation in output space

digitado ⋅ January 14, 2026

arXiv:2408.01517v3 Announce Type: replace-cross Abstract: We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, for the $L^{2}$ loss, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply […]

The radius of statistical efficiency

digitado ⋅ January 14, 2026

arXiv:2405.09676v2 Announce Type: replace-cross Abstract: Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singular. We compute RSE up to numerical constants for a variety of testbed problems, […]

Statistical learning on measures: an application to persistence diagrams

digitado ⋅ January 14, 2026

arXiv:2303.08456v3 Announce Type: replace-cross Abstract: We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of base-classifiers on $\mathcal{X}$, we build corresponding classifiers in the space of measures. We provide upper and […]

Cross-Domain Imitation Learning via Optimal Transport

digitado ⋅ January 14, 2026

arXiv:2110.03684v4 Announce Type: replace-cross Abstract: Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live on different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states between the […]

$\phi$-test: Global Feature Selection and Inference for Shapley Additive Explanations

digitado ⋅ January 14, 2026

arXiv:2512.07578v2 Announce Type: replace Abstract: We propose $\phi$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an evaluation dataset, $\phi$-test performs SHAP-guided screening and fits a linear surrogate on the screened features via a selection rule with a tractable selective-inference form. For each retained feature, it outputs a Shapley-based global score, a surrogate coefficient, and post-selection $p$-values and confidence intervals in a global feature-importance table. […]

Transfer Learning Across Fixed-Income Product Classes

digitado ⋅ January 14, 2026

arXiv:2505.07676v2 Announce Type: replace Abstract: We propose a framework for transfer learning of discount curves across different fixed-income product classes. Motivated by challenges in estimating discount curves from sparse or noisy data, we extend kernel ridge regression (KR) to a vector-valued setting, formulating a convex optimization problem in a vector-valued reproducing kernel Hilbert space (RKHS). Each component of the solution corresponds to the discount curve implied by a specific product class. We introduce an additional regularization term motivated […]

Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning

digitado ⋅ January 14, 2026

arXiv:2412.07057v3 Announce Type: replace Abstract: Imitation learning (IL) is a paradigm for learning sequential decision-making policies from experts, leveraging offline demonstrations, interactive annotations, or both. Recent advances show that when annotation cost is tallied per trajectory, Behavior Cloning (BC), which relies solely on offline demonstrations, cannot be improved in general, leaving limited conditions for interactive methods such as DAgger to help. We revisit this conclusion and prove that when the annotation cost is measured per state, algorithms […]

Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set

digitado ⋅ January 14, 2026

arXiv:2601.08703v1 Announce Type: cross Abstract: Explainable artificial intelligence (XAI) is concerned with producing explanations indicating the inner workings of models. For a Rashomon set of similarly performing models, explanations provide a way of disambiguating the behavior of individual models, helping select models for deployment. However, explanations themselves can vary depending on the explainer used, and need to be evaluated. In the paper “Evaluating Model Explanations without Ground Truth”, we proposed three principles of explanation evaluation and a new […]
