digitado

About digitado

https://www.digitado.com.br

Posts by :

Efficient Autoregressive Inference for Transformer Probabilistic Models

digitado ⋅ 22 de April de 2026

arXiv:2510.09477v2 Announce Type: replace Abstract: Set-based transformer models for amortized probabilistic inference and meta-learning, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many applications require joint distributions over multiple predictions. Purely autoregressive architectures generate these efficiently but sacrifice flexible set-conditioning. Obtaining joint distributions from set-based models requires re-encoding the entire context at each autoregressive step, which scales poorly. We introduce a causal autoregressive buffer that combines the strengths of both […]

Ver mais

Like 0

Liked Liked

technocracy

When Langevin Monte Carlo Meets Randomization: Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness

digitado ⋅ 22 de April de 2026

arXiv:2509.25630v2 Announce Type: replace Abstract: Efficient sampling from complex and high dimensional target distributions turns out to be a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we revisit the randomized Langevin Monte Carlo (RLMC) for sampling from high dimensional distributions without log-concavity. Under the gradient Lipschitz condition and the log-Sobolev inequality, we prove a uniform-in-time error bound in $mathcal{W}_2$-distance of order $O(sqrt{d}h)$ for the RLMC sampling algorithm, which matches […]

Ver mais

Like 0

Liked Liked

technocracy

Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

digitado ⋅ 22 de April de 2026

arXiv:2509.03726v2 Announce Type: replace Abstract: Sampling from unnormalized target distributions, e.g. Boltzmann distributions $mu_{text{target}}(x) propto exp(-E(x)/T)$, is fundamental to many scientific applications yet computationally challenging due to complex, high-dimensional energy landscapes. Existing approaches applying modern generative models to Boltzmann distributions either require large datasets of samples drawn from the target distribution or, when using only energy evaluations for training, cannot efficiently leverage the expressivity of advanced architectures like continuous normalizing flows that have shown promise for molecular sampling. […]

Ver mais

Like 0

Liked Liked

technocracy

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

digitado ⋅ 22 de April de 2026

arXiv:2505.22811v5 Announce Type: replace Abstract: Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning LMMs in the Boolean domain, eliminating the need […]

Ver mais

Like 0

Liked Liked

technocracy

LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

digitado ⋅ 22 de April de 2026

arXiv:2505.09803v3 Announce Type: replace Abstract: In many applications, we wish to fit a parametric statistical model to a small ensemble of spatially distributed random variables (‘fields’). However, parameter inference using maximum likelihood estimation (MLE) is computationally prohibitive, especially for large, non-stationary fields. Thus, many recent works train neural networks to estimate parameters given spatial fields as input, sidestepping MLE completely. In this work we focus on a popular class of parametric, spatially autoregressive (SAR) models. We make a […]

Ver mais

Like 0

Liked Liked

technocracy

Generalization at the Edge of Stability

digitado ⋅ 22 de April de 2026

arXiv:2604.19740v1 Announce Type: cross Abstract: Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the optimization dynamics exhibit oscillatory and chaotic behavior. Empirically, this regime often yields improved generalization performance, yet the underlying mechanism remains poorly understood. In this work, we represent stochastic optimizers as random dynamical systems, which often converge to a fractal attractor set (rather than a point) with a smaller intrinsic dimension. Building on this connection and […]

Ver mais

Like 0

Liked Liked

technocracy

Phase Transitions in the Fluctuations of Functionals of Random Neural Networks

digitado ⋅ 22 de April de 2026

arXiv:2604.19738v1 Announce Type: cross Abstract: We establish central and non-central limit theorems for sequences of functionals of the Gaussian output of an infinitely-wide random neural network on the d-dimensional sphere . We show that the asymptotic behaviour of these functionals as the depth of the network increases depends crucially on the fixed points of the covariance function, resulting in three distinct limiting regimes: convergence to the same functional of a limiting Gaussian field, convergence to a Gaussian distribution, […]

Ver mais

Like 0

Liked Liked

technocracy

Ultrametric OGP – parametric RDT emph{symmetric} binary perceptron connection

digitado ⋅ 22 de April de 2026

arXiv:2604.19712v1 Announce Type: cross Abstract: In [97,99,100], an fl-RDT framework is introduced to characterize emph{statistical computational gaps} (SCGs). Studying emph{symmetric binary perceptrons} (SBPs), [100] obtained an emph{algorithmic} threshold estimate $alpha_aapprox alpha_c^{(7)}approx 1.6093$ at the 7th lifting level (for $kappa=1$ margin), closely approaching $1.58$ local entropy (LE) prediction [18]. In this paper, we further connect parametric RDT to overlap gap properties (OGPs), another key geometric feature of the solution space. Specifically, for any positive integer $s$, we consider $s$-level […]

Ver mais

Like 0

Liked Liked

technocracy

Budgeted Online Influence Maximization

digitado ⋅ 22 de April de 2026

arXiv:2604.19672v1 Announce Type: cross Abstract: We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set. Our approach better models the real-world setting where the cost of influencers varies and advertisers want to find the best value for their overall social advertising budget. We propose an algorithm assuming an independent cascade diffusion model and edge level semi-bandit feedback, and provide […]

Ver mais

Like 0

Liked Liked

technocracy

Separating Geometry from Probability in the Analysis of Generalization

digitado ⋅ 22 de April de 2026

arXiv:2604.19560v1 Announce Type: cross Abstract: The goal of machine learning is to find models that minimize prediction error on data that has not yet been seen. Its operational paradigm assumes access to a dataset $S$ and articulates a scheme for evaluating how well a given model performs on an arbitrary sample. The sample can be $S$ (in which case we speak of “in-sample” performance) or some entirely new $S’$ (in which case we speak of “out-of-sample” performance). Traditional […]

Ver mais

Like 0

Liked Liked