Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games
arXiv:2510.13060v2 Announce Type: replace-cross Abstract: Reverse Kullback-Leibler (KL) divergence-based regularization with respect to a fixed reference policy is widely used in modern reinforcement learning to preserve the desired traits of the reference policy and, in some cases, to promote exploration (with a uniform reference policy, this is known as entropy regularization). Beyond serving as a mere anchor, the reference policy can also be interpreted as encoding prior knowledge about good actions in the environment. In the context of alignment, recent game-theoretic approaches have […]
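For concreteness, the reverse-KL-regularized objective referred to above typically takes the following standard form (the notation $\pi$, $\pi_{\mathrm{ref}}$, and $\beta$ is conventional and assumed here, not taken from the truncated abstract):

$$\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} r_h(s_h,a_h)\Big] \;-\; \beta\, \mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} \mathrm{KL}\big(\pi_h(\cdot\mid s_h)\,\|\,\pi_{\mathrm{ref}}(\cdot\mid s_h)\big)\Big],$$

where $\beta > 0$ controls the regularization strength. When $\pi_{\mathrm{ref}}$ is uniform over the action set $\mathcal{A}$, one has $\mathrm{KL}(\pi\,\|\,\pi_{\mathrm{ref}}) = \log|\mathcal{A}| - \mathcal{H}(\pi)$, so the penalty reduces (up to a constant) to entropy regularization, as the abstract notes.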