digitado

Unlocking the Pre-Trained Model as a Dual-Alignment Calibrator for Post-Trained LLMs

digitado ⋅ January 9, 2026

arXiv:2601.04277v1 Announce Type: new Abstract: Post-training improves large language models (LLMs) but often worsens confidence calibration, leading to systematic overconfidence. Recent unsupervised post-hoc methods for post-trained LMs (PoLMs) mitigate this by aligning PoLM confidence to that of well-calibrated pre-trained counterparts. However, framing calibration as static output-distribution matching overlooks the inference-time dynamics introduced by post-training. In particular, we show that calibration errors arise from two regimes: (i) confidence drift, where final confidence inflates despite largely consistent intermediate decision processes, […]

technocracy

Explicit Abstention Knobs for Predictable Reliability in Video Question Answering

digitado ⋅ January 6, 2026

arXiv:2601.00138v1 Announce Type: new Abstract: High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping threshold epsilon produces smooth risk-coverage tradeoffs, reducing error rates f

technocracy

Factorized Orthogonal Latent Spaces

digitado ⋅ March 31, 2010

Existing approaches to multi-view learning are particularly effective when the views are either independent (i.e., multi-kernel approaches) or fully dependent (i.e., shared latent spaces). However, in real scenarios, these assumptions are almost never truly satisfied. Recently, two methods have attempted to tackle this problem by factorizing the information and learning separate latent spaces for modeling the shared (i.e., correlated) and private (i.e., independent) parts of the data. However, these approaches are very sensitive to parameter settings or initialization. […]

technocracy

Investigating Quasar Data With Polars and Interactive marimo Notebooks

digitado ⋅ October 21, 2025

Learn to visualize quasar redshift data by building an interactive marimo dashboard using Polars, pandas, and Matplotlib. You’ll practice retrieving, cleaning, and displaying data in your notebook. You’ll also build interactive UI components that live-update visualizations in the notebook.

technocracy

From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning

digitado ⋅ January 6, 2026

arXiv:2601.00215v1 Announce Type: new Abstract: Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate reasoning that lacks integration of visual information. This limits their ability to solve problems that demand accurate visual perception, such as visual puzzles. We show that visual perception is the key bottleneck in such tasks: converting images into textual descriptions significantly improves performance, yielding gains of 26.7% for Claude […]

technocracy

Sample Path Regularity of Gaussian Processes from the Covariance Kernel

digitado ⋅ January 6, 2026

arXiv:2312.14886v3 Announce Type: replace-cross Abstract: Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of GP sample paths, i.e. the function spaces over which they define a probability measure, is lacking. In practice, GPs are not constructed through a probability measure, but instead through a mean function and a covariance kernel. In this paper we provide necessary and sufficient conditions on the […]

technocracy

Variance-Reduced Diffusion Sampling via Conditional Score Expectation Identity

digitado ⋅ January 6, 2026

arXiv:2601.01594v1 Announce Type: new Abstract: We introduce and prove a Conditional Score Expectation (CSE) identity: an exact relation for the marginal score of affine diffusion processes that links scores across time via a conditional expectation under the forward dynamics. Motivated by this identity, we propose a CSE-based statistical estimator for the score using a Self-Normalized Importance Sampling (SNIS) procedure with prior samples and forward noise. We analyze its relationship to the standard Tweedie estimator, proving anti-correlation for Gaussian […]

technocracy

Ars readers gave over $42,000 in our 2025 Charity Drive

digitado ⋅ January 5, 2026

Last month, we asked readers to donate to a couple of good causes in our 2025 Charity Drive sweepstakes. And boy, did you deliver. With the drive now complete and the donations all tallied, we can report that Ars Technica readers gave an incredible $42,936.83 to Child’s Play and the Electronic Frontier Foundation in this year’s drive. That doesn’t set a new record, but it beats last year’s total and raises our lifetime Ars Charity Drive donation haul […]

technocracy

Haskell in Production: Meta

digitado ⋅ April 6, 2023

In our Haskell in Production series, we interview developers and technical leaders from companies that use Haskell for real-world tasks. We cover benefits, downsides, common pitfalls, and tips for building useful Haskell products. This time, we have quite a special guest – Simon Marlow from Meta. He’s one of the co-authors of the Glasgow Haskell Compiler (GHC) and the author of Parallel and Concurrent Programming in Haskell. Currently, he’s working at Meta on Glean, a system for collecting, […]

technocracy

“Dr AI, am I healthy?” 59% of Brits rely on AI for self-diagnosis

digitado ⋅ January 8, 2026

AI advancements are changing the way we look at health and deal with health-related issues. According to a new nationwide study by Confused.com Life Insurance, three in five Brits now use AI to self-diagnose health conditions. Through various searches, like side effects of medical conditions, treatment options, and symptom checks, as many as 11% of respondents claim AI has helped improve their conditions. More than a third (35%) are likely to use AI in this context in the […]
