digitado – Page 550

Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models

digitado ⋅ 12 January 2026

Large language models (LLMs) achieve strong average performance yet remain unreliable at the instance level, with frequent hallucinations, brittle failures, and poorly calibrated confidence. We study reliability through the lens of multi-model consensus: given responses from several heterogeneous LLMs, can we learn which answer is most likely correct for a given query? We introduce a Multi-Model Consensus Reasoning Engine that treats the set of LLM outputs as input to a supervised meta-learner. The system maps natural language […]


technocracy

Opportunistic Scheduling for Optimal Spot Instance Savings in the Cloud

digitado ⋅ 21 January 2026

arXiv:2601.12266v1. We study the problem of scheduling delay-sensitive jobs over spot and on-demand cloud instances to minimize average cost while meeting an average delay constraint. Jobs arrive as a general stochastic process, and incur different costs based on the instance type. This work provides the first analytical treatment of this problem using tools from queuing theory, stochastic processes, and optimization. We derive cost expressions for general policies, prove queue length one is optimal for […]


technocracy

Factorizable joint shift revisited

digitado ⋅ 30 January 2026

arXiv:2601.15036v2. Factorizable joint shift (FJS) represents a type of distribution shift (or dataset shift) that comprises both covariate and label shift. Recently, it has been observed that FJS actually arises from consecutive label and covariate (or vice versa) shifts. Research into FJS so far has been confined to the case of categorical labels. We propose a framework for analysing distribution shift in the case of a general label space, thus covering both classification and […]


technocracy

Quoting Thoughtworks

digitado ⋅ 14 February 2026

The retreat challenged the narrative that AI eliminates the need for junior developers. Juniors are more profitable than they have ever been. AI tools get them past the awkward initial net-negative phase faster. They serve as a call option on future productivity. And they are better at AI tools than senior engineers, having never developed the habits and assumptions that slow adoption. The real concern is mid-level engineers who came up during the decade-long hiring boom and may […]


technocracy

The Magic Correlations: Understanding Knowledge Transfer from Pretraining to Supervised Fine-Tuning

digitado ⋅ 13 February 2026

arXiv:2602.11217v1. Understanding how language model capabilities transfer from pretraining to supervised fine-tuning (SFT) is fundamental to efficient model development and data curation. In this work, we investigate four core questions: RQ1. To what extent do accuracy and confidence rankings established during pretraining persist after SFT? RQ2. Which benchmarks serve as robust cross-stage predictors and which are unreliable? RQ3. How do transfer dynamics shift with model scale? RQ4. How well does model confidence align with […]


technocracy

Boltzmann Generators for Condensed Matter via Riemannian Flow Matching

digitado ⋅ 31 March 2026

arXiv:2602.18482v2. Sampling equilibrium distributions is fundamental to statistical mechanics. While flow matching has emerged as a scalable state-of-the-art paradigm for generative modeling, its potential for equilibrium sampling in condensed-phase systems remains largely unexplored. We address this by incorporating the periodicity inherent to these systems into continuous normalizing flows using Riemannian flow matching. The high computational cost of exact density estimation intrinsic to continuous normalizing flows is mitigated by using Hutchinson’s trace estimator, utilizing a crucial […]
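
For readers unfamiliar with the Hutchinson trace estimator named in this abstract, here is a minimal sketch of the general idea, not the paper's implementation. It approximates tr(A) by averaging z^T A z over random Rademacher probe vectors z, using only matrix-vector products. The function names `hutchinson_trace` and `dense_matvec` are hypothetical, and the small dense matrix stands in for the Jacobian-vector products a continuous normalizing flow would actually supply.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=100, seed=0):
    """Estimate tr(A) as the mean of z^T (A z) over Rademacher probes z.

    `matvec` is any callable returning A @ v for a length-`dim` list v,
    so A never has to be formed explicitly.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes


def dense_matvec(matrix):
    """Wrap a list-of-rows matrix as a matvec callable (illustration only)."""
    def apply(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return apply


# Toy symmetric matrix with exact trace 2 + 3 = 5.
A = [[2.0, 1.0],
     [1.0, 3.0]]
estimate = hutchinson_trace(dense_matvec(A), dim=2, num_probes=2000)
print(estimate)  # a noisy estimate of the exact trace
```

The estimate is unbiased because E[z z^T] is the identity for Rademacher probes; the off-diagonal entries of A only contribute variance, which shrinks as `num_probes` grows.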


technocracy

Learning the S-matrix from data: Rediscovering gravity from gauge theory via symbolic regression

digitado ⋅ 16 February 2026

We demonstrate that modern machine-learning methods can autonomously reconstruct several flagship analytic structures in scattering amplitudes directly from numerical on-shell data. In particular, we show that the Kawai–Lewellen–Tye (KLT) relations can be rediscovered using symbolic regression applied to colour-ordered Yang–Mills amplitudes with Mandelstam invariants as input features. Using standard feature-selection techniques, specifically column-pivoted QR factorisation, we simultaneously recover the Kleiss–Kuijf and Bern–Carrasco–Johansson (BCJ) relations, identifying a minimal basis of partial amplitudes without any group-theoretic input. We obtain the […]


technocracy

LLMOps Guide: The End-to-End Pipeline for Reliable AI Applications

digitado ⋅ 10 March 2026

Author(s): Divy Yadav. Originally published on Towards AI. For developers who have just built an LLM, RAG, or agentic system and are wondering what comes next. Most teams celebrate when their AI application finally works. The demo looks good, the feature ships. This article discusses the challenges teams face when transitioning from an AI application that merely works to a robust production system that remains reliable and performant. It emphasizes the importance of LLMOps, Large Language Model […]


technocracy

Few-for-Many Personalized Federated Learning

digitado ⋅ 12 March 2026

Personalized Federated Learning (PFL) aims to train customized models for clients with highly heterogeneous data distributions while preserving data privacy. Existing approaches often rely on heuristics like clustering or model interpolation, which lack principled mechanisms for balancing heterogeneous client objectives. Serving $M$ clients with distinct data distributions is inherently a multi-objective optimization problem, where achieving optimal personalization ideally requires $M$ distinct models on the Pareto front. However, maintaining $M$ separate models poses significant scalability challenges in federated settings […]


technocracy

Glitter: Visualizing Lexical Surprisal for Readability in Administrative Texts

digitado ⋅ 12 January 2026

arXiv:2601.05411v1. This work investigates how measuring the information entropy of a text can be used to estimate its readability. We propose a visualization framework that approximates the information entropy of a text using multiple language models and visualizes the result. The end goal is to use this method to estimate and improve the readability and clarity of administrative or bureaucratic texts. Our toolset is available as libre software at https://github.com/ufal/Glitter.
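
To make the notion of lexical surprisal concrete, the sketch below computes per-token surprisal in bits, defined as the negative base-2 logarithm of a token's probability. It is an illustration only and not Glitter's method: the abstract describes using multiple language models, whereas this example assumes a toy add-one-smoothed unigram model, and the names `surprisal_bits` and `unigram_surprisals` are hypothetical.

```python
import math
from collections import Counter


def surprisal_bits(prob):
    """Surprisal of an event with probability `prob`, in bits."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def unigram_surprisals(tokens, reference_tokens):
    """Score each token under an add-one-smoothed unigram model.

    Assumption: a toy stand-in for a real language model. One extra
    vocabulary slot is reserved so unseen tokens get nonzero probability.
    """
    counts = Counter(reference_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # seen types plus one slot for unseen tokens
    return [
        (tok, surprisal_bits((counts[tok] + 1) / (total + vocab)))
        for tok in tokens
    ]


reference = "the the the form".split()
for token, bits in unigram_surprisals(["the", "form", "hereinafter"], reference):
    # Rarer or unseen tokens receive higher surprisal.
    print(f"{token}: {bits:.2f} bits")
```

Averaging the per-token values gives a rough entropy estimate for a passage, and tokens with high surprisal are natural candidates for highlighting in a readability view.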
