digitado

About digitado

https://www.digitado.com.br


Task Abstention for Large Language Models in Code Generation

digitado ⋅ 19 May 2026

arXiv:2605.17029v1 Announce Type: new Abstract: Large language models (LLMs) have revolutionized automated code generation. One serious concern, however, is the so-called “hallucination”, i.e., LLMs may generate seemingly plausible but functionally incorrect code. In this paper, we study the task abstention problem, i.e., determining whether a given LLM should abstain from performing a specific code generation task to avoid likely hallucination. Our approach features a calibrated abstention rule, grounded in the principles of multiple hypothesis testing. The rule assesses […]


technocracy

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts

digitado ⋅ 19 May 2026

arXiv:2605.17028v1 Announce Type: new Abstract: Large language models (LLMs) hallucinate with confidence: their outputs can be fluent, authoritative, and simply wrong. In medical, legal, and scientific applications this failure causes direct harm, and detecting it from internal model states offers a path to safer deployment. A growing body of work reports that this problem is increasingly tractable, with recent methods achieving high detection performance on widely used benchmarks. We show, however, that much of this apparent progress does […]


technocracy

Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road

digitado ⋅ 19 May 2026

arXiv:2605.17026v1 Announce Type: new Abstract: Recent progress in large language models has led to the emergence of reasoning models, which have shown strong performance on complex tasks through specialized fine-tuning procedures. While these methods reliably improve pass@1 accuracy, prior work has observed that they exhibit coverage shrinkage, where pass@k degrades relative to the base model. In this paper, we investigate why this shrinkage arises under SFT-based post-training. We hypothesize that this behavior is driven by properties […]


technocracy

Sum of rank ratios: an alternative to percentiles for research assessment, from groundbreaking to mainstream research

digitado ⋅ 19 May 2026

arXiv:2605.17023v1 Announce Type: new Abstract: Assessing research that pushes the boundaries of knowledge is challenging because such work is extremely infrequent, accounting for only about 0.01 per cent of all research outputs. Consequently, knowledge about how to evaluate this type of research is far more limited than the well-established methods used to assess more common research outcomes. This study addresses this gap by using a rank-based approach in which each paper is assigned a unique value […]


technocracy

Intermediate Constacyclic Codes and Scalar-Residue Reed–Muller Layers

digitado ⋅ 19 May 2026

arXiv:2605.17022v1 Announce Type: new Abstract: A 2024 paper of Sun, Ding and Wang introduced a second class of constacyclic codes over finite fields, denoted $C(q,m,r,\ell)$, with length $(q^m-1)/r$, where $r \mid (q-1)$ and the defining monomials have total $q$-ary degree congruent to $r-1$ modulo $r$. In the non-projective intermediate range $2


technocracy

The last six months in LLMs in five minutes

digitado ⋅ 19 May 2026

I put together these annotated slides from my five-minute lightning talk at PyCon US 2026, using the latest iteration of my annotated presentation tool. I presented this lightning talk at PyCon US 2026, attempting to summarize the last six months of developments in LLMs in five minutes. Six months is a pretty convenient time period to cover, because it captures what I’ve been calling the November 2025 inflection point. November was a critical month in […]


technocracy

LLM Observability with Self-Hosted Langfuse and vLLM

digitado ⋅ 19 May 2026


technocracy

The MIT-IBM Computing Research Lab launches to shape the future of AI and quantum computing

digitado ⋅ 19 May 2026

The following is a joint announcement by the MIT Schwarzman College of Computing and IBM. IBM and MIT today announced the launch of the MIT-IBM Computing Research Lab, advancing their long-standing collaboration to shape the next era of computing. The new lab expands its scope to include quantum computing, alongside foundational artificial intelligence research, with the goal of unlocking new computational approaches that go beyond the limits of today’s classical systems. The MIT-IBM Computing Research Lab builds on […]


technocracy

Meet MemPrivacy: An Edge-Cloud Framework that Uses Local Reversible Pseudonymization to Protect User Data Without Breaking Memory Utility

digitado ⋅ 18 May 2026

As LLM-powered agents move from research to production, one design tension is becoming harder to ignore: the more useful cloud-hosted memory becomes, the more private user data it exposes. Researchers from MemTensor (Shanghai), HONOR Device and Tongji University have introduced MemPrivacy, a framework that attempts to resolve this tension without sacrificing the utility that makes personalized memory worthwhile in the first place.

The Core Problem With Cloud Memory

When you interact with an AI agent, your conversation often […]


technocracy

Ebola outbreak: WHO declares emergency, US restricts travel, American infected

digitado ⋅ 18 May 2026

The Ebola outbreak first reported in the Democratic Republic of the Congo on Friday has seemingly escalated quickly into a large, uncontrolled multinational outbreak. As of May 17, there were 10 confirmed cases, 336 suspected cases, and 88 deaths in the DRC, as well as two confirmed cases and one death in neighboring Uganda, according to the latest data from the US Centers for Disease Control and Prevention, which has offices in the region. The numbers already put […]
