Training Qwen2.5-0.5B-Instruct on Reddit post summarization with GRPO on my 3x Mac Minis — trying a combination of quality rewards with a length penalty!

So, with this project I want to see if length-constrained (like 64 tokens only) quality summarization can be done by tiny LLMs using GRPO!

Why a combination of quality rewards? ROUGE-L only cares about the longest common subsequence — it misses synonyms and paraphrases entirely. METEOR handles both: it aligns tokens with synonym matching via […]
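To make the reward idea concrete, here's a minimal sketch of what a combined reward function could look like. This is an illustration, not the project's actual code: the ROUGE-L F1 is implemented from scratch via LCS, METEOR is left out to keep the snippet self-contained (it needs WordNet for synonym alignment), and the linear length-penalty shape, the 64-token cap, and the `w_quality` weight are all my assumptions.

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def rouge_l_f1(summary: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    s, r = summary.split(), reference.split()
    if not s or not r:
        return 0.0
    lcs = lcs_len(s, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(s), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

def length_penalty(summary: str, max_tokens: int = 64) -> float:
    """1.0 within budget, then a linear ramp down to 0 (assumed shape)."""
    n = len(summary.split())
    if n <= max_tokens:
        return 1.0
    return max(0.0, 1.0 - (n - max_tokens) / max_tokens)

def combined_reward(summary: str, reference: str, w_quality: float = 1.0) -> float:
    """Quality score gated by the length penalty (hypothetical combination)."""
    return w_quality * rouge_l_f1(summary, reference) * length_penalty(summary)
```

In a GRPO loop, each sampled completion in a group would get scored by `combined_reward` against the reference summary, and the group-normalized scores become the advantages. Multiplying (rather than adding) the penalty means an over-length summary can't buy back reward with quality alone, which matches the "64 tokens only" constraint.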