technocracy

Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards

digitado ⋅ 17 March 2026

Reinforcement learning with verifiable rewards (RLVR) has driven recent capability advances of large language models across various domains. Recent studies suggest that improved RLVR algorithms allow models to learn effectively from incorrect annotations, achieving performance comparable to learning from clean data. In this work, we show that these findings are invalid because the claimed 100% noisy training data is "contaminated" with clean data. After rectifying the dataset with a rigorous re-verification pipeline, we demonstrate that noise is destructive […]

AgenticTyper: Automated Typing of Legacy Software Projects Using Agentic AI

digitado ⋅ 26 February 2026

arXiv:2602.21251v1. Abstract: Legacy JavaScript systems lack type safety, making maintenance risky. While TypeScript can help, manually adding types is expensive. Previous automated typing research focuses on type inference but rarely addresses type checking setup, definition generation, bug identification, or behavioral correctness at repository scale. We present AgenticTyper, a Large Language Model (LLM)-based agentic system that addresses these gaps through iterative error correction and behavior preservation via transpilation comparison. Evaluation on two proprietary repositories (81K LOC) […]

Fibonacci numbers and time-space tradeoffs

digitado ⋅ 8 February 2026

A few days ago I wrote about Fibonacci numbers and certificates. As I pointed out in the article, there’s no need to certify Fibonacci numbers, but the point of the post was to illustrate the idea of a solution certificate in a simple context. Practical uses of certificates are more complicated. This time I want to use Fibonacci numbers to illustrate time-space tradeoffs. The post on Fibonacci certificates imagined providing someone a pair (F, r) where F is a large […]
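As a rough illustration of the kind of tradeoff the excerpt mentions, here is a minimal Python sketch. It is not taken from the post: the function names `fib_table` and `fib_iterative` are hypothetical, and the pair (F, r) from the earlier certificate post is not modeled because the excerpt is truncated. One version stores every value so later lookups are instant; the other keeps only two running values and recomputes on each call.

```python
def fib_table(n: int) -> list[int]:
    """Spend space: store F(0)..F(n) so any later lookup is a list index."""
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[: n + 1]


def fib_iterative(n: int) -> int:
    """Spend time: keep two running values and redo n additions per call."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    table = fib_table(90)
    # Both approaches agree; they differ only in what they store.
    print(table[90] == fib_iterative(90))
```

The table costs memory proportional to n but answers repeated queries without further arithmetic, while the iterative version stores almost nothing and pays the full computation every time.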

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

digitado ⋅ 20 March 2026

arXiv:2603.07313v3. Abstract: Robustness under latent distribution shift remains challenging in partially observable reinforcement learning. We formalize a focused setting where an adversary selects a hidden initial latent distribution before the episode, termed an adversarial latent-initial-state POMDP. Theoretically, we prove a latent minimax principle, characterize worst-case defender distributions, and derive approximate best-response inequalities with finite-sample concentration bounds that make the optimization and sampling terms explicit. Empirically, using a Battleship benchmark, we demonstrate that targeted exposure to […]

APreQEL: Adaptive Mixed Precision Quantization For Edge LLMs

digitado ⋅ 26 March 2026

arXiv:2603.23575v1. Abstract: Today, large language models have demonstrated their strengths in various tasks ranging from reasoning and code generation to complex problem solving. However, this advancement comes with high computational cost and memory requirements, making it challenging to deploy these models on edge devices to ensure real-time responses and data privacy. Quantization is one common approach to reducing memory use, but most methods apply it uniformly across all layers. This does not account for the […]

Navigating the generative AI journey: The Path-to-Value framework from AWS

digitado ⋅ 14 April 2026

Generative AI is reshaping how organizations approach productivity, customer experiences, and operational capabilities. Across industries, teams are experimenting with generative AI to unlock new ways of working. Many of these efforts produce compelling proofs of concept (POC) that demonstrate technical feasibility. The real challenge begins after those early wins. Although POCs frequently demonstrate technical feasibility, organizations often struggle to translate them into production-ready systems that deliver measurable business value. The journey from concept to production, and from production […]

Time Series Made So Easy My Aunt Got It on the Second Read

digitado ⋅ 11 May 2026

Author(s): Kamrun Nahar. Originally published on Towards AI. SARIMAX, Prophet, XGBoost, LSTM, and N-BEATS broken down without any pretentious math. Pick the right model in under five minutes today. The 9 billion dollar lesson: In November 2021, Zillow walked into a conference room and admitted that their AI had set 7,000 houses on fire. Not literally. Financially. They’d built an algorithm to buy and flip homes, and the algorithm spent two years quietly overpaying for everything in Phoenix, […]

Digital Guardians: The Past and The Future of Cyber-Physical Resilience

digitado ⋅ 18 April 2026

arXiv:2604.14360v1. Abstract: Resilience in cyber-physical systems (CPS) is the fundamental ability to maintain safety and critical functionality despite adverse “perturbations,” which include security attacks, environmental disruptions, and hardware or software failures. This survey provides a comprehensive review of CPS resilience, framing the field through five interconnected themes that are required in an integrated whole to achieve real-world resilience. The article first posits that resilience is a system-wide property emerging from interactions between hardware, software, and […]

A Coding Guide for Property-Based Testing Using Hypothesis with Stateful, Differential, and Metamorphic Test Design

digitado ⋅ 18 April 2026

In this tutorial, we explore property-based testing using Hypothesis and build a rigorous testing pipeline that goes far beyond traditional unit testing. We implement invariants, differential testing, metamorphic testing, targeted exploration, and stateful testing to validate both functional correctness and behavioral guarantees of our systems. Instead of manually crafting edge cases, we let Hypothesis generate structured inputs, shrink failures to minimal counterexamples, and systematically uncover hidden bugs. Also, we demonstrate how modern testing practices can be integrated directly […]
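To give a feel for the ideas named in the excerpt, here is a minimal sketch that uses only the Python standard library rather than Hypothesis itself. It is an illustrative stand-in, not the tutorial's code: `insertion_sort`, `random_int_list`, and `check_properties` are hypothetical names, inputs come from a seeded `random.Random`, and there is no shrinking of failures as Hypothesis would provide. It checks one differential property, one invariant, and one metamorphic property.

```python
import random


def insertion_sort(items):
    """Hypothetical implementation under test."""
    result = []
    for x in items:
        i = len(result)
        # Walk left past every element larger than x.
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def random_int_list(rng, max_len=20):
    """Generate a random list of small integers (a crude input generator)."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]


def check_properties(trials=200, seed=0):
    """Run the three property checks on randomly generated inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = random_int_list(rng)
        out = insertion_sort(data)
        # Differential: agree with a trusted reference implementation.
        assert out == sorted(data)
        # Invariant: sorting an already sorted list changes nothing.
        assert insertion_sort(out) == out
        # Metamorphic: permuting the input must not change the output.
        shuffled = data[:]
        rng.shuffle(shuffled)
        assert insertion_sort(shuffled) == out
    return trials


if __name__ == "__main__":
    print(check_properties())
```

A property-based testing library replaces the hand-written generator and loop here, and adds failure shrinking and stateful test machinery on top.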

ELLA: Efficient Lifelong Learning for Adapters in Large Language Models

digitado ⋅ 5 January 2026

Large Language Models (LLMs) suffer severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited: replay-based methods are impractical and privacy-violating, while strict orthogonality-based methods collapse under scale: each new task is projected onto an orthogonal complement, progressively reducing the residual degrees of freedom and eliminating forward transfer by forbidding overlap in shared representations. In this work, we introduce ELLA, a training framework built on the principle of […]
