February 2026

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

digitado ⋅ 24 de February de 2026

Reinforcement Learning with Verifiable Rewards (RLVR) has become the leading paradigm for enhancing reasoning in Large Language Models (LLMs). However, standard RLVR algorithms suffer from a well-documented pathology: while they improve Pass@1 accuracy through sharpened sampling, they simultaneously narrow the model’s reasoning boundary and reduce generation diversity. We identify a root cause that existing methods overlook: the uniform penalization of errors. Current approaches — whether data-filtering methods that select prompts by difficulty, or advantage normalization schemes — treat […]

Ver mais

Like 0

Liked Liked

technocracy

FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning

digitado ⋅ 24 de February de 2026

Federated Learning (FL) enables collaborative model training across multiple clients without sharing their private data. However, data heterogeneity across clients leads to client drift, which degrades the overall generalization performance of the model. This effect is further compounded by overemphasis on poorly performing clients. To address this problem, we propose FedVG, a novel gradient-based federated aggregation framework that leverages a global validation set to guide the optimization process. Such a global validation set can be established using readily […]

Ver mais

Like 0

Liked Liked

technocracy

FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning

digitado ⋅ 24 de February de 2026

Ver mais

Like 0

Liked Liked

technocracy

Your CPU Is 1,000x Less Efficient Than Physics Allows

digitado ⋅ 24 de February de 2026

How information theory connects to the Second Law of Thermodynamics, and why it may define the ultimate ceiling on AI computation Made using NotebookLM There is a law more fundamental than Moore’s that governs every computer ever built, and it has nothing to do with transistors. It was written not by engineers but by entropy, and it applies equally to a 1950s vacuum tube and a 2024 NVIDIA H100. The law states, essentially, that forgetting is expensive. Not […]

Ver mais

Like 0

Liked Liked

technocracy

Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

digitado ⋅ 24 de February de 2026

Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically under tightly controlled training setups. It often depends on carefully constructed task-solution pairs and substantial human supervision, which creates a fundamental obstacle to open-ended self-evolution toward superintelligent systems. In this paper, we propose Tool-R0 framework for training general purpose tool-calling agents from scratch […]

Ver mais

Like 0

Liked Liked

technocracy

Squint: Fast Visual Reinforcement Learning for Sim-to-Real Robotics

digitado ⋅ 24 de February de 2026

Visual reinforcement learning is appealing for robotics but expensive — off-policy methods are sample-efficient yet slow; on-policy methods parallelize well but waste samples. Recent work has shown that off-policy methods can train faster than on-policy methods in wall-clock time for state-based control. Extending this to vision remains challenging, where high-dimensional input images complicate training dynamics and introduce substantial storage and encoding overhead. To address these challenges, we introduce Squint, a visual Soft Actor Critic method that achieves faster […]

Ver mais

Like 0

Liked Liked

technocracy

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

digitado ⋅ 24 de February de 2026

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: textit{reflection-in-action}, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and textit{reflection-on-action}, which uses test-time training to update both […]

Ver mais

Like 0

Liked Liked

technocracy

Statistical Query Lower Bounds for Smoothed Agnostic Learning

digitado ⋅ 24 de February de 2026

We study the complexity of smoothed agnostic learning, recently introduced by~cite{CKKMS24}, in which the learner competes with the best classifier in a target class under slight Gaussian perturbations of the inputs. Specifically, we focus on the prototypical task of agnostically learning halfspaces under subgaussian distributions in the smoothed model. The best known upper bound for this problem relies on $L_1$-polynomial regression and has complexity $d^{tilde{O}(1/σ^2) log(1/ε)}$, where $σ$ is the smoothing parameter and $ε$ is the excess error. […]

Ver mais

Like 0

Liked Liked

technocracy

Build an intelligent photo search using Amazon Rekognition, Amazon Neptune, and Amazon Bedrock

digitado ⋅ 24 de February de 2026

Managing large photo collections presents significant challenges for organizations and individuals. Traditional approaches rely on manual tagging, basic metadata, and folder-based organization, which can become impractical when dealing with thousands of images containing multiple people and complex relationships. Intelligent photo search systems address these challenges by combining computer vision, graph databases, and natural language processing to transform how we discover and organize visual content. These systems capture not just who and what appears in photos, but the complex […]

Ver mais

Like 0

Liked Liked

technocracy

ProxyFL: A Proxy-Guided Framework for Federated Semi-Supervised Learning

digitado ⋅ 24 de February de 2026

Federated Semi-Supervised Learning (FSSL) aims to collaboratively train a global model across clients by leveraging partially-annotated local data in a privacy-preserving manner. In FSSL, data heterogeneity is a challenging issue, which exists both across clients and within clients. External heterogeneity refers to the data distribution discrepancy across different clients, while internal heterogeneity represents the mismatch between labeled and unlabeled data within clients. Most FSSL methods typically design fixed or dynamic parameter aggregation strategies to collect client knowledge on […]

Ver mais

Like 0

Liked Liked