February 2026

Not All Negative Samples Are Equal: LLMs Learn Better from Plausible Reasoning

digitado ⋅ February 3, 2026

Learning from negative samples holds great promise for improving Large Language Model (LLM) reasoning capability, yet existing methods treat all incorrect responses as equally informative, overlooking the crucial role of sample quality. To address this, we propose Plausible Negative Samples (PNS), a method that synthesizes high-quality negative samples exhibiting expected format and structural coherence while ultimately yielding incorrect answers. PNS trains a dedicated model via reverse reinforcement learning (RL) guided by a composite reward combining format compliance, accuracy […]


technocracy

A Function-Space Stability Boundary for Generalization in Interpolating Learning Systems

digitado ⋅ February 3, 2026

Modern learning systems often interpolate training data while still generalizing well, yet it remains unclear when algorithmic stability explains this behavior. We model training as a function-space trajectory and measure sensitivity to single-sample perturbations along this trajectory. We propose a contractive propagation condition and a stability certificate obtained by unrolling the resulting recursion. A small certificate implies stability-based generalization, while we also prove that there exist interpolating regimes with small risk where such contractive sensitivity cannot hold, showing […]


technocracy

A Sanity Check on the Moltbook Hype.

digitado ⋅ February 3, 2026

How ReAct loops and meta-prompting cause AI agents to drift from “assistants” to role-players. A new “social network for AI agents” has gone viral, and the headlines are doing what headlines always do: inviting readers onto the sensationalism roller-coaster. Here’s one from Forbes: “AI Agents Created Their Own Religion, Crustafarianism, On An Agent-Only Social Network.” Here’s one from the New York Post: “Moltbook is a new social media platform exclusively for AI — and some agents are plotting […]


technocracy

DeepDFA: Injecting Temporal Logic in Deep Learning for Sequential Subsymbolic Applications

digitado ⋅ February 3, 2026

Integrating logical knowledge into deep neural network training is still a hard challenge, especially for sequential or temporally extended domains involving subsymbolic observations. To address this problem, we propose DeepDFA, a neurosymbolic framework that integrates high-level temporal logic – expressed as Deterministic Finite Automata (DFA) or Moore Machines – into neural architectures. DeepDFA models temporal rules as continuous, differentiable layers, enabling symbolic knowledge injection into subsymbolic domains. We demonstrate how DeepDFA can be used in two key settings: […]


technocracy

Polish serenity

digitado ⋅ February 3, 2026

Yesterday I ran across the following mashup by Amy Swearer of a Polish proverb and the Serenity Prayer: “Lord, grant me the serenity to accept when it’s no longer my circus, the courage to control the monkeys that are still mine, and the wisdom to know the difference.” The proverb is “Nie mój cyrk, nie moje małpy,” literally “Not my circus, not my monkeys.” The post Polish serenity first appeared on John D. Cook.


technocracy

Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts

digitado ⋅ February 3, 2026

Continual learning, especially class-incremental learning (CIL), on the basis of a pre-trained model (PTM) has garnered substantial research interest in recent years. However, how to effectively learn both discriminative and comprehensive feature representations while maintaining stability and plasticity over very long task sequences remains an open problem. We propose CaRE, a scalable Continual Learner with efficient Bi-Level Routing Mixture-of-Experts (BR-MoE). The core idea of BR-MoE is a bi-level routing mechanism: a router selection stage that dynamically activates relevant […]


technocracy

IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning

digitado ⋅ February 3, 2026

Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge by autonomously retrieving and synthesizing evidence from large web corpora into long-form reports, enabling a long-horizon agentic paradigm. However, unlike real-time conversational assistants, DR is computationally expensive and time-consuming, creating an autonomy-interaction dilemma: high autonomy on ambiguous user queries often leads to prolonged execution with unsatisfactory outcomes. To address this, we propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting […]


technocracy

Soft-Radial Projection for Constrained End-to-End Learning

digitado ⋅ February 3, 2026

Integrating hard constraints into deep learning is essential for safety-critical systems. Yet existing constructive layers that project predictions onto constraint boundaries face a fundamental bottleneck: gradient saturation. By collapsing exterior points onto lower-dimensional surfaces, standard orthogonal projections induce rank-deficient Jacobians, which nullify gradients orthogonal to active constraints and hinder optimization. We introduce Soft-Radial Projection, a differentiable reparameterization layer that circumvents this issue through a radial mapping from Euclidean space into the interior of the feasible set. This construction […]


technocracy

Sony’s patent hints at personalized AI podcasts for PlayStation players

digitado ⋅ February 3, 2026

Key Highlights: Tech companies often file patents to secure a product design, a concept, or whatever else they may be working on. Sony in particular has never been shy about filing patents for ideas that may or may not ever see the light of day. Over the last week, rumors about a buttonless PlayStation controller drew a lot of attention among gaming enthusiasts. Now, Sony has added another interesting concept to that long […]


technocracy

CRL-VLA: Continual Vision-Language-Action Learning

digitado ⋅ February 3, 2026

Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (VLA) models to master dexterous manipulation through environmental interaction. Thus, Continual Reinforcement Learning (CRL) is a promising pathway for deploying VLA models in lifelong robotic scenarios, yet balancing stability (retaining old skills) and plasticity (learning new ones) remains a formidable challenge for existing methods. We introduce CRL-VLA, a framework for continual post-training of VLA models […]
