February 2026

Reinforcement Learning with Backtracking Feedback

digitado ⋅ February 9, 2026

Addressing the critical need for robust safety in Large Language Models (LLMs), particularly against adversarial attacks and in-distribution errors, we introduce Reinforcement Learning with Backtracking Feedback (RLBF). This framework advances upon prior methods, such as BSAFE, by primarily leveraging a Reinforcement Learning (RL) stage where models learn to dynamically correct their own generation errors. Through RL with critic feedback on the model’s live outputs, LLMs are trained to identify and recover from their actual, emergent safety violations by […]

Learning Human-Like Badminton Skills for Humanoid Robots

digitado ⋅ February 9, 2026

Realizing versatile and human-like performance in high-demand sports like badminton remains a formidable challenge for humanoid robotics. Unlike standard locomotion or static manipulation, this task demands a seamless integration of explosive whole-body coordination and precise, timing-critical interception. While recent advances have achieved lifelike motion mimicry, bridging the gap between kinematic imitation and functional, physics-aware striking without compromising stylistic naturalness is non-trivial. To address this, we propose Imitation-to-Interaction, a progressive reinforcement learning framework designed to evolve a robot from […]

Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning

digitado ⋅ February 9, 2026

Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsic spectral structure of the learned features. Empirical evidence indicates that high-dimensional embeddings tend to collapse into narrow cones, concentrating task-relevant semantics in a small subspace, while the majority of dimensions remain occupied by noise and spurious correlations. Such spectral imbalance and entanglement undermine model generalization. […]
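One way to observe the kind of spectral imbalance the excerpt describes is to measure how many directions of an embedding matrix carry meaningful variance. The following is a minimal sketch and not the paper's method: the function name `effective_rank`, the entropy-based definition, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def effective_rank(embeddings):
    """Exponential of the entropy of the normalized variance spectrum (illustrative metric)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance_share = singular_values**2 / np.sum(singular_values**2)
    variance_share = variance_share[variance_share > 0]  # avoid log(0)
    return float(np.exp(-np.sum(variance_share * np.log(variance_share))))

rng = np.random.default_rng(0)
# Synthetic stand-ins for learned embeddings (assumption: 1000 samples, 16 dimensions).
isotropic = rng.normal(size=(1000, 16))
# Nearly all variance confined to a 2-dimensional subspace, plus small noise.
collapsed = (rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 16))
             + 1e-3 * rng.normal(size=(1000, 16)))

print("isotropic:", effective_rank(isotropic))
print("collapsed:", effective_rank(collapsed))
```

A value close to the embedding dimension suggests variance is spread across dimensions, while a value near 1 or 2 suggests the embeddings occupy a small subspace.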

The TechBeat: The SEPA Instant Deadlines Have Passed. But Did Europe Really Go Instant? (2/9/2026)

digitado ⋅ February 9, 2026

How are you, hacker? 🪐 Want to know what’s trending right now? The Techbeat by HackerNoon has got you covered with fresh content from our trending stories of the day! Yuri Misnik, CTO at InDrive, on Architecting an AI-First Super App, by @newsbyte [7 min read]: Meet Yuri Misnik, Chief Technology Officer at inDrive. Introducing Provable Randomness in Beldex Consensus with Verifiable Random Functions, by @beldexcoin [6 min read […]

Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback

digitado ⋅ February 9, 2026

In this paper, we study Interaction-Grounded Learning (IGL) [Xie et al., 2021], a paradigm designed for realistic scenarios where the learner receives indirect feedback generated by an unknown mechanism, rather than explicit numerical rewards. While prior work on IGL provides efficient algorithms with provable guarantees, those results are confined to single-step settings, restricting their applicability to modern sequential decision-making systems such as multi-turn Large Language Model (LLM) deployments. To bridge this gap, we propose a computationally efficient algorithm […]

Trust-Based Incentive Mechanisms in Semi-Decentralized Federated Learning Systems

digitado ⋅ February 9, 2026

In federated learning (FL), decentralized model training allows multiple participants to collaboratively improve a shared machine learning model without exchanging raw data. However, ensuring the integrity and reliability of the system is challenging due to the presence of potentially malicious or faulty nodes that can degrade the model’s performance. This paper proposes a novel trust-based incentive mechanism designed to evaluate and reward the quality of contributions in FL systems. By dynamically assessing trust scores based on factors such […]

Building an RL Agent for Prince of Persia (1989)

digitado ⋅ February 9, 2026

I’ve been working on a reinforcement learning project around the original Prince of Persia (1989) using SDLPoP. Instead of using raw pixels, I built a grid-based observation directly from the game state. Each room becomes a small multi-channel grid showing platforms, hazards, gates, exits, items, and character positions. The idea is to reduce the CNN’s burden of trying to understand interactable platforms and hazards from just a few pixels and instead give structured spatial information. On the action […]
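The grid-based observation described above could look roughly like the following. This is a hedged sketch, not the author's code: the channel order, the `encode_room` helper, the `(kind, row, col)` entity format, and the 3×10 room size are assumptions chosen for illustration, and nothing here calls into SDLPoP itself.

```python
import numpy as np

# Hypothetical channel layout; the post names these categories but not their order.
CHANNELS = ["platform", "hazard", "gate", "exit", "item", "character"]
ROOM_ROWS, ROOM_COLS = 3, 10  # assumed room size in tiles

def encode_room(entities, rows=ROOM_ROWS, cols=ROOM_COLS):
    """Build a (channels, rows, cols) grid with 1.0 wherever an entity of that kind sits."""
    grid = np.zeros((len(CHANNELS), rows, cols), dtype=np.float32)
    for kind, row, col in entities:
        grid[CHANNELS.index(kind), row, col] = 1.0
    return grid

# Made-up room contents as (kind, row, col) tuples read from game state.
room = [
    ("platform", 2, 0),
    ("platform", 2, 1),
    ("hazard", 2, 4),
    ("character", 1, 0),
    ("exit", 0, 9),
]
obs = encode_room(room)
print(obs.shape)
```

Each channel is a binary mask over the room, so a small convolutional network receives object identity and position directly instead of inferring them from pixels.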

When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems

digitado ⋅ February 9, 2026

Reinforcement Learning (RL) has emerged as a crucial method for training or fine-tuning large language models (LLMs), enabling adaptive, task-specific optimizations through interactive feedback. Multi-Agent Reinforcement Learning (MARL), in particular, offers a promising avenue by decomposing complex tasks into specialized subtasks learned by distinct interacting agents, potentially enhancing the ability and efficiency of LLM systems. However, theoretical insights regarding when and why MARL outperforms Single-Agent RL (SARL) remain limited, creating uncertainty in selecting the appropriate RL framework. In […]

To Grok Grokking: Provable Grokking in Ridge Regression

digitado ⋅ February 9, 2026

arXiv:2601.19791v2. We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and (iii) the generalization error eventually becomes arbitrarily small. Moreover, we show, both […]
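For readers unfamiliar with the setting, gradient descent with weight decay on an over-parameterized linear model can be sketched as below. This is an illustrative setup under stated assumptions, not the paper's experiment: the loss scaling, the step-size rule, the function name `ridge_gd`, and all sizes and hyperparameters are hypothetical, and the sketch only checks convergence to the ridge solution rather than reproducing the grokking stages.

```python
import numpy as np

def ridge_gd(X, y, lam, steps, w0):
    """Gradient descent on (1/2n)||Xw - y||^2 + (lam/2)||w||^2 (assumed objective)."""
    n = X.shape[0]
    # Step size 1 / (largest curvature) keeps plain gradient descent stable.
    top_eigenvalue = np.linalg.eigvalsh(X.T @ X / n)[-1]
    lr = 1.0 / (top_eigenvalue + lam)
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * w  # data term plus weight decay
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n, d, lam = 20, 100, 0.1              # d > n: over-parameterized (assumed sizes)
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) / np.sqrt(d)
y = X @ w_true
w0 = rng.normal(size=d)               # random initialization (assumption)

w = ridge_gd(X, y, lam, steps=5000, w0=w0)
# Closed-form minimizer of the same objective, for comparison.
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print(np.max(np.abs(w - w_star)))
```

Directions of `w` that the training data does not constrain shrink only through the weight-decay term, at a rate set by `lam`, so they move much more slowly than the directions that fit the training set.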

Multi-fidelity graph-based neural networks architectures to learn Navier-Stokes solutions on non-parametrized 2D domains

digitado ⋅ February 9, 2026

arXiv:2601.02157v2. We propose a graph-based, multi-fidelity learning framework for the prediction of stationary Navier–Stokes solutions in non-parametrized two-dimensional geometries. The method is designed to guide the learning process through successive approximations, starting from reduced-order and full Stokes models, and progressively approaching the Navier–Stokes solution. To effectively capture both local and long-range dependencies in the velocity and pressure fields, we combine graph neural networks with Transformer and Mamba architectures. While Transformers achieve the highest accuracy, […]
