digitado

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

digitado ⋅ 13 February 2026

The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a […]

technocracy

OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

digitado ⋅ 26 January 2026

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research agents. To bridge this gap, we introduce a fully open-source […]

technocracy

Implementing FFTs in Practice

digitado ⋅ 2 March 2026

arXiv:2602.23525v1. Abstract: This review article was first published in 2008 as chapter 11 in the book “Fast Fourier Transforms,” edited by C. S. Burrus, for the Connexions project at Rice University, which is sadly no longer online. It gives a high-level overview of some of the engineering considerations that arise in high-performance implementations of fast Fourier transforms (FFTs). It explains why optimized FFTs are very different from textbook “radix-2 Cooley-Tukey” FFT algorithms, in order to […]

technocracy

Theory of Mind and Self-Attributions of Mentality are Dissociable in LLMs

digitado ⋅ 1 April 2026

arXiv:2603.28925v1. Abstract: Safety fine-tuning in Large Language Models (LLMs) seeks to suppress potentially harmful forms of mind-attribution such as models asserting their own consciousness or claiming to experience emotions. We investigate whether suppressing mind-attribution tendencies degrades intimately related socio-cognitive abilities such as Theory of Mind (ToM). Through safety ablation and mechanistic analyses of representational similarity, we demonstrate that LLM attributions of mind to themselves and to technological artefacts are behaviorally and mechanistically dissociable from ToM […]

technocracy

How should I plan my learning path for reinforcement learning courses?

digitado ⋅ 17 May 2026

Hi everyone, I have a question about planning my reinforcement learning studies. I’m currently a sophomore majoring in a non-CS field. My math background includes calculus, probability and statistics, linear algebra, and some mathematical analysis. I want to start learning reinforcement learning, but according to many recommendations, it seems I may also need additional math courses such as ODEs, real analysis, stochastic processes, etc. Is that really necessary at my current stage? Or would it be better to […]

technocracy

Finder: A Multimodal AI-Powered Search Framework for Pharmaceutical Data Retrieval

digitado ⋅ 18 March 2026

arXiv:2603.15623v1. Abstract: AI is transforming pharmaceutical search, where traditional systems struggle with multimodal content and manual curation. Finder is a scalable AI-powered framework that unifies retrieval across text, images, audio, and video using hybrid vector search, combining sparse lexical and dense semantic models. Its modular pipeline ingests diverse formats, enriches metadata, and stores content in a vector-native backend. Finder supports reasoning-aware natural language search, improving precision and contextual relevance. The system has processed over 291,400 […]

technocracy

RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning

digitado ⋅ 10 April 2026

arXiv:2604.07533v1. Abstract: Time Slotted Channel Hopping (TSCH) is a widely adopted Media Access Control (MAC) protocol within the IEEE 802.15.4e standard, designed to provide reliable and energy-efficient communication in Industrial Internet of Things (IIoT) networks. However, state-of-the-art TSCH schedulers rely on static slot allocations, resulting in idle listening and unnecessary power consumption under dynamic traffic conditions. This paper introduces RL-ASL, a reinforcement learning-driven adaptive listening framework that dynamically decides whether to activate or skip a […]

technocracy

Decisions and Deployment: The Five-Year SAHELI Project (2020-2025) on Restless Multi-Armed Bandits for Improving Maternal and Child Health

digitado ⋅ 10 April 2026

arXiv:2604.07384v1. Abstract: Maternal and child health is a critical concern around the world. In many global health programs disseminating preventive care and health information, limited healthcare worker resources prevent continuous, personalised engagement with vulnerable beneficiaries. In such scenarios, it becomes crucial to optimally schedule limited live-service resources to maximise long-term engagement. To address this fundamental challenge, the multi-year SAHELI project (2020-2025), in collaboration with partner NGO ARMMAN, leverages AI to allocate scarce resources in a […]

technocracy

TSSR: Two-Stage Swap-Reward-Driven Reinforcement Learning for Character-Level SMILES Generation

digitado ⋅ 8 January 2026

The design of reliable, valid, and diverse molecules is fundamental to modern drug discovery, as improved molecular generation supports efficient exploration of the chemical space for potential drug candidates and reduces the cost of early design efforts. Despite these needs, current chemical language models that generate molecules as SMILES strings are vulnerable to compounding token errors: many samples are unparseable or chemically implausible, and hard constraints meant to prevent failure can restrict exploration. To address this gap, we […]

technocracy

Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling

digitado ⋅ 10 April 2026

Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its standard Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distributions. This limitation has motivated increasing interest in generative policies based on diffusion and flow matching as more expressive alternatives. However, incorporating such policies into MaxEnt RL is challenging for two main reasons: the likelihood and entropy of continuous-time generative policies are generally intractable, and multi-step […]
