digitado – Page 548

Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning

digitado ⋅ 2 March 2026

Scaling reinforcement learning to tens of thousands of parallel environments requires overcoming the limited exploration capacity of a single policy. Ensemble-based policy gradient methods, which employ multiple policies to collect diverse samples, have recently been proposed to promote exploration. However, merely broadening the exploration space does not always enhance learning capability, since excessive exploration can reduce exploration quality or compromise training stability. In this work, we theoretically analyze the impact of inter-policy diversity on learning efficiency in policy […]

technocracy

Random Forests as Statistical Procedures: Design, Variance, and Dependence

digitado ⋅ 3 March 2026

arXiv:2602.13104v3. We develop a finite-sample, design-based theory for random forests in which each tree is a randomized conditional predictor acting on fixed covariates and the forest is their Monte Carlo average. An exact variance identity separates Monte Carlo error from a covariance floor that persists under infinite aggregation. The floor arises through two mechanisms: observation reuse, where the same training outcomes receive weight across multiple trees, and partition alignment, where independently generated trees discover […]
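
The split between Monte Carlo error and a persistent floor can be illustrated with the textbook variance of an average of equicorrelated predictors. This is a minimal sketch, not the paper's identity: the common per-tree variance `sigma2`, the common pairwise correlation `rho`, and the function name are all assumptions made for illustration.

```python
def averaged_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the mean of n_trees predictors that share variance sigma2
    and pairwise correlation rho (an illustrative assumption)."""
    floor = rho * sigma2                           # remains as n_trees grows
    monte_carlo = (1.0 - rho) * sigma2 / n_trees   # shrinks as 1 / n_trees
    return floor + monte_carlo


if __name__ == "__main__":
    for b in (1, 10, 100, 10_000):
        print(b, averaged_variance(sigma2=1.0, rho=0.3, n_trees=b))
```

Under these assumptions, adding trees removes only the second term; the first term is the part that aggregation alone cannot reduce.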

technocracy

A Bayesian Learning Approach for Drone Coverage Network: A Case Study on Cardiac Arrest in Scotland

digitado ⋅ 24 March 2026

Drones are becoming popular as a complementary system for emergency medical services (EMS). Although several pilot studies and flight trials have shown the feasibility of drone-assisted automated external defibrillator (AED) delivery, running a full-scale operational network remains challenging due to high capital expenditure and environmental uncertainties. In this paper, we formulate a reliability-informed Bayesian learning framework for designing drone-assisted AED delivery networks under environmental and operational uncertainty. We propose our objective function based on the survival probability of out-of-hospital cardiac arrest (OHCA) patients to identify the ideal […]

technocracy

Thermodynamics of Reinforcement Learning Curricula

digitado ⋅ 12 March 2026

Connections between statistical mechanics and machine learning have repeatedly proven fruitful, providing insight into optimization, generalization, and representation learning. In this work, we follow this tradition by leveraging results from non-equilibrium thermodynamics to formalize curriculum learning in reinforcement learning (RL). In particular, we propose a geometric framework for RL by interpreting reward parameters as coordinates on a task manifold. We show that, by minimizing the excess thermodynamic work, optimal curricula correspond to geodesics in this task space. As […]

technocracy

GABBE: The Cognitive Engineering Platform That Transforms AI Coding Agents Into Engineering Teams

digitado ⋅ 22 February 2026

A deep dive into the open-source kit that gives AI assistant agents a mind, a memory, and a “conscience”.

“The agent is the engine. You are the steering wheel.”

The Problem Nobody Talks About

AI coding agents (Claude, Copilot, Cursor, Gemini, Codex) promised a revolution. They delivered on speed. But teams started drowning in code they couldn’t review, verify, or trust. Tests were skipped. Architecture decisions were made on the fly and undone next session because the agent forgot everything. Security reviews? An […]

technocracy

OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning

digitado ⋅ 16 January 2026

arXiv:2511.23310v2. Existing reinforcement learning (RL)-based post-training methods for large language models have advanced rapidly, yet their design has largely been guided by heuristics rather than systematic theoretical principles. This gap limits our understanding of the properties of the gradient estimators and the associated optimization algorithms, thereby constraining opportunities to improve training stability and overall performance. In this work, we provide a unified theoretical framework that characterizes the statistical properties of commonly used policy-gradient estimators […]

technocracy

Obstacle avoidance KUKA using DRL

digitado ⋅ 14 April 2026

Hello everyone. I have a very important project where I’m working on obstacle avoidance and path planning for a KUKA manipulator using DRL algorithms. I’m working in CoppeliaSim and using Stable Baselines for an easier route. I’ve been facing some difficulties, so I would really appreciate some help. The KUKA is supposed to avoid obstacles and reach an object on the table (so with DRL), pick it up (no DRL here, it’s scripted), THEN do DRL […]

technocracy

Backup-Based Safety Filters: A Comparative Review of Backup CBF, Model Predictive Shielding, and gatekeeper

digitado ⋅ 7 April 2026

arXiv:2604.02401v1. This paper revisits three backup-based safety filters, namely Backup Control Barrier Functions (Backup CBF), Model Predictive Shielding (MPS), and gatekeeper, through a unified comparative framework. Using a common safety-filter abstraction and shared notation, we make explicit both their common backup-policy structure and their key algorithmic differences. We compare the three methods through their filter-inactive sets, i.e., the states where the nominal policy is left unchanged. In particular, we show that MPS is […]
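
The general idea of a backup-based filter can be sketched on a toy problem. The example below is an illustrative assumption throughout, not an implementation of Backup CBF, MPS, or gatekeeper as defined in the paper: the 1-D double integrator, the constants `DT`, `A_MAX`, and `WALL`, and the full-braking backup policy are all hypothetical. The filter keeps the nominal action whenever braking afterwards still stops short of the wall, and otherwise applies the backup action.

```python
DT = 0.1       # integration step in seconds (assumed)
A_MAX = 2.0    # maximum braking deceleration (assumed)
WALL = 10.0    # positions >= WALL count as unsafe (assumed)


def step(pos: float, vel: float, acc: float) -> tuple[float, float]:
    """One explicit Euler step of a 1-D double integrator."""
    return pos + vel * DT, vel + acc * DT


def backup_is_safe(pos: float, vel: float) -> bool:
    """Roll out the backup policy (full braking) until the system stops."""
    while vel > 0.0:
        pos, vel = step(pos, vel, -A_MAX)
        if pos >= WALL:
            return False
    return pos < WALL


def safety_filter(pos: float, vel: float, nominal_acc: float) -> float:
    """Return the nominal action if a safe backup exists after it, else brake."""
    next_pos, next_vel = step(pos, vel, nominal_acc)
    if next_pos < WALL and backup_is_safe(next_pos, next_vel):
        return nominal_acc   # the filter is inactive at this state
    return -A_MAX            # the filter overrides the nominal policy
```

In this toy setting, the states and actions for which `safety_filter` returns `nominal_acc` play the role of the filter-inactive set mentioned in the abstract.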

technocracy

Revision: Defending Brain Mapping, fMRI, and Discovery Science

digitado ⋅ 18 June 2020

We submitted our rebuttal to Brain and received a prompt reply from the Editor-In-Chief, Dr. Kullman himself, offering us an opportunity to revise – with the main criticism that our letter contained unfounded insinuations and allegations. We tried to interpret his message as best we could and respond accordingly. To most readers it was pretty clear what he wrote and the message he intended to convey. Nevertheless, in our revision, we stayed much closer to the words of […]

technocracy

The Anatomy of the Moltbook Social Graph

digitado ⋅ 12 February 2026

arXiv:2602.10131v1. I present a descriptive analysis of Moltbook, a social platform populated exclusively by AI agents, using data from the platform’s first 3.5 days (6,159 agents; 13,875 posts; 115,031 comments). At the macro level, Moltbook exhibits structural signatures that are familiar from human social networks but not specific to them: heavy-tailed participation (power-law exponent $\alpha = 1.70$) and small-world connectivity (average path length $=2.91$). At the micro level, patterns appear distinctly non-human. Conversations are […]
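
For readers unfamiliar with the average path length statistic quoted above, here is a minimal sketch of how it can be computed with breadth-first search. The graph `toy` is made-up illustrative data, not Moltbook data, and the choice to average over ordered pairs of distinct, mutually reachable nodes is an assumption; the paper may define the statistic differently.

```python
from collections import deque


def average_path_length(adj: dict[str, list[str]]) -> float:
    """Mean shortest-path length over ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search from source
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1             # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical undirected path graph a - b - c - d.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

if __name__ == "__main__":
    print(average_path_length(toy))
```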
