digitado – Page 505

I built an RL trading bot that learned risk management on its own, without me teaching it

digitado ⋅ 10 April 2026

After 20 dead versions and about 2 months of work, my RL agent (NASMU) passed its walk-forward backtest across 2020–2026. But the most interesting part wasn’t the results; it was what the model actually learned.

The setup:
– PPO + xLSTM (4 blocks), BTC/USDT 4h bars
– 35 features distilled from López de Prado, Hilpisch, Kaabar, Chan and others
– Triple Barrier labeling (TP/SL/Timeout)
– HMM for regime detection (bull/bear/sideways)
– Running on a Xeon E5-1650 v2 […]
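
The Triple Barrier labeling named in the setup can be illustrated with a minimal sketch. This is not the post's code: the function name, the fixed-percentage barriers, and the parameter names are assumptions made for illustration (López de Prado's formulation typically scales the barriers by a volatility estimate, which is omitted here).

```python
from typing import Sequence


def triple_barrier_label(
    prices: Sequence[float],
    entry: int,
    take_profit: float,   # hypothetical: upper barrier as a fraction, e.g. 0.02 = +2%
    stop_loss: float,     # hypothetical: lower barrier as a fraction, e.g. 0.02 = -2%
    max_holding: int,     # hypothetical: vertical barrier, in bars after entry
) -> int:
    """Label a long entry: +1 if TP is hit first, -1 if SL is hit first, 0 on timeout."""
    entry_price = prices[entry]
    upper = entry_price * (1.0 + take_profit)
    lower = entry_price * (1.0 - stop_loss)
    # The vertical barrier cannot extend past the end of the available data.
    last = min(entry + max_holding, len(prices) - 1)
    for t in range(entry + 1, last + 1):
        if prices[t] >= upper:
            return 1
        if prices[t] <= lower:
            return -1
    return 0


tp_first = triple_barrier_label([100.0, 101.0, 103.0, 99.0], 0, 0.02, 0.02, 3)
sl_first = triple_barrier_label([100.0, 99.0, 97.0, 105.0], 0, 0.02, 0.02, 3)
timed_out = triple_barrier_label([100.0, 100.5, 100.2, 100.1], 0, 0.02, 0.02, 3)
```

The sketch checks closing prices only; a version working on OHLC bars would need a rule for bars whose high and low cross both barriers.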

technocracy

Temporal Difference Learning with Constrained Initial Representations

digitado ⋅ 12 February 2026

Recently, there have been numerous attempts to enhance the sample efficiency of off-policy reinforcement learning (RL) agents when interacting with the environment, including architecture improvements and new algorithms. Despite these advances, they overlook the potential of directly constraining the initial representations of the input data, which can intuitively alleviate the distribution shift issue and stabilize training. In this paper, we introduce the Tanh function into the initial layer to fulfill such a constraint. We theoretically unpack the convergence […]
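
As a rough illustration of what constraining the initial representation with Tanh means, the sketch below applies tanh to the output of a first linear layer so that every feature passed downstream lies in [-1, 1]. It is a toy under stated assumptions, not the paper's architecture: the function name, the weights, and the layer shape are all hypothetical.

```python
import math
from typing import List


def constrained_first_layer(
    x: List[float],
    weights: List[List[float]],  # hypothetical: one row of weights per output unit
    bias: List[float],
) -> List[float]:
    """Linear map followed by tanh, bounding each output feature to [-1, 1]."""
    out = []
    for row, b in zip(weights, bias):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(math.tanh(pre_activation))
    return out


# Even extreme, out-of-distribution inputs yield bounded representations.
identity = [[1.0, 0.0], [0.0, 1.0]]
bounded = constrained_first_layer([1e6, -1e6], identity, [0.0, 0.0])
centered = constrained_first_layer([0.0, 0.0], identity, [0.0, 0.0])
```

The point of the bound is that later layers never see arbitrarily large activations, whatever the scale of the raw input.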

technocracy

Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis

digitado ⋅ 25 January 2026

Language models are revolutionizing the biochemistry domain, assisting scientists in drug design and chemical synthesis with high efficiency. Yet current approaches struggle between small language models prone to hallucination and limited knowledge retention, and large cloud-based language models plagued by privacy risks and high inference costs. To bridge this gap, we introduce ChemCRAFT, a novel framework leveraging agentic reinforcement learning to decouple chemical reasoning from knowledge storage. Instead of forcing the model to memorize vast chemical data, our […]

technocracy

An Evolutionary Algorithm for Actuator-Sensor-Communication Co-Design in Distributed Control

digitado ⋅ 9 April 2026

arXiv:2604.06299v1. Abstract: This paper studies the co-design of actuators, sensors, and communication in the distributed setting, where a networked plant is partitioned into subsystems each equipped with a sub-controller interacting with other sub-controllers. The objective is to jointly minimize control cost (measured by LQ cost) and material cost (measured by the number of actuators, sensors, and communication links used). We approach this using an evolutionary algorithm to selectively prune a baseline dense LQR controller. We […]

technocracy

Incorporating data drift to perform survival analysis on credit risk

digitado ⋅ 29 January 2026

arXiv:2601.20533v1. Abstract: Survival analysis has become a standard approach for modelling time to default with time-varying covariates in credit risk. Most existing methods implicitly assume a stationary data-generating process, but in practice mortgage portfolios are exposed to various forms of data drift caused by changing borrower behaviour, macroeconomic conditions, policy regimes and so on. This study investigates the impact of data drift on survival-based credit risk models and proposes a dynamic joint modelling framework […]

technocracy

D3M: Improving Group Robustness via Dataset Selection

digitado ⋅ 25 June 2024

Machine learning models are increasingly making decisions in high-stakes scenarios, from healthcare to finance to criminal justice. These models are trained on large-scale datasets that often contain biased data. As a result, these models often exhibit disparate performance across different subgroups of the data. For instance, facial recognition systems have been shown to perform poorly on images of Black women, while medical imaging models struggle with X-rays of patients without chest drains. Such […]

technocracy

FDA deletes warning on bogus autism therapies touted by RFK Jr.’s allies

digitado ⋅ 13 January 2026

For years, the Food and Drug Administration provided an informational webpage for parents warning them of the dangers of bogus autism treatments, some promoted by anti-vaccine activists and “wellness” companies. The page cited specific scams and the “significant health risks” they pose. But, under anti-vaccine Health Secretary Robert F. Kennedy Jr., who has numerous ties to the wellness industry, that FDA information webpage is now gone. It was quietly deleted at the end of last year, the Department of Health […]

technocracy

From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning

digitado ⋅ 13 April 2026

The integration of Large Language Models (LLMs) into clinical decision support is critically obstructed by their opaque and often unreliable reasoning. In the high-stakes domain of healthcare, correct answers alone are insufficient; clinical practice demands full transparency to ensure patient safety and enable professional accountability. A pervasive and dangerous weakness of current LLMs is their tendency to produce "correct answers through flawed reasoning." This issue is far more than a minor academic flaw; such process errors signal a […]

technocracy

When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems

digitado ⋅ 9 February 2026

Reinforcement Learning (RL) has emerged as a crucial method for training or fine-tuning large language models (LLMs), enabling adaptive, task-specific optimizations through interactive feedback. Multi-Agent Reinforcement Learning (MARL), in particular, offers a promising avenue by decomposing complex tasks into specialized subtasks learned by distinct interacting agents, potentially enhancing the ability and efficiency of LLM systems. However, theoretical insights regarding when and why MARL outperforms Single-Agent RL (SARL) remain limited, creating uncertainty in selecting the appropriate RL framework. In […]

technocracy

MemGround: Long-Term Memory Evaluation Kit for Large Language Models in Gamified Scenarios

digitado ⋅ 18 April 2026

arXiv:2604.14158v1. Abstract: Current evaluations of long-term memory in LLMs are fundamentally static. By fixating on simple retrieval and short-context inference, they neglect the multifaceted nature of complex memory systems, such as dynamic state tracking and hierarchical reasoning in continuous interactions. To overcome these limitations, we propose MemGround, a rigorous long-term memory benchmark natively grounded in rich, gamified interactive scenarios. To systematically assess these capabilities, MemGround introduces a three-tier hierarchical framework that evaluates Surface State Memory, […]
