I built an RL trading agent for crypto futures. Here’s why I abandoned supervised learning for reinforcement learning.
A lot of people start algotrading by training an LSTM to predict the next bar’s close. I did too, until I realized trading is a control problem, not a prediction problem. A supervised model predicting a price move with 53% accuracy can still lose money once you factor in fees, slippage, and path-dependent equity.
I recently finished a deep dive into my autonomous trading architecture, which runs a single Recurrent Soft Actor-Critic (SAC) agent managing a portfolio of six Binance perpetuals (DOGE, BNB, SOL, XRP, ADA, LTC) from a shared equity pool.
Here are the biggest architectural shifts that made it work:
Portfolio Agent > Independent Agents: Six independent agents will demand 6x leverage when the whole market rallies. A single agent observing all six markets jointly (via a MultiheadAttention layer) emits a 13-way softmax over positions and cash. Cash competes for weight, forcing the agent to learn when to step aside.
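A minimal sketch of how such a policy head can be wired in PyTorch, assuming the 13 slots are long/short per asset plus one cash slot; the class name and every dimension here are illustrative, not the real config:

```python
import torch
import torch.nn as nn

class PortfolioPolicyHead(nn.Module):
    def __init__(self, d_model: int = 64, n_assets: int = 6, n_heads: int = 4):
        super().__init__()
        # Each asset's encoded state attends to every other asset's state,
        # so a market-wide rally is seen as one joint event, not six signals.
        self.cross_asset_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # 2 * n_assets position slots (long/short per asset) + 1 cash slot = 13.
        self.alloc = nn.Linear(d_model * n_assets, 2 * n_assets + 1)

    def forward(self, asset_states: torch.Tensor) -> torch.Tensor:
        # asset_states: (batch, n_assets, d_model), one embedding per market.
        mixed, _ = self.cross_asset_attn(asset_states, asset_states, asset_states)
        logits = self.alloc(mixed.flatten(start_dim=1))
        # Softmax makes cash compete with positions for portfolio weight.
        return torch.softmax(logits, dim=-1)

weights = PortfolioPolicyHead()(torch.randn(1, 6, 64))  # sums to 1.0 across 13 slots
```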
Differential Sharpe Reward: Naive step-return rewards teach agents to take huge, volatile bets. A differential Sharpe reward (the marginal change in a running, EMA-based Sharpe estimate) grades the agent on a curve: you don’t get extra credit for a 3% day if your variance spiked to get it.
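For the curious, a minimal sketch of a Moody-Saffell-style differential Sharpe reward, assuming the running estimates are EMAs of the first and second moments of step returns (eta is an illustrative decay, not a tuned value):

```python
class DifferentialSharpe:
    def __init__(self, eta: float = 0.01, eps: float = 1e-8):
        self.eta = eta
        self.eps = eps
        self.A = 0.0  # EMA of returns (first moment)
        self.B = 0.0  # EMA of squared returns (second moment)

    def step(self, r: float) -> float:
        dA = r - self.A
        dB = r * r - self.B
        var = self.B - self.A ** 2  # variance estimate before the update
        # First-order expansion of the Sharpe ratio in eta: a big return is
        # penalized when it also inflates the variance estimate.
        reward = (self.B * dA - 0.5 * self.A * dB) / max(var, self.eps) ** 1.5
        self.A += self.eta * dA
        self.B += self.eta * dB
        return reward
```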
Preventing Leakage in Walk-Forward: I use a 128-step purge gap between train and validation folds. If you have rolling lookback features (like realized_vol_72), the last training bars share raw data with the first validation bars unless you purge that overlap.
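A sketch of what a purged splitter can look like; the 128-step purge is the number from above, while the fold lengths are placeholders:

```python
def purged_walk_forward(n_bars: int, train_len: int, val_len: int, purge: int = 128):
    """Yield (train_indices, val_indices) with a purge gap so rolling
    features (e.g. realized_vol_72) computed on late train bars never
    overlap the validation window."""
    start = 0
    while start + train_len + purge + val_len <= n_bars:
        train = range(start, start + train_len)
        val_start = start + train_len + purge  # skip the purge gap
        val = range(val_start, val_start + val_len)
        yield train, val
        start += val_len  # roll the whole window forward

for tr, va in purged_walk_forward(n_bars=10_000, train_len=4_000, val_len=1_000):
    print(tr[-1], va[0])  # 128 bars sit strictly between these two indices
```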
Transformer vs. LSTM: I used a 2-layer Transformer for the market encoder, since it can attend directly to any prior bar in the 96-bar window instead of squeezing history through a recurrent state. To fit this on a single 15GB GPU, gradient checkpointing was mandatory, saving ~24GB of peak memory at the cost of roughly one extra forward pass per backward.
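A sketch of per-layer checkpointing in PyTorch; only the 2-layer depth and the 96-bar window match the setup above, the other dimensions are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedEncoder(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            # Activations inside each layer are dropped after the forward pass
            # and recomputed during backward, trading compute for memory.
            x = checkpoint(layer, x, use_reentrant=False)
        return x

enc = CheckpointedEncoder()
out = enc(torch.randn(8, 96, 128))  # (batch, 96-bar window, d_model)
out.sum().backward()  # backward recomputes each layer's forward once
```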
Happy to answer any questions on the data pipeline, or on why stationary, fractionally differenced features are absolute lifesavers here (a quick fracdiff sketch is below for the curious).
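That sketch, using the standard binomial weight recursion for (1 - L)^d; the d value and window length are illustrative, not tuned numbers:

```python
import numpy as np

def fracdiff(series: np.ndarray, d: float = 0.4, window: int = 64) -> np.ndarray:
    # Build the truncated binomial weights for (1 - L)^d:
    # w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k.
    w = [1.0]
    for k in range(1, window):
        w.append(-w[-1] * (d - k + 1) / k)
    w = np.asarray(w)
    # Fixed-width convolution: the output keeps a long memory of the level
    # while being far closer to stationary than the raw price series.
    out = np.full(series.shape, np.nan)
    for t in range(window - 1, len(series)):
        out[t] = w @ series[t - window + 1 : t + 1][::-1]
    return out

prices = 100.0 * np.exp(np.cumsum(0.01 * np.random.randn(500)))  # synthetic walk
features = fracdiff(np.log(prices))  # first window-1 entries stay NaN
```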