Best practices for Reward Engineering in Autonomous Driving to avoid reward hacking and local optima?

Hi everyone,

I am currently training an RL agent for an autonomous driving task, but I’ve hit a wall with Reward Engineering.

Right now, I am stuck in a tedious, manual trial-and-error loop:

  1. The car stops completely to avoid risk -> I add a too_slow_penalty.
  2. The car then drives too aggressively at intersections -> I add an overspeed_penalty.

As a result, my reward function is becoming bloated with heuristics and hyperparameters. Tuning one weight to fix a specific behavior almost always breaks another (e.g., penalizing speed more heavily makes the agent overly conservative, and it stops again).
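For concreteness, here is a minimal sketch of the kind of weighted-sum reward described above. It is illustrative only: the `RewardWeights` fields, the `min_speed` threshold, the progress and collision terms, and all numeric values are assumptions made up for this example, not the actual training code. Only the `too_slow_penalty` and `overspeed_penalty` names come from the steps listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical, hand-tuned weights; each one is a guess.
    progress: float = 1.0
    too_slow: float = 0.5
    overspeed: float = 0.5
    collision: float = 100.0


def reward(speed, speed_limit, progress_m, collided,
           w=RewardWeights(), min_speed=2.0):
    """Weighted sum of a progress term and hand-added penalties."""
    r = w.progress * progress_m

    # Added after the car learned to stop completely.
    too_slow_penalty = max(0.0, min_speed - speed)
    # Added after the car drove too aggressively.
    overspeed_penalty = max(0.0, speed - speed_limit)

    r -= w.too_slow * too_slow_penalty
    r -= w.overspeed * overspeed_penalty
    if collided:
        r -= w.collision
    return r
```

Because every term feeds into one scalar, changing a single weight shifts the relative value of all the other behaviors, which is the coupling that makes the tuning loop so fragile.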

I would greatly appreciate your insights on two points:

  1. Structure: What is the industry/academic standard approach for structuring multi-objective rewards in autonomous driving? Should I look into Reward Shaping, Curriculum Learning, or perhaps Inverse Reinforcement Learning (IRL)?
  2. Hyperparameters: How do you systematically balance the trade-offs between positive rewards (progress, lane-keeping) and negative penalties (collisions, traffic violations) without just guessing the weights?

Are there any specific frameworks, papers, or methodologies you would recommend for this? Thank you!

submitted by /u/InviteExtension3976