CO2 minimization with Deep RL

Hello everyone, I would like to ask for your advice on my bachelor’s thesis project, which I have been working on for weeks but with little success.

The aim of the project is to reduce CO2 emissions at a selected intersection by managing the traffic light phases (and possibly to extend this to larger areas later). The idea is to beat a greedy baseline algorithm that chooses the next phase based on the principle of conserving the vehicles' kinetic energy.

To tackle the problem, I have turned to deep RL, using the stable-baselines3 library.

The simulation runs in SUMO and consists of hundreds of episodes with randomly generated traffic scenarios. For now I am focusing on a medium-traffic scenario, but once the approach works the agent should learn to handle the different traffic profiles.
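Roughly, each episode is driven through TraCI like this (simplified sketch; the net/route file names and the step granularity are placeholders, only the TraCI calls are the standard ones):

```python
import random
import traci

# Placeholder file names; each route file is one randomly generated traffic scenario.
ROUTE_FILES = ["medium_01.rou.xml", "medium_02.rou.xml", "medium_03.rou.xml"]

def start_episode():
    # Launch a fresh SUMO run with a randomly chosen scenario.
    traci.start([
        "sumo",
        "-n", "intersection.net.xml",
        "-r", random.choice(ROUTE_FILES),
        "--no-step-log", "true",
    ])

def advance(seconds=5):
    # Step the simulation between two agent decisions.
    for _ in range(seconds):
        traci.simulationStep()

def end_episode():
    traci.close()
```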

I have mainly tried DQN and PPO with a discrete action space (the agent decides which direction gets the green light).
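The action side is essentially this (sketch; the traffic light ID and the phase indices are placeholders for my actual SUMO network, the SB3 calls are the stock ones):

```python
from gymnasium import spaces
from stable_baselines3 import PPO  # DQN is swapped in the same way
import traci

TLS_ID = "center"             # placeholder traffic light ID
GREEN_PHASES = [0, 2, 4, 6]   # placeholder: index of the green phase for each direction

# One discrete action per direction that can receive the green light.
action_space = spaces.Discrete(len(GREEN_PHASES))

def apply_action(action: int):
    # Switch the traffic light program to the chosen green phase.
    traci.trafficlight.setPhase(TLS_ID, GREEN_PHASES[action])

# Training on my custom Gymnasium wrapper around SUMO (env not shown here):
# model = PPO("MlpPolicy", env, verbose=1)
# model.learn(total_timesteps=500_000)
```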

I ran several tests on the observation space and reward. For the observations I tried everything from a feature-based vector (per incoming edge: total number of vehicles, average speed, number of halted vehicles) up to a discretization of the lanes, i.e. a matrix with one cell per position encoding the speed of the vehicle there. For the reward I used a weighted sum of CO2 emissions and waiting time (using CO2 alone seems to make things worse).
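For the feature-based variant, the observation and reward are computed roughly like this (sketch; the edge IDs and the weights ALPHA/BETA are placeholders):

```python
import numpy as np
import traci

# Placeholder IDs for the incoming edges of the intersection.
INCOMING_EDGES = ["north_in", "south_in", "east_in", "west_in"]
ALPHA, BETA = 1.0, 0.1  # hand-tuned weights for CO2 and waiting time

def get_observation():
    # Feature-based variant: per edge, vehicle count, mean speed, halted vehicles.
    feats = []
    for edge in INCOMING_EDGES:
        feats.append(traci.edge.getLastStepVehicleNumber(edge))
        feats.append(traci.edge.getLastStepMeanSpeed(edge))
        feats.append(traci.edge.getLastStepHaltingNumber(edge))
    return np.array(feats, dtype=np.float32)

def get_reward():
    # Negative weighted sum of CO2 emitted in the last step (mg) and waiting time (s).
    co2 = sum(traci.edge.getCO2Emission(e) for e in INCOMING_EDGES)
    wait = sum(traci.edge.getWaitingTime(e) for e in INCOMING_EDGES)
    return -(ALPHA * co2 + BETA * wait)
```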

The problem is that I never converge to results as good as the greedy algorithm, let alone better ones.

I wonder if any of you have experience with this type of project and could give me some advice on what you think is the best way to approach this problem.

submitted by /u/vinnie92