How to implement RL on a trash-recognizing robot
Hi!
I’m currently working on a robot that recognizes trash and reports it to a server.
It’s a basic robot with four wheels, motors, and several sensors (ultrasonic sensors in four directions, a gyroscope, accelerometers, etc.). It also has a camera and a Raspberry Pi on top.
To recognize trash, I use YOLO, and when it detects trash, it sends a picture to the server.
Right now, I’m using a simple algorithm to explore the area with the robot, but I would like to replace it with a PPO-based approach.
I already tried using the following inputs:
(front_dist, left_dist, right_dist, x_pos, y_pos, x_cell, y_cell, angle_to_the_nearest_cell)
(A cell is a 100 cm × 100 cm square.)
For the outputs, I used a softmax over two actions: move (25 cm) and turn (30°).
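For reference, here is a minimal sketch of how I build and normalize that 8-value observation vector before feeding it to the policy (PPO usually trains much more reliably on inputs scaled to roughly [-1, 1] than on raw centimetres and degrees). The constants are placeholders for my setup, not exact values:

```python
import numpy as np

# Hypothetical constants; adjust to the real sensors and room.
MAX_SENSOR_RANGE_CM = 400.0
ROOM_SIZE_CM = 500.0
CELL_SIZE_CM = 100.0

def make_observation(front, left, right, x, y, cell_x, cell_y, angle_deg):
    """Scale each raw input to roughly [-1, 1] for the policy network.

    Order matches (front_dist, left_dist, right_dist, x_pos, y_pos,
    x_cell, y_cell, angle_to_the_nearest_cell).
    """
    n_cells = ROOM_SIZE_CM / CELL_SIZE_CM
    return np.array([
        front / MAX_SENSOR_RANGE_CM,
        left / MAX_SENSOR_RANGE_CM,
        right / MAX_SENSOR_RANGE_CM,
        x / ROOM_SIZE_CM,
        y / ROOM_SIZE_CM,
        cell_x / n_cells,
        cell_y / n_cells,
        angle_deg / 180.0,   # angle in (-180, 180] -> (-1, 1]
    ], dtype=np.float32)
```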
And for the rewards:
- NEW_CELL_REWARD = 3 (when it discovers a new cell)
- MOVE_REWARD = -0.3 (for each movement)
- PENALTY_REWARD = -50 (when it hits a wall or object)
- END_GAME_REWARD = 50 (when all cells are discovered)
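The reward scheme above can be written as a single per-step function; this is a sketch assuming the three boolean flags come from the environment after each action:

```python
NEW_CELL_REWARD = 3.0    # discovered a new cell
MOVE_REWARD = -0.3       # cost of each action
PENALTY_REWARD = -50.0   # hit a wall or object
END_GAME_REWARD = 50.0   # all cells discovered

def step_reward(discovered_new_cell, collided, all_cells_discovered):
    """Combine the per-step reward terms listed above."""
    r = MOVE_REWARD
    if discovered_new_cell:
        r += NEW_CELL_REWARD
    if collided:
        r += PENALTY_REWARD
    if all_cells_discovered:
        r += END_GAME_REWARD
    return r
```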
However, the robot doesn’t explore the room efficiently. Even after around 1000 episodes, its behavior still looks random and unfocused.
I would also like it to output the amount it should turn, but I’m not sure how to implement that.
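One common approach for a continuous turn amount is a Gaussian policy head: the network outputs a mean and log-std, the action is sampled from that Gaussian, and the sample is squashed into a bounded angle range. A minimal numpy sketch of just the sampling step (not a full PPO head; `MAX_TURN_DEG` is a made-up limit, and a real implementation would include the tanh correction in the log-prob and backprop through the network):

```python
import numpy as np

MAX_TURN_DEG = 45.0  # hypothetical turn limit; tune for your robot

def sample_turn(mean, log_std, rng=None):
    """Sample a bounded turn angle from a squashed Gaussian.

    `mean` and `log_std` would come from the policy network; PPO also
    needs the log-probability of the sample for its probability ratio.
    """
    rng = rng or np.random.default_rng()
    std = np.exp(log_std)
    raw = rng.normal(mean, std)            # unbounded Gaussian sample
    angle = np.tanh(raw) * MAX_TURN_DEG    # squash into [-45, 45] degrees
    # Gaussian log-prob of the raw sample (tanh correction omitted here)
    log_prob = -0.5 * (((raw - mean) / std) ** 2
                       + 2.0 * log_std + np.log(2.0 * np.pi))
    return angle, log_prob
```

With this, the action space becomes hybrid: a softmax choice between "move" and "turn", plus a continuous turn angle used when "turn" is selected.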
submitted by /u/Independent-Key-1329