Request: RL algorithm for a slow but parallel episodic task?

I have an episodic problem in which every episode takes 30 days to complete, and each time step takes 1 day. At any given time, around 1000 episodes are running simultaneously, although their start dates may differ. That means each day around 33 new episodes start and another 33 end (1000 / 30 ≈ 33). The action space is discrete (5 different actions). Which kinds of algorithms would be good for this type of problem?
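To make the arithmetic above concrete, here is a minimal sketch that simulates the described setup, assuming the ~1000 concurrent episodes are staggered evenly across the 30-day horizon and that every finished episode is immediately replaced by a new one. The function `simulate` and the uniformly random placeholder action are hypothetical stand-ins for illustration, not a real environment or policy.

```python
import random

EPISODE_LENGTH = 30  # steps (days) per episode, as stated in the post
NUM_ACTIONS = 5      # size of the discrete action space
CONCURRENT = 1000    # approximate number of episodes running at once


def simulate(num_days: int, seed: int = 0) -> list[dict]:
    """Simulate a pool of staggered episodes; count daily transitions, starts and ends."""
    rng = random.Random(seed)
    # Assumption: progress is spread evenly over the 30-day horizon,
    # so each slot starts at a different number of completed steps.
    steps_done = [i % EPISODE_LENGTH for i in range(CONCURRENT)]
    history = []
    for day in range(num_days):
        transitions = 0
        ended = 0
        for idx in range(len(steps_done)):
            # Hypothetical placeholder: a random action stands in for a real policy.
            _action = rng.randrange(NUM_ACTIONS)
            steps_done[idx] += 1
            transitions += 1  # every running episode produces one transition per day
            if steps_done[idx] == EPISODE_LENGTH:
                ended += 1
                steps_done[idx] = 0  # a fresh episode immediately takes this slot
        # Because each ended episode is replaced, starts always equal ends.
        history.append({"day": day, "transitions": transitions,
                        "ended": ended, "started": ended})
    return history


if __name__ == "__main__":
    for row in simulate(3):
        print(row)
```

Under these assumptions each simulated day yields 1000 one-step transitions but only 33 or 34 completed episodes, which averages out to 1000 / 30 ≈ 33.3 per day.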

submitted by /u/diepala
