PPO agent for network control
I built a PPO agent to control flows inside a physical network. The agent sets 15 control variables, which in the physical world correspond to how strongly we pump the medium through the network. After 25 million environment steps of training, it works. I have been testing different reward functions, and so far the best one looks roughly like this:

reward = -1 * tanh(physical_violations_in_network) + 0.05 * tanh(violation_improvement_from_previous_step) - 0.07 * tanh(violation_deterioration_from_previous_step)

I made the improvement coef and deterioration coef […]
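Here is a minimal Python sketch of that reward. It makes two assumptions. First, the violation measure is a single non-negative scalar, such as the summed constraint violations in the network at one step. Second, "improvement" and "deterioration" are the positive and negative parts of the step-to-step change in that scalar. The names `shaped_reward`, `violation` and `prev_violation` are hypothetical. Only the tanh structure and the coefficients 0.05 and 0.07 come from the post.

```python
import math

# Coefficients from the post. How improvement/deterioration are computed
# below is an ASSUMPTION, not the author's actual code.
IMPROVEMENT_COEF = 0.05
DETERIORATION_COEF = 0.07


def shaped_reward(violation: float, prev_violation: float) -> float:
    """-tanh(v) + 0.05*tanh(improvement) - 0.07*tanh(deterioration).

    Assumes `violation` is a non-negative scalar measure of physical
    constraint violations in the network at the current step.
    """
    delta = prev_violation - violation   # > 0 means the network got better
    improvement = max(delta, 0.0)        # assumed: positive part of the change
    deterioration = max(-delta, 0.0)     # assumed: negative part of the change
    return (
        -1.0 * math.tanh(violation)
        + IMPROVEMENT_COEF * math.tanh(improvement)
        - DETERIORATION_COEF * math.tanh(deterioration)
    )


if __name__ == "__main__":
    print(shaped_reward(1.0, 2.0))  # violations dropped from 2 to 1
    print(shaped_reward(2.0, 1.0))  # violations rose from 1 to 2
```

With this definition, at most one of the two shaping terms is non-zero at any step. The main -tanh(violation) term dominates because its coefficient is 1. Getting worse is penalized slightly more than getting better is rewarded, because 0.07 > 0.05.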