Guys, if you don’t know what Dropout probability to use…

Posted ⋅ 7 January 2026

…

Use p = sigmoid(z), where z is a sample drawn from the standard normal (Gaussian) distribution. This is centered around p = 0.5 but random, e.g. in PyTorch: sigmoid(randn_like(x)). It can be 0.2 and it can be 0.7; as training goes on, it stabilizes.
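A minimal sketch of what this could look like, written in plain Python so it runs without PyTorch. It assumes the random p is used inside ordinary inverted dropout: each element gets its own p = sigmoid(z) with z ~ N(0, 1) (matching the per-element sigmoid(randn_like(x)) above), is dropped with probability p, and survivors are rescaled by 1 / (1 - p). The helper name random_p_dropout and the rescaling choice are assumptions for illustration, not the Symphony code. In PyTorch the same idea would be roughly p = torch.sigmoid(torch.randn_like(x)), then x * (torch.rand_like(x) >= p) / (1 - p).

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def random_p_dropout(x, rng, training=True):
    """Hypothetical inverted dropout where every element gets its own drop
    probability p = sigmoid(z), z ~ N(0, 1). Returns (outputs, probabilities)."""
    if not training:
        # Evaluation mode: dropout is a no-op.
        return list(x), [0.0] * len(x)
    outputs, probs = [], []
    for v in x:
        p = sigmoid(rng.gauss(0.0, 1.0))  # centered on 0.5, can be 0.2 or 0.7
        keep = rng.random() >= p          # element survives with probability 1 - p
        # Assumption: rescale survivors by 1 / (1 - p) so the expected value stays v.
        outputs.append(v / (1.0 - p) if keep else 0.0)
        probs.append(p)
    return outputs, probs


rng = random.Random(0)
out, ps = random_p_dropout([1.0] * 100_000, rng)
print(f"mean p = {sum(ps) / len(ps):.3f}")         # should be close to 0.5
print(f"mean output = {sum(out) / len(out):.3f}")  # should be close to 1.0
```

Because p is symmetric around 0.5, the average drop rate is still about 50%, but individual elements see anything from light to heavy dropout. One design caveat: when p lands near 1, the 1 / (1 - p) rescale makes the few survivors large, so clamping p is an option if that turns out to be a problem.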

Gradient Dropout for RL (where we only skip some gradient updates) is soft and can be used even on the last layer, because it does not distort the output distribution. (This is from the latest update of Symphony: https://github.com/timurgepard/Symphony-S2/tree/main.) I was keen to use 0.7, and the graph looked beautiful thanks to internal generalization, but the agent needs to make more mistakes: it is better when generalization comes from real-world experience, at least in the beginning. This further improved body development, posture, and hand movements.
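One reading of "only not updating gradients" is that the forward pass stays untouched and, on each step, some weights simply skip their update. The sketch below assumes exactly that, using the same random p = sigmoid(z), z ~ N(0, 1) per weight and a toy linear model with plain SGD; gradient_dropout_sgd_step and predict are hypothetical helpers, and this is not necessarily how Symphony implements it. In PyTorch a similar mask could be applied to a parameter's gradient with Tensor.register_hook.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, inputs):
    """Forward pass of a toy linear model; no mask is ever applied here."""
    return sum(w * v for w, v in zip(weights, inputs))


def gradient_dropout_sgd_step(weights, grads, lr, rng):
    """Hypothetical SGD step that skips each weight's update with probability
    p = sigmoid(z), z ~ N(0, 1), drawn per weight. The forward pass is not
    changed, so the output distribution is not distorted; only which weights
    learn on this step is random."""
    new_weights = []
    for w, g in zip(weights, grads):
        p = sigmoid(rng.gauss(0.0, 1.0))
        if rng.random() < p:
            new_weights.append(w)           # gradient dropped: no update
        else:
            new_weights.append(w - lr * g)  # ordinary SGD update
    return new_weights


rng = random.Random(0)
weights = [0.5, -0.3, 0.8]
inputs = [1.0, 2.0, -1.0]
target = 1.0
y = predict(weights, inputs)
# Squared-error loss (y - target)^2 has gradient 2 * (y - target) * x_i per weight.
grads = [2.0 * (y - target) * v for v in inputs]
weights = gradient_dropout_sgd_step(weights, grads, lr=0.1, rng=rng)
print(weights)
```

Since the mask only touches the update, the network still produces its full, unscaled output at every step, which is why this kind of dropout can sit even on the last layer.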

submitted by /u/Timur_1988