[R] AdamWClip: AdamW with adaptive gradient clipping
Hi,
Would you like to try out an optimizer that does (adaptive) gradient clipping, so you don’t have to set clipping thresholds manually?
We have developed AdamWClip, an extension to AdamW that does exactly that, with no additional memory required and only marginal computational overhead. In our preliminary experiments, it often outperformed AdamW with grad-norm clipping by a significant margin, so we would be interested to hear how it performs in your use cases.
If you would like to try it, simply insert the following into your code:
    %pip install AdamWClip

    from AdamWClip import AdamWClip
    ...
    optimizer = AdamWClip(model.parameters(), *args)
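For context, here is a minimal sketch of how it might slot into a standard PyTorch training loop, assuming the AdamW-compatible constructor shown above; the model, hyperparameters, and dataloader below are placeholders, not part of the release:

    import torch
    import torch.nn.functional as F
    from AdamWClip import AdamWClip  # drop-in replacement for torch.optim.AdamW

    # Placeholder model; only the optimizer line changes vs. a plain AdamW setup.
    model = torch.nn.Linear(128, 10)
    optimizer = AdamWClip(model.parameters(), lr=1e-3, weight_decay=1e-2)

    for inputs, targets in dataloader:  # dataloader assumed to exist
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        # No manual torch.nn.utils.clip_grad_norm_(...) call and no max_norm to tune:
        # the adaptive clipping is presumably applied inside step().
        optimizer.step()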
The source code is available on GitHub: https://github.com/wandeln/AdamWClip
submitted by /u/ElectricVote