[D] Where can I find more information about the NTK with respect to lazy and rich learning?
Specifically, I’m curious about:
- What are the practical heuristics (or methods) for determining which regime a model is operating in during training?
- How does the scale of initialization and the learning rate specifically bias a network toward feature learning over the kernel regime?
- Are there specific architectures where the “lazy” assumption is actually preferred for stability?
- Is there just one “rich” regime, or is richness a spectrum of regimes?
I’m vaguely aware that the lazy regime is when the NTK stays roughly constant during training. I’m also vaguely aware that rich learning isn’t 100% ideal and that you often want a bit of both. But I’m having a hard time finding the seminal papers and work on this topic.
submitted by /u/vhu9644