[D] Where can I find more information about the NTK with respect to lazy and rich learning?

Specifically, I’m curious about:

  1. What are the practical heuristics (or methods) for determining which regime a model is operating in during training?
  2. How does the scale of initialization and the learning rate specifically bias a network toward feature learning over the kernel regime?
  3. Are there specific architectures where the “lazy” assumption is actually preferred for stability?
  4. Is there just one “rich” regime, or is richness a spectrum of regimes?

I’m vaguely aware that the lazy regime is the one where the NTK stays essentially fixed during training. I’m also vaguely aware that rich (feature) learning isn’t always ideal and that in practice you want a bit of both. But I’m having a hard time finding the seminal papers and follow-up work on this topic.
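
To make question 1 concrete for myself, here’s a minimal sketch of the kind of check I have in mind (PyTorch; names like `empirical_ntk` and `x_probe` are just illustrative, not from any paper): compute the empirical NTK Gram matrix on a small probe batch before and after training and see how much it moves. Near-zero drift would suggest lazy/kernel-like dynamics; large drift would suggest feature learning.

```python
import torch
import torch.nn as nn

def empirical_ntk(model, x):
    # Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> for a scalar-output model.
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for xi in x:
        out = model(xi.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    G = torch.stack(rows)          # (n, num_params)
    return G @ G.T                 # (n, n) empirical NTK on the probe batch

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
x_probe = torch.randn(8, 10)     # small fixed probe batch
y = torch.randn(8, 1)

K0 = empirical_ntk(model, x_probe)

# Train for a while; larger learning rates / smaller widths tend to move the kernel more
# (richer dynamics), while tiny steps on a very wide net leave it nearly fixed (lazier).
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(x_probe), y).backward()
    opt.step()

K1 = empirical_ntk(model, x_probe)
drift = torch.linalg.norm(K1 - K0) / torch.linalg.norm(K0)
print(f"relative change in empirical NTK: {drift.item():.3f}")
```

Obviously this brute-force per-example gradient loop only works for toy models; it’s just meant to make “the NTK doesn’t really change” operational. I’d love pointers to what people actually measure in practice.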

submitted by /u/vhu9644