[D] Where can I find more information about the NTK with respect to lazy and rich learning?

Specifically, I’m curious about:

  1. What are the practical heuristics (or methods) for determining which regime a model is operating in during training?
  2. How does the scale of initialization and the learning rate specifically bias a network toward feature learning over the kernel regime?
  3. Are there specific architectures where the “lazy” assumption is actually preferred for stability?
  4. Is there just one “rich” regime, or is richness a spectrum of regimes?

I’m vaguely aware that the lazy regime is the one where the NTK stays essentially fixed during training. I’m also vaguely aware that rich (feature) learning isn’t always ideal and that in practice you want a bit of both. But I’m having a hard time finding the seminal papers and follow-up work on this topic.
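
To make question 1 concrete for myself, here’s a minimal sketch of the kind of check I have in mind (PyTorch; names like `empirical_ntk` and `x_probe` are just illustrative, not from any paper): compute the empirical NTK Gram matrix on a small probe batch before and after training and see how much it moves. Near-zero drift would suggest lazy/kernel-like dynamics; large drift would suggest feature learning.

```python
import torch
import torch.nn as nn

def empirical_ntk(model, x):
    # Gram matrix K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)> for a scalar-output model.
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for xi in x:
        out = model(xi.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    G = torch.stack(rows)          # (n, num_params)
    return G @ G.T                 # (n, n) empirical NTK on the probe batch

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
x_probe = torch.randn(8, 10)     # small fixed probe batch
y = torch.randn(8, 1)

K0 = empirical_ntk(model, x_probe)

# Train for a while; larger learning rates / smaller widths tend to move the kernel more
# (richer dynamics), while tiny steps on a very wide net leave it nearly fixed (lazier).
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(x_probe), y).backward()
    opt.step()

K1 = empirical_ntk(model, x_probe)
drift = torch.linalg.norm(K1 - K0) / torch.linalg.norm(K0)
print(f"relative change in empirical NTK: {drift.item():.3f}")
```

Obviously this brute-force per-example gradient loop only works for toy models; it’s just meant to make “the NTK doesn’t really change” operational. I’d love pointers to what people actually measure in practice.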

submitted by /u/vhu9644