[D] Is Grokking unique to transformers/attention?
Is grokking unique to the attention mechanism? Everything I've read about it seems to suggest it's a product of attention and the models that utilise it. Is this actually the case, or can a standard MLP also start grokking?
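
For concreteness, here is a minimal sketch of how one could test this directly, with no attention anywhere in the model: a plain MLP trained on modular addition, the task family where grokking was originally reported. This assumes PyTorch; the modulus, layer sizes, weight decay, and step count are illustrative choices, not taken from any particular paper.

```python
# Grokking probe: plain MLP on (a + b) mod p, no attention layers.
# Hyperparameters are illustrative; grokking is sensitive to the
# train fraction, weight decay, and total training length.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97                      # modulus (hypothetical choice)
frac_train = 0.4            # fraction of all (a, b) pairs used for training

# Full dataset: inputs are one_hot(a) concatenated with one_hot(b).
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % p

perm = torch.randperm(len(pairs))
n_train = int(frac_train * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(      # a standard MLP, nothing transformer-specific
    nn.Linear(2 * p, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

# Grokking can show up long after train accuracy saturates,
# so train far past the point of perfect train accuracy.
for step in range(50_000):
    opt.zero_grad()
    loss = loss_fn(model(x[train_idx]), y[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x[train_idx]).argmax(1) == y[train_idx]).float().mean()
            test_acc = (model(x[test_idx]).argmax(1) == y[test_idx]).float().mean()
        print(f"step {step:6d}  train {train_acc:.3f}  test {test_acc:.3f}")
```

The signature to look for is train accuracy reaching 1.0 early while test accuracy sits near chance for a long stretch, then climbs abruptly much later. Whether and when that delayed jump appears depends heavily on the train fraction and the weight decay.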
submitted by /u/Dependent-Shake3906