[R] Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

Sakana AI introduced a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with long-context fine-tuning.

The core insight of this work challenges a fundamental assumption of the Transformer architecture: the authors find that explicit positional embeddings such as RoPE are critical for training convergence, but they later become the primary bottleneck preventing models from generalizing to longer sequences.
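The post doesn't spell out the mechanism, but the contrast it draws can be illustrated with a toy sketch: the same attention computation with RoPE applied to queries and keys versus with the rotation dropped, so that token order enters only through the causal mask. This is a hypothetical single-head PyTorch illustration, not Sakana AI's implementation; all names and shapes are assumptions.

```python
# Minimal sketch (illustrative, not DroPE itself): attention with RoPE vs.
# attention with the positional rotation removed.
import torch

def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def attention(q, k, v, use_rope: bool) -> torch.Tensor:
    """Single-head causal attention; explicit position info comes only from RoPE."""
    if use_rope:
        q, k = rope_rotate(q), rope_rotate(k)
    scores = q @ k.T / q.shape[-1] ** 0.5
    causal_mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal_mask, float("-inf"))
    return scores.softmax(dim=-1) @ v

seq_len, head_dim = 8, 16
q, k, v = (torch.randn(seq_len, head_dim) for _ in range(3))

with_rope = attention(q, k, v, use_rope=True)      # position-dependent attention scores
without_rope = attention(q, k, v, use_rope=False)  # order enters only via the causal mask
print(with_rope.shape, without_rope.shape)
```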

submitted by /u/AhmedMostafa16