What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
arXiv:2406.03707v2 Announce Type: replace-cross
Abstract: Autoregressive language models have demonstrated a remarkable ability to extract latent structure from text. The embeddings from large language models have been shown to capture aspects of the syntax and semantics of language. But what should embeddings represent? We connect the autoregressive prediction objective to the idea of constructing predictive sufficient statistics to summarize the information contained in a sequence of observations, and use this connection to identify three settings where the optimal […]
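The connection the abstract describes can be illustrated in the simplest exchangeable setting. The following is a minimal sketch using the standard Beta-Bernoulli example, chosen here as an assumed illustration rather than a construction taken from the paper: for a Bernoulli sequence with unknown bias \theta and a Beta(\alpha, \beta) prior, the Bayes-optimal autoregressive predictor depends on the history only through a predictive sufficient statistic, the count of ones,

    p(x_{n+1} = 1 \mid x_{1:n}) \;=\; \int_0^1 \theta \, p(\theta \mid x_{1:n}) \, d\theta \;=\; \frac{\alpha + \sum_{i=1}^{n} x_i}{\alpha + \beta + n}.

Under this assumption, an embedding of x_{1:n} that attains the optimal autoregressive loss need only encode (\sum_i x_i, n), i.e., a summary that determines the posterior over the latent generating parameter, which is the sense in which the prediction objective points toward predictive sufficient statistics.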