Why ChatGPT Is More Than Autocomplete
Author(s): GSO1

Originally published on Towards AI.

Figure by the author with assistance from Claude (Anthropic)

Calling a large language model (LLM) like ChatGPT “autocomplete” is not exactly wrong, but it is deeply misleading. Most of us think of autocomplete as a text-completion tool: a phone keyboard guessing the next word, a search bar finishing a phrase. But that picture is not powerful enough to explain what LLMs actually produce: explanations, analogies, plans, summaries, arguments, code, stories, dialogue.

A transformer-based LLM does predict one token at a time (roughly a word, though in practice often a word fragment), but that visible sequence is only the surface trace of a much richer hidden process. Before each word appears, the model has built a high-dimensional internal state that reflects the topic, the context, the tone, the intent, and the likely directions the answer could take. The next token is not read off the prompt. It is read off this internal state. That is why a system trained only to predict the next token produces explanations, arguments, analogies, plans, and dialogue that feel like far more than anything we would call autocomplete.

Previous articles in this series developed a geometric picture of the transformer’s internal state in terms of attention: attention as a coupled free-energy minimization and weighted least squares problem [Article 1], simplification of attention to two operators in one space [Article 2], and attention as a geometric flow [Article 3]. This article steps back and asks a plainer question (what is actually happening when a chatbot answers you, and why is autocomplete the wrong picture?) and answers it with a minimum of mathematical machinery.

The autocomplete misunderstanding

Calling an LLM “autocomplete” is tempting because, at one level, it is true. Given a sequence of text, the model predicts the next token.
That token is appended to the input, which is fed back in for the next prediction. Stepping through this loop produces the visible response, one token at a time, which has the appearance of autocomplete.

The problem is that this picture describes what goes on across steps but ignores what goes on within a step. Across steps, things feel like autocomplete: the sequential generation of individual words that cohere with previously generated text. (Researchers properly call this process “autoregressive,” but we’ll stick with the less proper “autocomplete” for now.) The missing piece is the hidden computation within a step that generates the next token.

Within a step, before a new token is generated, the input prompt is transformed into a high-dimensional internal state. The tokens of the prompt are not treated as isolated words in a list; they are interpreted in relation to one another via the attention mechanism. A question, a definition, a metaphor, a constraint, a conversational tone: all of these shape the internal state from which the next token is drawn.

So the next token is not predicted from the input text alone, as “autocomplete” suggests. It is predicted from a rich internal state built from the text. The model then projects a small part of that state into the vocabulary to choose the next token, appends the token to the input text, and rebuilds the state over the longer text for the next prediction step. The observed output is the result of multiple passes through that loop.

“Autocomplete,” then, is technically defensible but conceptually misleading. It names the final visible act of generation and ignores the machinery that makes the act possible. An LLM like ChatGPT does not merely “complete” the observed text. It repeatedly reconstructs meaning over a growing context and projects part of that meaning back into language, one token at a time.
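The two levels described above, the loop across steps and the hidden computation within a step, can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the six-word vocabulary, the random untrained weights, and the function names `build_state`, `next_token_distribution`, and `generate` are hypothetical, and a single attention-style average stands in for the many layers of a real transformer. It is meant only to show the shape of the loop, not how ChatGPT is implemented.

```python
import math
import random

# Hypothetical toy vocabulary and randomly initialised weights; nothing here is trained.
VOCAB = ["the", "river", "bank", "is", "wide", "."]
DIM = 4
rng = random.Random(0)
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
UNEMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def build_state(token_ids):
    """Within a step: build one internal state vector from the whole context.

    Stand-in for the transformer stack: the last token attends over all tokens
    (dot-product attention), so the state depends on every token in the context.
    """
    query = EMBED[token_ids[-1]]
    scores = [sum(q * k for q, k in zip(query, EMBED[t])) for t in token_ids]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * EMBED[t][d] for w, t in zip(weights, token_ids)) for d in range(DIM)]


def next_token_distribution(state):
    """Project the state into the vocabulary and normalise with softmax."""
    logits = [sum(s * u for s, u in zip(state, row)) for row in UNEMBED]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_ids, steps):
    """Across steps: predict, append, rebuild the state over the longer text."""
    ids = list(prompt_ids)
    for _ in range(steps):
        state = build_state(ids)                # hidden computation within the step
        probs = next_token_distribution(state)  # state -> vocabulary
        ids.append(max(range(len(VOCAB)), key=probs.__getitem__))  # greedy choice
    return ids


prompt = [VOCAB.index("the"), VOCAB.index("river")]
output = generate(prompt, steps=3)
print(" ".join(VOCAB[i] for i in output))
```

Because the weights are random, the printed words are meaningless. The point is structural: the only part that looks like autocomplete is the `for` loop in `generate`, while the work that decides each token happens inside `build_state`, which is recomputed over the longer context at every step.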
Where the next token really comes from

An LLM predicts the next token not from the input text alone, but from a dense internal representation built by sequentially processing the text through multiple layers of the model.

When a prompt is input to the model, each token is turned into a vector: a point in a high-dimensional space whose location already encodes what pre-training has learned about that token, including its meanings, its grammatical roles, and the company it tends to keep. This initial cloud of points is only the starting arrangement. As the cloud passes through the model’s many layers, each token (point) absorbs information from the rest of the text, so its final position reflects not just the word it started as but the role that word plays in context with the rest of the cloud.

The word “bank” has a different vector representation in “On the river bank” and “Call the investment bank” because the nearby words change its position. The same is true of the text as a whole: a question, an example, a requested tone, a constraint, a prior phrase each reshape the representation of the text, changing not only what a given word means but what kind of answer becomes likely. Context does not only disambiguate individual words; it shapes the entire internal state from which the next token is predicted.

Consider four prompts:

“Explain E = mc².” pushes toward teaching, physics, symbol definitions, accessible explanation.
“Explain E = mc² to a 10-year-old.” shifts toward simpler vocabulary, analogy, a gentler tone.
“Explain E = mc² in one sentence.” adds brevity and compression.
“Explain E = mc² using calculus.” shifts toward a more technical, mathematical treatment.

The idea to be explained is identical in all four. The surrounding context changes the kind of answer that becomes likely.

This is the key point: the next token is not predicted directly from the text. It is predicted from the model’s internal state after the text has passed through many layers of interaction and conditioning.
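The “bank” example can be sketched in a few lines. This is a minimal illustration under loud assumptions: the two-dimensional embeddings are hand-picked rather than learned (axis 0 loosely stands for “nature,” axis 1 for “finance”), the function `contextualize` is a hypothetical name, and a single similarity-weighted average replaces the learned query, key, and value projections and the many layers of a real model.

```python
import math

# Hypothetical 2-D static embeddings, hand-picked for illustration only.
# Axis 0 loosely stands for "nature", axis 1 for "finance".
EMBED = {
    "on": [0.0, 0.0],
    "the": [0.0, 0.0],
    "call": [0.0, 0.1],
    "river": [1.0, 0.0],
    "investment": [0.0, 1.0],
    "bank": [0.5, 0.5],
}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(tokens, target):
    """One simplified attention step: the target token's new vector is a
    weighted average of all token vectors, weighted by dot-product similarity.
    Real models use learned query/key/value projections and many layers."""
    query = EMBED[target]
    vectors = [EMBED[t] for t in tokens]
    weights = softmax([sum(q * k for q, k in zip(query, v)) for v in vectors])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(2)]


river_bank = contextualize(["on", "the", "river", "bank"], "bank")
money_bank = contextualize(["call", "the", "investment", "bank"], "bank")
print("river context:     ", river_bank)
print("investment context:", money_bank)
```

“Bank” starts at the same point, `[0.5, 0.5]`, in both sentences. After one round of mixing with its neighbours it leans toward the first axis in the river sentence and toward the second axis in the investment sentence. Stacking many such rounds, over every token at once, is what produces the context-shaped internal state from which the next token is read.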
That state is not a sentence, a paragraph, a private monologue, or a plan. It is a distributed, high-dimensional representation that holds many things at once — topic, syntax, style, intention, discourse structure, relevant facts, and likely continuation. This is why a […]