Every time you ask ChatGPT, Claude, or Gemini a question, something remarkable happens inside a machine. Billions of numbers shift and interact, and out comes a sentence that feels almost human. But how does it actually work?
It all starts with tokens
Before a language model reads your words, it breaks them into tokens — fragments of text that might be a word, part of a word, or even a single character. "Unbelievable" might become ["Un","belie","vable"]. Each token gets converted into a long list of numbers called a vector — its mathematical identity in the model's world.
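The idea can be sketched in a few lines. This is a toy greedy tokenizer over a hypothetical three-piece vocabulary (real tokenizers like BPE learn their vocabulary from data), plus a random embedding table standing in for the learned one:

```python
# Toy sketch: tokenization + embedding lookup.
# The vocabulary and embedding values here are made up for illustration.
import random

vocab = {"Un": 0, "belie": 1, "vable": 2}

def tokenize(text, vocab):
    """Greedy longest-match tokenization against a tiny vocabulary."""
    tokens = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                tokens.append(piece)
                text = text[len(piece):]
                break
        else:
            raise ValueError(f"no token matches {text!r}")
    return tokens

# Each token ID indexes a row of the embedding table: its vector identity.
random.seed(0)
embedding_dim = 8
embeddings = [[random.gauss(0, 1) for _ in range(embedding_dim)]
              for _ in vocab]

tokens = tokenize("Unbelievable", vocab)
vectors = [embeddings[vocab[t]] for t in tokens]
```

From here on, the model never sees text again: it only manipulates those lists of numbers.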
The transformer architecture
The secret sauce of every modern LLM is the transformer, introduced by Google researchers in 2017. Its core innovation is self-attention: every token in a sequence can "look at" every other token and decide how much weight to give it. When processing the word "bank," the model checks whether nearby tokens suggest a riverbank or a financial institution.
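Stripped to its skeleton, self-attention is just a weighted average. The sketch below uses each token vector directly as its own query, key, and value; a real transformer learns separate projection matrices for each role, which this toy omits:

```python
# Minimal sketch of scaled dot-product self-attention.
# Simplification: query = key = value = the raw token vector
# (real models learn separate Q/K/V projections).
import numpy as np

def self_attention(X):
    """Each row of X is a token vector; every token attends to every other."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)          # how much each token "cares" about each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ X                     # each output is a blend of all token vectors

X = np.random.default_rng(0).normal(size=(4, 8))   # 4 tokens, 8-dim vectors
out = self_attention(X)
```

The output for "bank" is literally a mixture of the vectors around it, weighted by relevance, which is how context disambiguates the word.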
Layers upon layers
A transformer isn't one attention mechanism: it's dozens of them stacked on top of each other. GPT-3, for example, has 96 layers. Each layer refines the representation of every token, building up increasingly abstract understanding: roughly speaking, early layers pick up grammar, middle layers capture meaning, and deep layers reason about context and intent.
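The stacking itself is simple: the same block shape, applied over and over. Here's a hedged sketch with random weights (training would learn them) and a crude normalization standing in for layer norm:

```python
# Sketch of stacking transformer blocks: each layer refines every token's
# vector via attention plus a per-token feed-forward step.
# All weights are random placeholders, purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
d = 8

def attention(X):
    scores = X @ X.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ X

def normalize(X):
    # crude stand-in for layer normalization, keeps values well-behaved
    return X / np.linalg.norm(X, axis=-1, keepdims=True)

def block(X, W1, W2):
    X = normalize(X + attention(X))                    # tokens exchange information
    return normalize(X + np.maximum(X @ W1, 0) @ W2)   # per-token refinement (ReLU MLP)

X = normalize(rng.normal(size=(4, d)))
for _ in range(12):                      # a real model stacks dozens of these blocks
    W1 = rng.normal(size=(d, 4 * d)) * 0.1
    W2 = rng.normal(size=(4 * d, d)) * 0.1
    X = block(X, W1, W2)
```

After the final layer, each token's vector has been rewritten many times over, and the last one is used to predict what comes next.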
Training: reading the internet
LLMs are trained by predicting the next token in billions of text snippets scraped from the web, books, and code. The model starts with random numbers, makes a prediction, measures how wrong it was, and nudges its billions of parameters slightly in the right direction, repeating this over billions of update steps across weeks or months. This process is called gradient descent, and it's how all that number-shuffling eventually produces something that can write poetry or debug code.
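The whole loop fits in a screen of code if the model is tiny. This toy trains a bigram model (one weight matrix, predicting the next character from the current one) on a made-up string, using exactly the predict-measure-nudge cycle described above:

```python
# Toy next-token training loop: a character bigram model trained by
# gradient descent. The text and sizes are illustrative, but the
# loss-and-nudge cycle is the same one real LLMs run at scale.
import numpy as np

text = "abababab"
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
pairs = [(idx[a], idx[b]) for a, b in zip(text, text[1:])]

rng = np.random.default_rng(0)
W = rng.normal(size=(len(chars), len(chars)))  # start from random numbers

lr = 1.0
for step in range(200):
    loss = 0.0
    grad = np.zeros_like(W)
    for x, y in pairs:
        logits = W[x]
        p = np.exp(logits - logits.max())
        p /= p.sum()                       # predicted next-char probabilities
        loss -= np.log(p[y])               # measure how wrong the prediction is
        p[y] -= 1.0
        grad[x] += p                       # gradient of the cross-entropy loss
    W -= lr * grad / len(pairs)            # nudge parameters downhill
```

After a couple hundred steps the model confidently predicts "b" after "a". A real LLM does the same thing with billions of parameters instead of four.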
Why do they hallucinate?
LLMs don't look facts up in a database. They compress patterns from training data into weights, and generate text that statistically "fits." When they don't have a strong pattern to follow — like a niche historical event — they invent one that sounds plausible. It's not lying; it's the mathematical equivalent of a confident guess.
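You can see the mechanics of a "confident guess" in miniature. Generation always samples from a probability distribution over tokens; the hypothetical scores below show how a strong pattern and a weak one look to the sampler, which happily picks an answer either way:

```python
# Sketch: sampling from next-token probabilities. The logits and the
# candidate years are invented for illustration.
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

random.seed(0)
years = ["1648", "1651", "1655"]

confident = softmax([9.0, 1.0, 1.0])   # strong pattern: one clear winner
uncertain = softmax([1.1, 1.0, 0.9])   # weak pattern: a near-uniform guess

# The sampler doesn't know the difference: it picks a fluent-looking
# answer from whatever distribution it's given.
pick = random.choices(years, weights=uncertain)[0]
```

Nothing in the output signals which regime the model was in, which is why a hallucinated date reads exactly as smoothly as a correct one.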
What's next?
Researchers are actively working on reasoning models (like o1/o3) that think step-by-step before answering, multimodal models that see images and hear audio, and agentic systems that take real-world actions. The transformer that Google invented in 2017 is still at the core of all of it — which is either remarkable or slightly alarming, depending on your perspective.