AI Foundations · Lesson 03 of 10
How an LLM Actually Works
Underneath the magic, an LLM does one thing: predict the next chunk of text (a token), over and over. It's autocomplete at planetary scale, and the same trick that makes it beautifully fluent is exactly what makes it confidently wrong.
The one mental model
Read the text → predict the next token → add it → look again → repeat. To decide what's likely, the model turns tokens into numbers (similar meanings sit close together), and at each step it picks from a list of candidates ranked by probability. How boldly it picks is set by temperature.
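The loop above can be sketched in a few lines. This is a toy, not how a real LLM is built: the lookup table `NEXT_TOKEN_PROBS`, its numbers, and the `<end>` marker are all invented for illustration. The toy also looks only at the last token and always takes the top candidate, whereas a real model reads the whole text so far.

```python
# Toy stand-in for a language model: for each token, the tokens that may
# follow it and how likely each one is. All numbers here are invented.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 0.9, "<end>": 0.1},
    "ran": {"away": 0.9, "<end>": 0.1},
    "down": {"<end>": 1.0},
    "away": {"<end>": 1.0},
}


def predict_next(tokens):
    """Return the single most likely next token for the text so far."""
    candidates = NEXT_TOKEN_PROBS[tokens[-1]]  # toy: only the last token matters
    return max(candidates, key=candidates.get)


def generate(prompt, max_tokens=10):
    """Read, predict, append, look again, repeat."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        next_token = predict_next(tokens)
        if next_token == "<end>":  # hypothetical stop marker
            break
        tokens.append(next_token)
    return " ".join(tokens)


print(generate("the"))  # prints "the cat sat down"
```

Nothing in `generate` knows where the sentence is going. Each pass through the loop only answers "what comes next?", and the sentence is whatever those answers add up to.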
Key terms
Next-token prediction
The whole engine: guess the next chunk of text, add it, repeat. It isn't planning the ending — it's choosing the next step.
Embeddings
Words (strictly, tokens) turned into long lists of numbers that act as coordinates. Meaning becomes position: "king" sits near "queen." Directions carry meaning too, so king − man + woman ≈ queen.
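To make "meaning becomes position" concrete, here is a small sketch with hand-made vectors. The three dimensions (royalty, femaleness, fruitiness) and every value in `EMBEDDINGS` are assumptions chosen so the arithmetic works out cleanly. Real embeddings are learned, have hundreds or thousands of dimensions, and their axes do not come with readable labels.

```python
import math

# Hand-made 3-D "embeddings": (royalty, femaleness, fruitiness).
# Invented for illustration; real embeddings are learned from data.
EMBEDDINGS = {
    "king":  (0.9, 0.1, 0.0),
    "queen": (0.9, 0.9, 0.0),
    "man":   (0.1, 0.1, 0.0),
    "woman": (0.1, 0.9, 0.0),
    "apple": (0.0, 0.0, 0.9),
}


def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vector, exclude=()):
    """Return the vocabulary word whose embedding points closest to `vector`."""
    words = [w for w in EMBEDDINGS if w not in exclude]
    return max(words, key=lambda w: cosine(vector, EMBEDDINGS[w]))


# "king" sits nearer to "queen" than to "apple".
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]))
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["apple"]))

# king - man + woman, computed coordinate by coordinate.
target = tuple(
    k - m + w
    for k, m, w in zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])
)
print(nearest(target, exclude=("king", "man", "woman")))  # prints "queen"
```

Excluding the three input words from the search is a common convention for this kind of analogy test, since the result often lands close to one of the inputs.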
Probabilities
At each step it assigns a probability to every candidate next token, and usually (not always) picks one near the top.
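A rough sketch of "rank, then usually pick near the top": raw scores are turned into probabilities with a softmax, and the next token is drawn at random in proportion to those probabilities. The prompt, the candidate tokens, and their scores in `scores` are made up for this example.

```python
import math
import random
from collections import Counter

# Hypothetical raw scores for the token after "The sky is".
scores = {"blue": 4.0, "clear": 3.0, "falling": 1.0, "purple": 0.5}


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    peak = max(scores.values())  # subtracting the peak keeps exp() stable
    exps = {token: math.exp(s - peak) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


probs = softmax(scores)

# The ranked list of candidates, most likely first.
ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
for token, p in ranked:
    print(f"{token:8s} {p:.3f}")

# Draw 1000 next tokens in proportion to their probabilities.
rng = random.Random(0)  # fixed seed so the run is repeatable
picks = rng.choices(list(probs), weights=list(probs.values()), k=1000)
print(Counter(picks).most_common())
```

The top candidate wins most draws, but lower-ranked tokens still come up now and then. That is the "usually, not always" part.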
Temperature
How adventurous the pick is. Low = focused, repeatable (facts/code). High = creative, riskier.
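One common way to implement temperature is to divide the raw scores by it before converting them to probabilities. The sketch below assumes that approach; the candidate tokens and scores are invented, and `softmax_with_temperature` is a name chosen for this example.

```python
import math

# Hypothetical raw scores for the token after "The sky is".
scores = {"blue": 4.0, "clear": 3.0, "falling": 1.0, "purple": 0.5}


def softmax_with_temperature(scores, temperature):
    """Divide scores by the temperature, then convert to probabilities."""
    if temperature <= 0:
        # Division by zero is undefined, so this sketch rejects it.
        raise ValueError("temperature must be positive")
    scaled = {token: s / temperature for token, s in scores.items()}
    peak = max(scaled.values())  # subtracting the peak keeps exp() stable
    exps = {token: math.exp(s - peak) for token, s in scaled.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    summary = ", ".join(f"{token}={p:.2f}" for token, p in probs.items())
    print(f"T={temperature}: {summary}")
```

At a low temperature nearly all the probability piles onto the top candidate, which is why the output is focused and repeatable. At a high temperature the distribution flattens, so unlikely tokens get picked more often.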
The misconception to drop
✕ "There's a mind in the box that knows things and looks them up."
✓ It's an extremely well-read autocomplete predicting the next word. That single mechanism explains both sides: the fluency is real, and so are the confident-but-wrong answers, because it aims for what sounds likely, not what's true.