AI Foundations · Lesson 03 of 10

How an LLM Actually Works

Underneath the magic, it does one thing: predict the next chunk of text, over and over. It's autocomplete at planetary scale — and the same trick that makes it beautifully fluent is exactly what makes it confidently wrong.

The one mental model

Read the text → predict the next token → add it → look again → repeat. To decide what's likely, it turns each token into numbers (similar meanings sit close together), and at each step it picks from a list of candidate tokens ranked by probability. How boldly it picks is set by temperature.

Key terms

Next-token prediction

The whole engine: guess the next chunk of text, add it, repeat. It isn't planning the ending — it's choosing the next step.
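The loop can be sketched in a few lines of Python. This is a toy, not how a real model is built: the lookup table `NEXT_TOKEN_COUNTS` and its counts are invented for illustration, and it only looks at the last token, whereas a real LLM looks at all the text so far. The shape of the loop (guess, add, repeat) is the part that carries over.

```python
# Toy "model": for each token, which tokens may follow it and how often.
# These counts are made up for illustration; a real model learns billions
# of parameters instead of a small lookup table.
NEXT_TOKEN_COUNTS = {
    "the": {"cat": 3, "mat": 2},
    "cat": {"sat": 4},
    "sat": {"on": 5},
    "on": {"the": 5},
    "mat": {"<end>": 1},
}


def predict_next(tokens):
    """Guess the next token by picking the most frequent follower of the last one."""
    candidates = NEXT_TOKEN_COUNTS.get(tokens[-1], {"<end>": 1})
    return max(candidates, key=candidates.get)


def generate(prompt, max_new_tokens=10):
    """Predict, append, look again, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(" ".join(generate(["the"], max_new_tokens=4)))
```

Notice that nothing in `generate` knows where the sentence is going. Each pass through the loop only chooses the next step, which is why this toy happily circles back to "the" instead of finishing a thought.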

Embeddings

Words turned into long lists of numbers (coordinates). Meaning becomes position: "king" sits near "queen." Directions carry meaning too, so the arithmetic roughly works: king − man + woman ≈ queen.
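Here is a minimal sketch of that arithmetic. The three-number vectors in `EMBEDDINGS` are hypothetical and hand-picked so the example works out; real embeddings have hundreds or thousands of dimensions and are learned from data, so the analogy only holds approximately there. Closeness is measured with cosine similarity, one common choice.

```python
import math

# Hypothetical embeddings, invented for illustration.
# The three coordinates loosely mean: [royalty, gender, fruitiness].
EMBEDDINGS = {
    "king":  [0.9,  0.3, 0.0],
    "queen": [0.9, -0.3, 0.0],
    "man":   [0.1,  0.3, 0.0],
    "woman": [0.1, -0.3, 0.0],
    "apple": [0.0,  0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vector):
    """Return the vocabulary word whose embedding points most nearly the same way."""
    return max(EMBEDDINGS, key=lambda word: cosine(vector, EMBEDDINGS[word]))


# king - man + woman, computed coordinate by coordinate.
target = [
    k - m + w
    for k, m, w in zip(EMBEDDINGS["king"], EMBEDDINGS["man"], EMBEDDINGS["woman"])
]

print(nearest(target))  # prints "queen" for these hand-picked vectors
```

With these toy numbers, "king" is also closer to "queen" than to "apple", which is the "meaning becomes position" idea in miniature.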

Probabilities

At each step it assigns a probability to every candidate next token, and usually (not always) picks one near the top.
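A rough sketch of that step, assuming the model has already produced a raw score for each candidate. The scores and candidate words below are made up. Softmax is the standard way to turn raw scores into probabilities that sum to 1, and the weighted draw shows why the top candidate wins often but not always.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtracting the max avoids overflow
    exps = {token: math.exp(s - highest) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


def sample(probs, rng):
    """Draw one token, with each token's chance equal to its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented scores for what might follow "The cat sat on the".
scores = {"mat": 2.0, "sofa": 1.0, "roof": 0.5, "moon": -1.0}
probs = softmax(scores)

# The ranked list: most likely candidate first.
ranked = sorted(probs, key=probs.get, reverse=True)
print(ranked)

# Draw several times: "mat" should appear most, but not every time.
rng = random.Random(0)
print([sample(probs, rng) for _ in range(10)])
```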

Temperature

How adventurous the pick is. Low = focused, repeatable (facts/code). High = creative, riskier.
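One common way temperature is applied is to divide the raw scores by it before converting them to probabilities. The sketch below assumes that approach; the scores are invented, and `softmax_with_temperature` is a name chosen for this example rather than any library's API.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Divide scores by the temperature, then convert to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {token: s / temperature for token, s in scores.items()}
    highest = max(scaled.values())  # subtracting the max avoids overflow
    exps = {token: math.exp(s - highest) for token, s in scaled.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


# Invented scores for four candidate next tokens.
scores = {"mat": 2.0, "sofa": 1.0, "roof": 0.5, "moon": -1.0}

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, {token: round(p, 3) for token, p in probs.items()})
```

At a low temperature nearly all the probability lands on the top candidate, so the output is focused and repeatable. At a high temperature the probabilities flatten out, so unlikely candidates such as "moon" get picked more often.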

The misconception to drop

✕ "There's a mind in the box that knows things and looks them up."

✓ It's an extremely well-read autocomplete predicting the next token. That single mechanism explains both sides: the fluency is real, and so are the confident-but-wrong answers, because it aims for what sounds likely, not what's true.
