AI Foundations · Lesson 04 of 10

Tokens, Context & Why AI Forgets

A model reads and writes in chunks called tokens, and it can hold only a limited number of them at once. When that limit fills, the oldest tokens fall away. That limit, not the tokens themselves, is why AI "forgets."

The one mental model

Picture a fixed-size working desk. The model can see everything on it. As the conversation grows, the desk fills, and the oldest notes slide off the back. The model isn't ignoring you; that text is simply no longer on the desk.
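The desk can be sketched in a few lines of Python. This is a toy model, not how any real system is built: it assumes one word equals one token and a desk of only 8 tokens, and the names `DESK_SIZE` and `desk` are made up for illustration.

```python
from collections import deque

DESK_SIZE = 8  # assumed tiny window, for illustration only

# A deque with maxlen drops its oldest item when a new one arrives at capacity.
desk = deque(maxlen=DESK_SIZE)

conversation = "my name is Ada and I like tea what is my name"
for token in conversation.split():  # toy assumption: one word = one token
    desk.append(token)

visible = list(desk)
print(visible)           # what is still on the desk
print("Ada" in visible)  # whether the name is still visible
```

By the time the question arrives, the name has already slid off the back, so nothing on the desk can answer it.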

Key terms

Token

A chunk of text — sometimes a whole word, sometimes a piece of one, sometimes just a comma. Both your input and the model's output are counted in tokens.

Tokenization

How text is split into tokens. There's no universal rule: each model family uses its own tokenizer, so the same sentence can split differently from one model to the next.
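To see why the same sentence can produce different token counts, here is a minimal sketch with two invented splitting schemes. Neither is a real model's tokenizer (real ones use learned subword vocabularies); both function names are hypothetical and exist only to show that the split depends on the rule.

```python
import re

def split_words_and_punctuation(text):
    # Toy scheme A: each word and each punctuation mark is one token.
    return re.findall(r"\w+|[^\w\s]", text)

def split_fixed_chunks(text, size=4):
    # Toy scheme B: cut the text into pieces of `size` characters.
    return [text[i:i + size] for i in range(0, len(text), size)]

sentence = "Tokenization is odd, right?"
tokens_a = split_words_and_punctuation(sentence)
tokens_b = split_fixed_chunks(sentence)

print(len(tokens_a), tokens_a)
print(len(tokens_b), tokens_b)
```

Scheme A yields 6 tokens and scheme B yields 7 for the same sentence, which is why a token count only means something relative to a specific tokenizer.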

Context window

The maximum number of tokens a model can hold at once. The "desk." Fixed size.

Hidden tokens

The system prompt and any uploaded file take up the window too — before you type a word. A big file crowds out the conversation.
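The crowding is plain subtraction. The sketch below uses made-up numbers (an 8,000-token window, a 500-token system prompt, a 6,000-token file); real sizes vary widely by model and product, and `tokens_left_for_chat` is a hypothetical helper.

```python
CONTEXT_WINDOW = 8_000        # assumed window size, in tokens
SYSTEM_PROMPT_TOKENS = 500    # assumed
UPLOADED_FILE_TOKENS = 6_000  # assumed: one large file

def tokens_left_for_chat(window, *hidden_costs):
    # Whatever the hidden items consume is unavailable to the conversation.
    return max(0, window - sum(hidden_costs))

left = tokens_left_for_chat(
    CONTEXT_WINDOW, SYSTEM_PROMPT_TOKENS, UPLOADED_FILE_TOKENS
)
print(left)  # 1500
```

With these numbers, only 1,500 of the 8,000 tokens remain for the actual back-and-forth, so the conversation starts losing its oldest turns much sooner.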

The misconception to drop

✕ "It's ignoring me / tokens make it forget / it remembers me between chats."

✓ The context window is a fixed token budget. When the conversation exceeds it, the oldest tokens are pushed out, and that is the forgetting. Each new chat starts with an empty window, and nothing carries over unless a memory feature saves it.
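The "empty window" point can be sketched with a toy `Chat` class. Everything here is hypothetical, including the `saved_memory` parameter, which stands in for whatever memory feature a product might offer; it is not any real product's API.

```python
class Chat:
    """Toy chat session: its window holds only what this chat has seen."""

    def __init__(self, saved_memory=None):
        # A new chat starts empty unless a (hypothetical) memory feature
        # hands it saved notes.
        self.window = list(saved_memory or [])

    def say(self, text):
        self.window.append(text)

first = Chat()
first.say("My name is Ada.")

second = Chat()  # no memory feature: starts empty
third = Chat(saved_memory=["User's name is Ada."])  # memory feature on

print(second.window)  # []
print(third.window)   # ["User's name is Ada."]
```

The second chat knows nothing about the first. The third knows the name only because something outside the window saved it and put it back in.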
