See how text breaks into tokens that models actually read
A helpful rule of thumb: one token corresponds to roughly 4 characters of common English text, or about 3/4 of a word (so 100 tokens ~= 75 words).
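The rule of thumb above can be turned into a quick estimator. This is a sketch under the stated heuristic only (chars / 4), not a real tokenizer; `estimate_tokens` is a hypothetical helper, and actual counts vary by text and encoding.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token heuristic."""
    return max(1, round(len(text) / 4)) if text else 0

# 100 tokens ~= 75 words: a 75-word sample lands near 100 by this estimate.
sample = " ".join(["sample"] * 75)
print(estimate_tokens(sample))
```

Use it only for ballpark budgeting; for billing-accurate counts, run the model's actual tokenizer.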
How It Works
Tokenizers split text into tokens — the smallest units that language models process. Common words become single tokens, while rare words get split into subword pieces.
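The common-word / subword-piece behavior can be illustrated with a toy greedy longest-match tokenizer. This is a simplified sketch with a made-up vocabulary, not real BPE (production tokenizers like those below merge byte pairs learned from data):

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization: at each position take the
    longest vocabulary entry that matches; fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit as-is
            i += 1
    return tokens

# Toy vocab: "the" is a whole token; "tokenizers" splits into pieces.
vocab = {"the", " the", " token", "token", "izer", "s"}
print(tokenize("the tokenizers", vocab))  # → ['the', ' token', 'izer', 's']
```

Note how the common word survives intact while the rarer word is assembled from subword pieces, mirroring what real tokenizers do.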
Different models use different tokenizers. GPT-5, o1, o3, and GPT-4o all use o200k_base with ~200K vocabulary, while legacy GPT-4 uses cl100k_base with ~100K vocabulary.
Token count affects cost (you pay per token), latency (more tokens = slower), and context limits (each model has a max token count).
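The pay-per-token point translates directly into a cost formula. A sketch with made-up prices (the rates below are assumptions for illustration; check your provider's current pricing):

```python
# Hypothetical rates: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed rates above."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

print(f"${request_cost(10_000, 1_000):.4f}")  # → $0.0350
```

Output tokens typically cost several times more than input tokens, so long completions dominate the bill even when the prompt is large.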