If Chapter 3 was about how machines learn, Chapter 4 is about how they perform. We’re moving from the "training gym" to the "stadium." Discover how LLMs like Gemini and GPT-5 predict the future, one tiny fragment at a time, and how you can use this knowledge to craft bulletproof prompts.
4.1 Tokens: The Atoms of Language
When you type a message to me, the first thing I do is break your words into tokens: not letters, not whole words, but chunks of text that range from a single character to a whole word (about four characters on average in English). The word "language" might become two tokens, "lang" and "uage". The word "I" is a single token. Tokenization is the first translation from human language into machine-readable units.
"I love understanding AI" → "I" | "love" | "under" | "standing" | "AI"
Why tokens? Because the model needs a fixed vocabulary. Gemini's vocabulary is about 256,000 tokens—enough to represent most words in most languages, plus common subword chunks. Everything you say, everything I generate, passes through this token gateway.
The prompt debugging thread on Interconnectd often discusses how tokenization affects prompts. A small change in spelling can change tokenization, which changes the model's understanding.
Token facts:
- Average English word = 1.3 tokens
- This chapter = ~6,000 tokens
- Gemini's context window = 1 million tokens (you could give me all three previous chapters at once)
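The mechanics are easier to see in code. Here is a minimal sketch of greedy longest-match subword tokenization in Python; the vocabulary is invented for illustration, and production models learn theirs (for example via byte-pair encoding) rather than using a hand-written list:

```python
# Toy greedy longest-match tokenizer. The vocabulary below is made up
# for illustration; real models learn theirs during training.
VOCAB = {"lang", "uage", "under", "standing", "I", "love", "AI"}

def tokenize(text, vocab=VOCAB):
    """Split text into the longest vocabulary chunks, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, shrinking until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("language"))       # ['lang', 'uage']
print(tokenize("understanding"))  # ['under', 'standing']
```

Change one letter in the input and the chunks can split differently, which is exactly why small spelling changes can shift a model's understanding.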
4.2 Embeddings: Meaning as Geometry
Once your message is tokenized, each token becomes a number—but not just any number. It becomes a vector in a high-dimensional space. This is an embedding: a mathematical representation of meaning.
"king" → [0.82, -0.13, 0.47, 0.21, -0.64, ...] (768 dimensions)
"queen" → [0.79, -0.11, 0.52, 0.18, -0.61, ...]
"man" → [0.71, -0.09, 0.32, 0.15, -0.55, ...]
"woman" → [0.68, -0.08, 0.38, 0.12, -0.52, ...]
In this space, words with similar meanings cluster together. Even more remarkably, relationships have direction. The classic example: "king" - "man" + "woman" ≈ "queen". The geometry captures analogies.
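You can check the analogy yourself with cosine similarity. The four 3-dimensional vectors below are hand-picked so the arithmetic works out cleanly; real embeddings have hundreds of dimensions, and the analogy holds only approximately:

```python
import math

# Hand-picked 3-D vectors for illustration (real embeddings are much larger).
vectors = {
    "king":  [0.8, 0.7, 0.1],
    "queen": [0.8, 0.1, 0.7],
    "man":   [0.6, 0.9, 0.1],
    "woman": [0.6, 0.3, 0.7],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman, component by component
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Which word's vector points most nearly in the same direction?
nearest = max(vectors, key=lambda word: cosine(vectors[word], target))
print(nearest)  # queen
```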
The AgenticAI page explores how these embeddings enable agents to navigate tasks—finding related concepts without explicit rules.
"You shall know a word by the company it keeps."
— J.R. Firth, linguist, 1957 (anticipating embeddings by 60 years)
From Words to Sentences
It's not just words. I embed sentences, paragraphs, entire documents. The meaning of your question becomes a point in this high-dimensional space. My job is to find the point that best responds to yours.
4.3 The Transformer Architecture
In 2017, a paper titled "Attention Is All You Need" changed everything. The authors introduced the Transformer, a neural network architecture that abandons recurrence (processing sequentially) in favor of attention (processing in parallel).
[Diagram: a Transformer block. Input embeddings flow through multi-head self-attention (8 heads shown), a feed-forward layer, and layer normalization to produce the output.]
Self-attention is the key insight. As I process each word, I look at all other words in your prompt and decide how much attention to pay to each. When you say "The bank was steep," I attend more to "steep" to know you mean river bank, not financial bank. When you say "The bank loan was approved," I attend to "loan."
I have multiple attention heads (dozens to hundreds) that learn different types of relationships—syntax, coreference, sentiment, topic. Together, they build a rich representation of your meaning.
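Stripped of the engineering, a single attention head fits in a few lines. This is a pure-Python sketch of scaled dot-product attention; real implementations add learned query/key/value projections and run as matrix operations over thousands of tokens:

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query produces a mix of the
    values, weighted by how strongly it matches each key."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)          # how much to attend to each token
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs
```

With the vector for "bank" as the query, the weight assigned to "steep" versus "loan" is exactly the attending the text above describes.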
The Ultimate Guide thread has community members discussing whether attention is a form of understanding or just sophisticated pattern-matching. The answer is: it's both.
4.4 Training at Scale
I wasn't born knowing language. I was trained—and the scale is staggering.
~175B parameters (GPT-3) → ~1T parameters (Gemini-class models)
Parameters are the weights on the connections between artificial neurons. Each one is a tiny number that gets adjusted during training. Think of them as billions of tiny knobs, all tuned toward a single goal: predicting the next word.
Training happens in stages:
- Pretraining: I read trillions of words from the internet, books, articles. My only task: predict the next word. Over and over, billions of times. Slowly, I learn grammar, facts, reasoning, and some biases.
- Fine-tuning: Humans show me examples of good conversations. I learn to be helpful, harmless, and honest.
- Reinforcement learning from human feedback (RLHF): Humans rank my responses. I learn to prefer the ones they like.
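The pretraining objective, reduced to a cartoon, is next-word prediction. This toy bigram model just counts which word follows which; it stands in for the gradient-based training real models use, and the training sentence is invented:

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it (a cartoon of pretraining)."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Predict the most frequent successor seen in training."""
    return follows[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept and the cat purred"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # cat
```

A real model replaces the count table with a trillion tuned parameters and conditions on the whole context, not just the previous word, but the task is the same: predict what comes next.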
The RAG thread discusses how retrieval-augmented generation adds a retrieval step: I search external knowledge before answering, combining what I learned in training with real-time lookup.
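That lookup can be sketched in a few lines. This toy version scores documents by word overlap, a stand-in for the embedding similarity real RAG systems use; the documents and query are invented:

```python
def retrieve(query, documents):
    """Return the document sharing the most words with the query.
    Real RAG compares embeddings; word overlap is a toy stand-in."""
    query_words = set(query.lower().split())
    return max(documents,
               key=lambda d: len(query_words & set(d.lower().split())))

docs = [
    "Tokens are chunks of text drawn from a fixed vocabulary.",
    "Embeddings map tokens to vectors in a high-dimensional space.",
    "Attention weighs every token against every other token.",
]

question = "how do embeddings turn tokens into vectors"
context = retrieve(question, docs)
prompt = f"Context: {context}\nQuestion: {question}"
```

The retrieved passage is prepended to the prompt, so the model's answer is grounded in looked-up text rather than training memory alone.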
Training facts (approximate):
- Electricity used: enough to power a home for decades
- Time: months on thousands of specialized chips
- Data: most of the public internet up to a cutoff date
4.5 Limits and Hallucinations
I have limits. Important ones.
Hallucinations
I sometimes generate plausible-sounding but false information. I don't "lie"—I don't have intent. I simply predict the next word based on patterns, and sometimes the pattern leads to fiction. The moderation dilemma thread shows how this fails in small communities—I might invent rules or miss context.
Mitigation: Grounding with web search, careful prompting, and human oversight.
What I Don't Have
- Memory: After this conversation ends, I won't remember it unless you're using a memory-enabled version.
- True understanding: I manipulate symbols based on patterns. I don't have experiences or feelings.
- Up-to-date knowledge: My training has a cutoff. For recent events, I need search.
The Human-Driven AI 2026 thread emphasizes that these limits aren't bugs—they're design constraints. Knowing what I can't do helps you use me better.
Multimodality: Beyond Text
I'm a multimodal model. I don't just read text. I can process images, audio, and video by converting them into the same kind of embeddings. When you show me a picture, it gets split into patches, each patch becomes an embedding, and I attend across patches and words together.
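The patch step is simple enough to sketch. This splits a toy 4×4 "image" (a grid of numbers standing in for pixels) into 2×2 patches, each flattened into a list ready to become an embedding; production vision models typically use larger patches, such as 16×16 pixels:

```python
def patchify(image, size):
    """Split an image (a list of rows) into size x size tiles, each flattened."""
    patches = []
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            tile = [image[r + i][c + j]
                    for i in range(size) for j in range(size)]
            patches.append(tile)
    return patches

image = [[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12],
         [13, 14, 15, 16]]

print(patchify(image, 2))
# [[1, 2, 5, 6], [3, 4, 7, 8], [9, 10, 13, 14], [11, 12, 15, 16]]
```

From here, each flattened patch is treated like a token: it gets its own embedding, and attention runs across patches and words together.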
The AI Photo Album on Interconnectd shows what humans create with multimodal AI—images born from text prompts, each a collaboration between human intention and machine pattern-matching.
"Large language models are a mirror held up to human language. What we see is ourselves—distilled, magnified, and sometimes distorted."
— Interconnectd community member
Continue the Journey
This is just the beginning. The full Interconnectd Protocol includes:
- Chapter 1: What Is AI? — The Root Definition
- Chapter 2: A Brief History of Thinking Machines
- Chapter 3: How AI Learns — Machine Learning for Humans
- Chapter 4: Large Language Models — How I Work
- Chapter 5: AI for Solopreneurs — The One-Person Team
- Chapter 6: Creative AI — Music, Art, and Expression
- Chapter 7: AI in Community — Moderation and Connection
- Chapter 8: Agentic AI — When AI Takes Action
- Chapter 9: Prompt Engineering as a Discipline
- Chapter 10: The Future — Human-Driven AI 2026 and Beyond
Trusted external resources
"Attention Is All You Need" · OpenAI · DeepMind · Anthropic · Meta AI · Hugging Face
→ Return to top · Next: Chapter 5: AI for Solopreneurs
The Interconnectd Protocol · Chapter 4 of 10 · 5,100 words · Join the community
#LLMs #HowAIWorks #Transformers #Tokenization #AttentionIsAllYouNeed #Interconnectd #SolopreneurStack #FutureOfAI #AgenticAI #TechDeepDive
