LLM101n: Let’s build a Storyteller

> “What I cannot create, I do not understand.” — Richard Feynman
In this course, we build a Storyteller AI Large Language Model (LLM) from scratch — from a one-line bigram model all the way to a deployed, multimodal web app. Everything is implemented end-to-end in Python with minimal prerequisites. By the end you will have a deep, hands-on understanding of how modern LLMs work.
The training corpus throughout is TinyStories — a dataset of short children’s stories — keeping experiments fast enough to run on a laptop while still producing meaningful results.
Syllabus
| # | Chapter | Key concepts |
|---|---|---|
| 01 | Bigram Language Model | language modeling, NLL loss, character-level tokenization |
| 02 | Micrograd | scalar autodiff, backpropagation from scratch |
| 03 | N-gram MLP | multi-layer perceptron, matmul, GELU |
| 04 | Attention | self-attention, softmax, positional encoding |
| 05 | Transformer | GPT-2 architecture, residual connections, LayerNorm |
| 06 | Tokenization | Byte Pair Encoding (BPE), minBPE |
| 07 | Optimization | weight initialization, AdamW, LR schedules |
| 08 | Need for Speed I: Device | CPU vs GPU, device-agnostic PyTorch |
| 09 | Need for Speed II: Precision | mixed precision, fp16, bf16, fp8 |
| 10 | Need for Speed III: Distributed | DDP, ZeRO, DeepSpeed |
| 11 | Datasets | data loading, synthetic data generation |
| 12 | Inference I: KV-Cache | key-value cache, autoregressive generation |
| 13 | Inference II: Quantization | INT8/INT4 quantization |
| 14 | Finetuning I: SFT | supervised finetuning, PEFT, LoRA, chat format |
| 15 | Finetuning II: RL | RLHF, PPO, DPO |
| 16 | Deployment | FastAPI server, streaming, web UI |
| 17 | Multimodal | VQVAE, diffusion transformer, image+text |
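As a taste of where the course begins, the count-based character bigram model from chapter 01 can be sketched in a few lines of pure Python. This is an illustrative sketch (function names are ours, not the chapter's actual code):

```python
import math
from collections import defaultdict

def train_bigram(text):
    """Count character bigrams and normalize into P(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def avg_nll(probs, text):
    """Average negative log-likelihood — the chapter-01 training loss."""
    logps = [math.log(probs[a][b]) for a, b in zip(text, text[1:])]
    return -sum(logps) / len(logps)

probs = train_bigram("once upon a time there was a tiny story")
loss = avg_nll(probs, "once upon a time there was a tiny story")
```

Evaluating the NLL on the training text itself gives an optimistic lower bound; the chapters use a held-out validation split for honest numbers.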
Repository Layout
```
LLM101n/
├── chNN.md           # Chapter narratives + embedded Python code (kept in sync via inject.py)
├── codes/
│   ├── inject.py     # Syncs named blocks from codes/chNN/main.py → chNN.md
│   ├── extract.py    # Legacy: extracted markdown → main.py (no longer the active workflow)
│   ├── chNN/
│   │   ├── main.py   # Runnable script — SOURCE OF TRUTH for code
│   │   └── run.log   # Expected output
│   └── data/         # Shared datasets, checkpoints, tokenizers
└── llm101n.jpg
```
`codes/chNN/main.py` is the source of truth. Edit the Python scripts directly; run `inject.py` to sync changes back into the markdown chapter files.
Getting Started
Prerequisites
- Python 3.10+
- A GPU is helpful but not required for the early chapters
Setup
```
git clone https://github.com/bagustris/LLM101n.git
cd LLM101n

# Create and activate the virtual environment
uv venv codes/.venv
source codes/.venv/bin/activate   # Windows: codes\.venv\Scripts\activate
uv pip install torch datasets transformers tqdm fastapi uvicorn
```
Running a chapter
```
source codes/.venv/bin/activate
cd codes/ch01
python main.py
```
Each chapter is self-contained. Chapter 01 downloads the TinyStories dataset on the first run and saves it to codes/data/ so subsequent chapters can reuse it without hitting the network again.
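The cache-then-reuse pattern described above can be sketched as follows. This is a hypothetical helper, not the chapter's actual download code (which may use the Hugging Face `datasets` library instead):

```python
from pathlib import Path

def load_cached_text(path, download):
    """Return the file's text, calling download() only if the cache is missing.

    `download` is any zero-argument callable that returns the corpus as a
    string; it is invoked once, on the first run, and never again after the
    result has been written to `path` (e.g. codes/data/tinystories_train.txt).
    """
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(download())
    return path.read_text()
```

Because every chapter goes through the same cached path, deleting the files under codes/data/ is all it takes to force a fresh download.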
Editing code and syncing to markdown
codes/chNN/main.py is the source of truth. Edit it directly, then sync named blocks back into the chapter markdown:
```
cd codes
python inject.py            # sync all chapters
python inject.py ch05       # sync one chapter
python inject.py --dry-run  # preview diffs without writing
python inject.py --status   # show which blocks are marked
```
Note on block markers: in `main.py`, wrap editable sections with `# === block: <name> ===` / `# === /block: <name> ===`. In the markdown, wrap the matching fence with `<!-- block: <name> -->` / `<!-- /block: <name> -->`. `inject.py` will replace only those fenced regions.
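Given that marker syntax, the Python side of block extraction can be sketched with a single regular expression. This is illustrative only, not `inject.py`'s actual implementation:

```python
import re

# Match "# === block: <name> ===" ... "# === /block: <name> ===",
# capturing the name and everything in between across newlines.
BLOCK_RE = re.compile(
    r"# === block: (?P<name>\w+) ===\n"
    r"(?P<body>.*?)"
    r"# === /block: (?P=name) ===",
    re.DOTALL,
)

def extract_blocks(source):
    """Map each block name to the code between its markers."""
    return {m.group("name"): m.group("body") for m in BLOCK_RE.finditer(source)}
```

The back-reference `(?P=name)` ensures an opening marker is only paired with a closing marker of the same name, so nested or interleaved blocks fail loudly instead of silently mismatching.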
Shared Data (codes/data/)
| File / Directory | Description |
|---|---|
| tinystories_train.txt | 50 K training stories |
| tinystories_val.txt | 5 K validation stories |
| gpt_tinystories.pt | GPT-2 checkpoint pretrained on TinyStories |
| tinystories_bpe_tokenizer.json | BPE tokenizer (128-token vocab) |
| lora_adapter/ | Saved LoRA adapter (rank=8, alpha=16) |
| vqvae_cifar10.pt | Pretrained VQVAE for CIFAR-10 (ch17) |
| cifar-10-batches-py/ | CIFAR-10 image dataset (ch17) |
| server.py | FastAPI streaming text-generation server (ch16) |
| frontend.html | Web UI (ch16) |
| Dockerfile | Container image for the deployed server (ch16) |
| ds_config.json | DeepSpeed config for distributed training (ch10) |
Appendix — Topics to Explore Further
- Programming languages: Assembly, C, Python internals
- Data types: Integer, Float, String (ASCII, Unicode, UTF-8)
- Tensors: shapes, views, strides, contiguous memory
- Frameworks: PyTorch, JAX
- Architectures: GPT-1/2/3/4, Llama (RoPE, RMSNorm, GQA), Mixture-of-Experts
- Multimodal: Images, Audio, Video, VQVAE, VQGAN, Diffusion models
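For the tensors bullet above, the core idea behind shapes, views, and strides can be shown in plain Python. This is a sketch of row-major addressing, not any particular framework's API:

```python
def flat_offset(indices, strides):
    """Offset of an element in flat storage: sum of index * stride per axis."""
    return sum(i * s for i, s in zip(indices, strides))

# A contiguous 2x3 tensor has strides (3, 1):
# storage order is [a00, a01, a02, a10, a11, a12].
row_major = flat_offset((1, 2), (3, 1))   # element [1, 2] -> offset 5

# Its transpose is a *view*: same storage, strides swapped to (1, 3).
transposed = flat_offset((2, 1), (1, 3))  # transposed [2, 1] -> same offset 5
```

This is why a transpose in PyTorch is free (only the strides change) while a subsequent `.contiguous()` call actually copies data into the new layout.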