microgpt Documentation

Learn how microgpt works - a minimal GPT implementation in pure Python.

A complete, working GPT model in ~250 lines of pure Python with zero dependencies.

What is microgpt?

microgpt is an educational implementation by Andrej Karpathy that demonstrates exactly how a language model works. No PyTorch, no NumPy—just Python's standard library.

python microgpt.py

That's it. The model downloads a dataset, trains, and generates text.

Documentation Sections

Section            Description
Getting Started    Introduction and overview
Tokenization       Converting text to numbers
Foundations        Gradients and parameters
Architecture       The GPT model components
Training           How models learn
Autograd           Automatic differentiation
Generation         Text generation
Code Reference     Line-by-line explanation

How It Works

The entire pipeline:

  1. Tokenize - Convert characters to integers
  2. Embed - Convert integers to vectors
  3. Transform - Apply attention and MLP layers
  4. Predict - Generate probability distributions
  5. Train - Use backpropagation to improve
  6. Generate - Sample from the model

Each component is explained in detail in the sections listed above.
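As a rough sketch of the first and last steps of this pipeline, here is what character-level tokenization and sampling can look like in pure Python. The variable names below are illustrative, not the actual identifiers used in microgpt.py, and a uniform distribution stands in for the model's predicted probabilities:

```python
import random

text = "hello world"

# 1. Tokenize: map each unique character to an integer id.
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}  # char -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> char
tokens = [stoi[ch] for ch in text]

# 6. Generate: sample the next token id from a probability
# distribution over the vocabulary (uniform here, standing in
# for the distribution a trained model would predict).
probs = [1.0 / len(vocab)] * len(vocab)
next_id = random.choices(range(len(vocab)), weights=probs, k=1)[0]
next_char = itos[next_id]
```

Decoding is the inverse lookup: joining `itos[t]` for each token id recovers the original text exactly, which is why character-level tokenizers need no special handling for unknown symbols within their vocabulary.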
