Getting Started
Introduction
Welcome to microgpt
microgpt is a complete, working GPT (Generative Pre-trained Transformer) model written in just 250 lines of pure Python with zero dependencies. No PyTorch, no NumPy, nothing but Python's standard library.
This is an educational project by Andrej Karpathy - one of the pioneers of deep learning. The goal is to show exactly how a language model works under the hood.
What You'll Learn
By reading through this documentation, you'll understand:
| Concept | What It Means |
|---|---|
| Tokenization | How to convert text into numbers a computer can process |
| Neural Networks | How computers learn patterns from data |
| Autograd | How computers calculate gradients automatically |
| Transformers | The architecture behind GPT, BERT, and ChatGPT |
| Attention | How models "focus" on relevant parts of text |
| Training | How models learn from examples |
| Inference | How models generate new text |
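To make the first of these concrete, here is a minimal character-level tokenizer sketch. It illustrates the idea only; microgpt's actual vocabulary and token ids may differ (the `docs` list, `stoi`, and `itos` names here are illustrative, not taken from the project):

```python
# Build a character-level vocabulary from a toy dataset of names
# (illustrative sketch - not microgpt's actual code).
docs = ["emma", "olivia", "ava"]
chars = sorted(set("".join(docs)))            # unique characters = vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(text):
    """Turn a string into a list of integer token ids."""
    return [stoi[ch] for ch in text]

def decode(ids):
    """Turn a list of token ids back into a string."""
    return "".join(itos[i] for i in ids)

print(encode("ava"))          # [0, 6, 0]
print(decode(encode("ava")))  # "ava"
```

Round-tripping through `encode` and `decode` recovers the original text exactly, which is the basic contract any tokenizer must satisfy.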
How Simple Is It?
Here's the entire training loop in just a few lines:
```python
for step in range(num_steps):
    # Get a training example
    tokens = encode(document)
    # Forward pass - make predictions
    loss = forward_pass(tokens)
    # Backward pass - figure out how to improve
    loss.backward()
    # Update weights
    update_weights()
```

That's it! That's the heart of machine learning.
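The `loss.backward()` step relies on a tiny autograd engine. Here is a minimal sketch of that idea, assuming a scalar `Value` class that records how each number was computed; it is illustrative only, and microgpt's real implementation supports more operations:

```python
# Minimal scalar autograd sketch (illustrative - not microgpt's actual code).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None  # propagates this node's grad to its children

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn:
                v._grad_fn()

a, b = Value(2.0), Value(3.0)
loss = a * b + a   # loss = a*b + a = 8.0
loss.backward()
print(a.grad)      # d(loss)/da = b + 1 = 4.0
print(b.grad)      # d(loss)/db = a = 2.0
```

After `backward()`, every parameter knows how nudging it would change the loss; the "update weights" step then moves each parameter a small step against its gradient.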
How to Run
```shell
python microgpt.py
```

You'll see it download a dataset of names, train for a while, then generate new names.
Customizing the Model
You can change the model size:
```shell
# Small model (fast)
python microgpt.py --n_embd 16 --n_layer 1

# Medium model
python microgpt.py --n_embd 32 --n_layer 3 --num_steps 5000

# Large model (slower but smarter)
python microgpt.py --n_embd 64 --n_layer 6 --num_steps 10000
```

Documentation Roadmap
Start here, then follow along in order: