
Introduction

Welcome to microgpt

microgpt is a complete, working GPT (Generative Pre-trained Transformer) model written in just 250 lines of pure Python with zero dependencies. No PyTorch, no NumPy, nothing but Python's standard library.

This is an educational project by Andrej Karpathy - one of the pioneers of deep learning. The goal is to show exactly how a language model works under the hood.

What You'll Learn

By reading through this documentation, you'll understand:

Tokenization: How to convert text into numbers a computer can process
Neural Networks: How computers learn patterns from data
Autograd: How computers calculate gradients automatically
Transformers: The architecture behind GPT, BERT, and ChatGPT
Attention: How models "focus" on relevant parts of text
Training: How models learn from examples
Inference: How models generate new text
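To make the first concept concrete, here is a sketch of character-level tokenization, the kind of scheme a tiny model like this might use. The helper names (stoi, itos, encode, decode) and the sample text are illustrative, not microgpt's actual API.

```python
# A toy character-level tokenizer: every unique character in the
# training text gets its own integer id.

text = "emma olivia ava"

# Build the vocabulary from the unique characters, in sorted order.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # string -> int
itos = {i: ch for ch, i in stoi.items()}       # int -> string

def encode(s):
    """Turn text into a list of token ids."""
    return [stoi[ch] for ch in s]

def decode(ids):
    """Turn token ids back into text."""
    return "".join(itos[i] for i in ids)

ids = encode("ava")
assert decode(ids) == "ava"  # encoding round-trips losslessly
```

Real GPTs use subword tokenizers with vocabularies of tens of thousands of tokens, but the idea is the same: text in, integers out, and back again.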

How Simple Is It?

Here's the heart of the training loop, simplified to a few lines of pseudocode:

for step in range(num_steps):
    # Get a training example
    tokens = encode(document)

    # Forward pass - make predictions
    loss = forward_pass(tokens)

    # Backward pass - figure out how to improve
    loss.backward()

    # Update weights
    update_weights()

That's it! That's the heart of machine learning.
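The loss.backward() call in the loop above is what autograd provides: each value remembers how it was computed, so gradients can flow backwards through the whole computation. Here is a minimal sketch of that idea, supporting only addition and multiplication; it illustrates the technique, not microgpt's actual implementation.

```python
class Value:
    """A scalar that records how it was computed, so backward()
    can apply the chain rule through the computation graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then run the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# loss = w*x + w, with w = 3 and x = 2, so d(loss)/dw = x + 1 = 3
w, x = Value(3.0), Value(2.0)
loss = w * x + w
loss.backward()
assert w.grad == 3.0
```

Once you have gradients, "update weights" is just nudging each parameter a small step against its gradient.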

How to Run

python microgpt.py

You'll see it download a dataset of names, train for a while, then generate new names.

Customizing the Model

You can change the model size:

# Small model (fast)
python microgpt.py --n_embd 16 --n_layer 1

# Medium model
python microgpt.py --n_embd 32 --n_layer 3 --num_steps 5000

# Large model (slower but smarter)
python microgpt.py --n_embd 64 --n_layer 6 --num_steps 10000
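Flags like these are typically handled with Python's standard argparse module. A sketch of how the options above could be parsed; the default values here are illustrative guesses, not microgpt's actual defaults.

```python
import argparse

parser = argparse.ArgumentParser(description="Train a tiny GPT")
parser.add_argument("--n_embd", type=int, default=16,
                    help="embedding dimension (model width)")
parser.add_argument("--n_layer", type=int, default=1,
                    help="number of transformer layers (model depth)")
parser.add_argument("--num_steps", type=int, default=1000,
                    help="number of training steps")

# Simulate: python microgpt.py --n_embd 32 --n_layer 3
args = parser.parse_args(["--n_embd", "32", "--n_layer", "3"])
assert args.n_embd == 32 and args.num_steps == 1000  # unset flags keep defaults
```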

Documentation Roadmap

Start with this page, then follow the remaining pages in order.
