microgpt
Andrej Karpathy

a complete GPT in ~250 lines of Python

Zero dependencies. No PyTorch. No NumPy. Just Python's standard library. Learn how language models work from the ground up.

terminal
$ python microgpt.py
Downloading dataset... Done
Training... 10/10000
Loss: 4.23
"The " -> " dog"
Start Learning
Estimated time: 2 hours
250 Lines
0 Dependencies
1 File
MIT License

Minimal

One file. Zero dependencies. Pure Python standard library.

Educational

Every line is readable. No magic. No abstraction hiding the math.

Executable

Run it yourself. Watch it train. See the loss go down.

Foundational

Understand transformers from the ground up. Build your intuition.

How it works

From text to tokens to predictions

01 Tokenize
02 Embed
03 Transform
04 Predict
05 Train
06 Generate
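The first of the six steps above, tokenization, can be sketched in plain standard-library Python. The function names here are illustrative, not microgpt's actual API; microgpt may tokenize differently:

```python
def build_vocab(text):
    # 01 Tokenize: give each unique character an integer id
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    # text -> list of token ids, one per character
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    # list of token ids -> text
    return "".join(itos[i] for i in ids)

stoi, itos = build_vocab("the dog")
ids = encode("the dog", stoi)  # one integer id per character
```

A character-level vocabulary like this keeps the whole pipeline dependency-free: encoding and decoding are exact inverses, so `decode(encode(text))` round-trips the input.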
Comparison

Tiny compared to the giants

Library      Lines of Code   Dependencies   Files
microgpt     250 (this)      0              1
micrograd    ~100            0              1
PyTorch      ~1M             Many           Many
TensorFlow   ~500K           Many           Many

"If you really want to understand how GPT works, you should read the code. And if you want to understand the code, you should start here."

Andrej Karpathy, Computer Scientist
The Code

Readable. Editable. Yours.

Every function fits on your screen. No jumping between files. No hidden abstractions. Just clean, understandable Python.

View the source
microgpt.py
import random

class Module:
  def parameters(self):
    return []

class Linear(Module):
  def __init__(self, nin, nout):
    # pure-Python weight matrix and bias: no PyTorch, no NumPy
    self.W = [[random.gauss(0, 1) for _ in range(nout)] for _ in range(nin)]
    self.b = [0.0] * nout

  def __call__(self, x):
    # x @ W + b, written out with plain loops
    return [sum(x[i] * self.W[i][j] for i in range(len(x))) + self.b[j]
            for j in range(len(self.b))]

class Transformer(Module):
  def __init__(self):
    self.attention = SelfAttention()
    self.mlp = MLP()

  def __call__(self, x):
    x = self.attention(x)
    return self.mlp(x)
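The Transformer above delegates to a SelfAttention layer that is not shown in this excerpt. As a rough, dependency-free sketch of what a single-head causal self-attention layer can look like (this is an illustration, not microgpt's actual implementation):

```python
import math
import random

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class SelfAttention:
    def __init__(self, dim):
        rand_mat = lambda: [[random.gauss(0, 0.02) for _ in range(dim)]
                            for _ in range(dim)]
        self.Wq, self.Wk, self.Wv = rand_mat(), rand_mat(), rand_mat()
        self.dim = dim

    def __call__(self, xs):
        # xs: list of token vectors; each position attends only to
        # positions <= t (the causal mask), so the model cannot peek ahead
        qs = [matvec(self.Wq, x) for x in xs]
        ks = [matvec(self.Wk, x) for x in xs]
        vs = [matvec(self.Wv, x) for x in xs]
        out = []
        for t, q in enumerate(qs):
            # scaled dot-product scores against all earlier positions
            scores = [sum(qi * ki for qi, ki in zip(q, ks[s])) / math.sqrt(self.dim)
                      for s in range(t + 1)]
            weights = softmax(scores)
            # weighted sum of value vectors
            ctx = [sum(weights[s] * vs[s][j] for s in range(t + 1))
                   for j in range(self.dim)]
            out.append(ctx)
        return out
```

Note that the first position attends only to itself, so its output is exactly its own value vector; later positions mix in context from everything before them.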

Ready to dive in?

Start with the basics and work your way through the entire pipeline.

Read the docs

Want the code?

Grab the source and run it yourself. It's just one Python file.

View on GitHub