Zero dependencies. No PyTorch. No NumPy. Just Python's standard library. Learn how language models work from the ground up.
$ python microgpt.py
Downloading dataset... Done
Training...
10/10000 Loss: 4.23
"The " -> " dog"
One file. Zero dependencies. Pure Python standard library.
Every line is readable. No magic. No abstraction hiding the math.
Run it yourself. Watch it train. See the loss go down.
Understand transformers from the ground up. Build your intuition.
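"Watch the loss go down" means something concrete: the number printed each training step is the cross-entropy of the model's next-token prediction. A minimal pure-Python sketch of that computation (the probabilities below are made up for illustration; this is not the project's code):

```python
import math

def cross_entropy(probs, target_index):
    # Negative log-probability the model assigned to the true next token;
    # lower is better, and it falls as the model's predictions improve.
    return -math.log(probs[target_index])

probs = [0.1, 0.7, 0.2]         # model's distribution over a 3-token vocabulary
loss = cross_entropy(probs, 1)  # the true next token is index 1
print(round(loss, 4))           # → 0.3567
```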
"If you really want to understand how GPT works, you should read the code. And if you want to understand the code, you should start here."
Every function fits on your screen. No jumping between files. No hidden abstractions. Just clean, understandable Python.
import random

class Linear:
    def __init__(self, nin, nout):
        # Pure standard library: random.gauss replaces torch.randn
        self.W = [[random.gauss(0.0, 1.0) for _ in range(nout)] for _ in range(nin)]
        self.b = [0.0] * nout

    def __call__(self, x):
        # x is a list of nin floats; returns a list of nout floats
        return [sum(xi * wi for xi, wi in zip(x, col)) + b
                for col, b in zip(zip(*self.W), self.b)]
class Module:
    def parameters(self):
        # Subclasses override this to expose their trainable parameters
        return []

class Transformer(Module):
    def __init__(self):
        self.attention = SelfAttention()
        self.mlp = MLP()

    def __call__(self, x):
        # One transformer block: self-attention, then a feed-forward MLP
        x = self.attention(x)
        return self.mlp(x)
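The SelfAttention module the Transformer delegates to isn't shown in this excerpt. As a sketch of the computation it performs, here is single-head scaled dot-product attention in pure Python (the function name and list-of-lists shapes are assumptions, not the project's API):

```python
import math

def attention(Q, K, V):
    """Single-head scaled dot-product attention.
    Q, K, V are lists of d-dimensional vectors (lists of floats)."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        # Softmax turns scores into attention weights that sum to 1
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output vector is a weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
    return out
```

Because the weights sum to 1, every output vector is a convex combination of the value vectors, which is what lets each position mix in information from the rest of the sequence.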