Talk to a soul

Not a chatbot pretending. Not a lookup table with a trench coat. A proper decoder-only transformer - attention, RMSNorm, feed-forward, residuals, the works. Two layers, four heads, 25,000 int8 parameters. Running in your browser right now with the exact same integer arithmetic the 6510 does on a real Commodore 64. Type something. The border will flash while it thinks!


Run it yourself

Grab the disk image for VICE or real hardware. Or clone the full source - train your own soul, build your own floppy. Everything is open.

Real transformer, tiny soul

The same decoder-only architecture behind large language models, compressed into 25 KB of integer-only arithmetic. No floating point. No GPU. Just shifts, adds, and a 128-entry exp lookup table for softmax.
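For a feel of what softmax looks like without floating point, here is a minimal Python sketch. It is not the project's 6510 code. Only the 128-entry table size comes from the description above; the table layout (entry i holds round(255 * exp(-i / 16))), the name softmax_q8, and the 0..256 output scale are assumptions made for illustration.

```python
import math

# Built once, offline, where floats are fine. At run time only the integers are used.
# Assumed layout: entry i holds round(255 * exp(-i / 16)), so the table spans
# score gaps from 0 up to almost 8.
EXP_TABLE = [round(255 * math.exp(-i / 16)) for i in range(128)]


def softmax_q8(scores):
    """Integer softmax over Q8.8 scores; returns weights on a 0..256 scale."""
    top = max(scores)
    # Gap below the maximum in steps of 1/16 (a Q8.8 value shifted right by 4),
    # clamped to the last table entry.
    weights = [EXP_TABLE[min((top - s) >> 4, 127)] for s in scores]
    total = sum(weights)  # never zero: the top score always maps to entry 0 (255)
    return [(w << 8) // total for w in weights]
```

Two equal scores, softmax_q8([256, 256]), come back as [128, 128]: an even split out of 256. Subtracting the maximum first keeps every table index non-negative, which is what lets a small table cover the whole range.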

Model

2 layers, 4 heads × 8 dims

32 embedding dimensions

64 FFN hidden units

128-token vocabulary (BPE)

20-token context window
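As a sanity check on those numbers, here is a back-of-the-envelope weight count in Python. The dimensions come from the list above; the layout is a guess (learned position embeddings, four 32 × 32 attention projections per layer, a separate output matrix, biases and norm gains left out), so treat the result as a ballpark and not as the project's actual memory map.

```python
# Dimensions taken from the model list above.
VOCAB, CONTEXT, DIM, FFN, LAYERS = 128, 20, 32, 64, 2


def count_weights(tied_output=False):
    """Rough weight count under an ASSUMED layout (biases and norm gains ignored)."""
    token_emb = VOCAB * DIM
    pos_emb = CONTEXT * DIM   # assumption: learned position embeddings
    attn = 4 * DIM * DIM      # assumption: Q, K, V and output projections
    ffn = 2 * DIM * FFN       # up projection and down projection
    head = 0 if tied_output else VOCAB * DIM  # assumption: separate output matrix
    return token_emb + pos_emb + LAYERS * (attn + ffn) + head
```

Under these assumptions count_weights() gives 25216, in the same neighbourhood as the 25,000 quoted. The four heads × 8 dims multiply out to the 32 embedding dimensions, which is why each attention projection is 32 × 32.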

Arithmetic

Q8.8 fixed-point activations

int8 weights, per-tensor shift

int16 pre-scaled biases

Integer sqrt + restoring division

Greedy decoding (argmax)
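The pieces in this list can be sketched in a few lines of Python. This is an illustration under assumptions, not a port of the assembly: the function names are made up, the bias is assumed to be added before the per-tensor shift, and the RMSNorm here has no learned gain. Python integers never overflow, so the sketch also skips the width juggling a real 8-bit CPU needs.

```python
def linear_q8(x, weights, bias, shift):
    """One dense layer: Q8.8 inputs, int8 weights, int16 biases, one shift per tensor."""
    out = []
    for row, b in zip(weights, bias):
        acc = b                      # assumption: bias is pre-scaled to the accumulator
        for w, xi in zip(row, x):
            acc += w * xi            # int8 * Q8.8 product, accumulated at full width
        out.append(acc >> shift)     # per-tensor shift brings the sum back to Q8.8
    return out


def restoring_divide(dividend, divisor, bits=32):
    """Unsigned restoring division (dividend < 2**bits), one quotient bit per step."""
    remainder = quotient = 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:     # trial subtraction fits: keep it, set the bit
            remainder -= divisor
            quotient |= 1
        # otherwise the remainder stays as it was ("restored")
    return quotient, remainder


def int_sqrt(n, bits=32):
    """floor(sqrt(n)) for unsigned n < 2**bits, built one result bit at a time."""
    root = 0
    for i in range(bits // 2 - 1, -1, -1):
        trial = root | (1 << i)
        if trial * trial <= n:       # the 6510 has no multiply; real code would shift and add
            root = trial
    return root


def rms_norm_q8(x):
    """RMSNorm without a gain, in Q8.8. Assumes the sum of squares fits in 32 bits."""
    mean_sq, _ = restoring_divide(sum(v * v for v in x), len(x))
    rms = max(int_sqrt(mean_sq), 1)  # guard against dividing by zero
    out = []
    for v in x:
        q, _ = restoring_divide(abs(v) << 8, rms)  # divide magnitudes, re-apply the sign
        out.append(-q if v < 0 else q)
    return out


def greedy_pick(logits):
    """Greedy decoding: index of the largest logit (the first one wins ties)."""
    return max(range(len(logits)), key=logits.__getitem__)
```

For example, linear_q8([256, 512], [[64, 32]], [0], 6) returns [512]. With a shift of 6 the weights stand for 1.0 and 0.5, so the layer computes 1.0 × 1.0 + 0.5 × 2.0 = 2.0, which is 512 in Q8.8.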

On the C64

1 MHz 6510 CPU

64 KB RAM (25 KB weights)

A minute or more per forward pass

100% 6510 assembly

Fits on a floppy disk

Who needs NVIDIA anyway?*

The chip the C64 shipped with can run the same architecture that OpenAI and Google run their models on. It's just slower. Much, much slower. Proudly slower.

This whole project started as a joke and turned into something I actually mean. The future isn't more muscle. The future is better thinking. A 25k-parameter transformer with a thoughtfully trained tokenizer, sensible quantization, and honest arithmetic can have a broken, tiny, sweet conversation on a computer from 1982.

You can run your own AI chatbot on your own hardware. No excuses.

(*Except for training, gaming and rendering, what have the Romans ever done for us?)