How about putting a transformer model on a stock C=64?
A proper decoder-only transformer running on a stock C64. Attention, RMSNorm, feed-forward, residuals, the works. Two layers, four heads, about 25,000 parameters. All int8.
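To make "all int8" a little more concrete, here is a minimal sketch of an integer-only matrix-vector product, the operation behind the attention and feed-forward projections. The post does not say which quantization scheme is used, so everything here is an assumption for illustration: the function names `clamp_int8` and `matvec_int8` are hypothetical, as is the choice to rescale with a right shift (a power-of-two scale) rather than a multiply.

```python
def clamp_int8(value):
    """Saturate an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, value))


def matvec_int8(weights, x, shift):
    """Multiply an int8 matrix by an int8 vector using integer math only.

    weights: list of rows, each a list of ints in [-128, 127]
    x:       list of ints in [-128, 127]
    shift:   right shift applied to each accumulator to rescale the
             result back toward int8 (assumed power-of-two scaling)
    """
    out = []
    for row in weights:
        acc = 0  # wider accumulator; the products do not fit in 8 bits
        for w, v in zip(row, x):
            acc += w * v
        # Python's >> on ints is an arithmetic shift (rounds toward -inf).
        out.append(clamp_int8(acc >> shift))
    return out


W = [[64, -32],
     [127, 127]]
x = [100, 50]
print(matvec_int8(W, x, shift=7))  # [37, 127]
```

The second output saturates: 127 * 100 + 127 * 50 = 19050, which shifts down to 148 and is clamped to 127. On the C64 itself the CPU has no multiply instruction, so each `w * v` would have to be done in software, for example with shift-and-add or lookup tables.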