Yes, you can technically run an LLM on a 1998 iMac G3 with 32 MB of RAM.
Prompt: "The green goblin"
Output: "The green goblin had a big mop. She had a cow in the field too. I"
Hardware:
• Stock iMac G3 Rev B (October 1998). 233 MHz PowerPC 750, 32 MB RAM, Mac OS 8.5. No upgrades.
Model:
• @karpathy's 260K TinyStories (Llama 2 architecture). ~1 MB checkpoint.
Toolchain:
• Cross-compiled from a Mac mini using Retro68 (GCC for classic Mac OS → PEF binaries)
• Endian-swapped the model and tokenizer files from little-endian to big-endian for PowerPC
• Files transferred via FTP to the iMac over Ethernet
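The endian swap can be sketched as below. This is a minimal illustration, not the code used for this port: the names swap32 and swap_words are hypothetical, and it assumes every field in the buffer is a 4-byte int or float. A file that mixes 4-byte fields with raw byte strings, as a tokenizer file would, needs the swap applied field by field so the string bytes are left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value (hypothetical helper). */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverse the bytes of every 4-byte word in buf, in place.
   Assumes the buffer holds only 4-byte fields (int32 / float32). */
static void swap_words(unsigned char *buf, size_t nbytes) {
    assert(nbytes % 4 == 0);
    for (size_t i = 0; i < nbytes; i += 4) {
        unsigned char b0 = buf[i], b1 = buf[i + 1];
        buf[i]     = buf[i + 3];
        buf[i + 1] = buf[i + 2];
        buf[i + 2] = b1;
        buf[i + 3] = b0;
    }
}
```

Doing the swap once on the build machine, rather than at load time on the iMac, keeps the PowerPC side to a plain read into memory.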
Challenges:
• Mac OS 8.5 gives apps a tiny memory partition by default. Had to use MaxApplZone() and NewPtr() from the Mac Memory Manager to get enough heap
• RetroConsole crashes on this hardware, so all output writes to a text file you open in SimpleText
• The original llama2.c weight layout assumes n_kv_heads == n_heads. The 260K model uses grouped-query attention (kv_heads=4, heads=8), which shifted every pointer after wk and produced NaN. Fixed by using n_kv_heads * head_size for wk/wv sizing
• Static buffers for the KV cache and run state to avoid malloc failures on 32 MB
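The grouped-query sizing fix can be illustrated with a short sketch. The Config struct and function names here are hypothetical stand-ins, not the actual llama2.c definitions, and dim = 64 in the usage note is an assumed value for illustration. Only heads=8 and kv_heads=4 come from the post.

```c
#include <stddef.h>

/* Hypothetical subset of the model config, for illustration only. */
typedef struct {
    int dim;        /* embedding width */
    int n_layers;
    int n_heads;    /* query heads */
    int n_kv_heads; /* key/value heads; fewer than n_heads under GQA */
} Config;

/* Floats per layer in wq: dim x (n_heads * head_size). */
static size_t q_weight_floats(const Config *c) {
    size_t head_size = (size_t)c->dim / (size_t)c->n_heads;
    return (size_t)c->dim * (size_t)c->n_heads * head_size;
}

/* Floats per layer in wk (and in wv): dim x (n_kv_heads * head_size).
   Sizing these as dim x dim is only right when n_kv_heads == n_heads. */
static size_t kv_weight_floats(const Config *c) {
    size_t head_size = (size_t)c->dim / (size_t)c->n_heads;
    return (size_t)c->dim * (size_t)c->n_kv_heads * head_size;
}
```

With the assumed dim = 64 plus heads=8 and kv_heads=4, wq has 4096 floats per layer but wk and wv have only 2048 each. Advancing the weight pointer by dim * dim past wk overshoots by 2048 floats per layer, so every later tensor is read from the wrong offset.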
It reads a prompt from prompt.txt, tokenizes with BPE, runs inference, and writes the continuation to output.txt.
Fun!