This is the kind of recursive self-improvement that sci-fi has talked about and that OpenAI and Anthropic brag about a lot. A SOTA model like Qwen3.6 27B seems to pull it off.
I had Qwen3.6:27b optimize itself in a recursive loop on my home server.
Over 26 hours, decode speed went from 2.3 tok/s to 84.3 tok/s, roughly a 37x speedup.
It started by inspecting the home server: it found no NVIDIA GPU, detected 24 CPU threads, 93 GiB of RAM, and a 9060 XT (16 GB), then installed Hugging Face tooling remotely and started pulling GGUF quantizations.
It benchmarked remote llama.cpp / llama-server runs across quantizations and flags:
Found existing Qwen3.5-9B-Q8_0.gguf
Downloaded / tested Qwen3.6-27B GGUF variants
Compared Q6_K, Q5_K_M, Q4_K_M
Ran server benchmarks over SSH against localhost:8080
Tested thread count, context, batch size, n_ubatch, --no-mmap, and memory-related flags
Researched further speed paths: lower quantization, NUMA, huge pages, native CPU builds, cache/KV, TurboQuant, DFlash, speculative decoding, and automated tuning
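For anyone who wants to try the benchmarking part by hand, here is a minimal sketch of a flag sweep, not the harness from this run. It assumes llama-server is reachable on localhost:8080 and that its `/completion` response carries a `timings` object with `predicted_n` and `predicted_ms` (true for the llama.cpp builds I know of, but check yours). The `run` callback, the grid keys, and the prompt are hypothetical: restarting the server with each config is left to you.

```python
import itertools
import json
import urllib.request


def decode_tok_per_s(timings: dict) -> float:
    """Decode speed in tokens/second from a llama-server style timings dict."""
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)


def bench_once(base_url: str = "http://localhost:8080", n_predict: int = 128) -> float:
    """Send one completion request and return the measured decode tok/s.

    Assumption: the server exposes POST /completion and returns `timings`.
    """
    payload = json.dumps(
        {"prompt": "Explain NUMA in one paragraph.", "n_predict": n_predict}
    ).encode()
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.load(resp)
    return decode_tok_per_s(body["timings"])


def sweep(grid: dict, run) -> tuple:
    """Try every combination in `grid` and return (best_config, best_tok_s).

    `run(config)` is a hypothetical callback: it should (re)start the server
    with that config and return a tok/s number, e.g. by calling bench_once().
    """
    keys = list(grid)
    best_config, best_speed = None, float("-inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        speed = run(config)
        if speed > best_speed:
            best_config, best_speed = config, speed
    return best_config, best_speed
```

A full grid grows fast, so in practice you would keep each axis short (say a few thread counts and a few ubatch sizes) and rerun the best candidates to average out noise.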
1,524 tool calls
367 artifacts
345 memory additions
804 browser-control calls
All of this from a model that can run on your computer. You don't need a better model. You need a better harness.