What happens between pressing Enter and ChatGPT's first token?
Before the model generates anything, the system may already have:
• Loaded conversation history
• Retrieved user memory
• Queried external knowledge
• Selected a model
• Scheduled GPU resources
• Executed tools
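To make that list concrete, here is a minimal, hypothetical sketch of a request-preparation stage that runs before any token is generated. Every name in it (RequestContext, load_history, retrieve_memory, query_knowledge, select_model, schedule_gpu, run_tools, prepare_request) is invented for illustration, and the lookups are stubs. The routing rule, GPU assignment and tool trigger are toy placeholders. In a real agent loop the model itself usually decides when to call a tool. The one design idea it shows is an assumption, not a description of ChatGPT internals: independent lookups (history, memory, retrieval) can run concurrently so their latencies overlap, and prefill can only start once the whole context is assembled.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical request context assembled before the first token is generated.
@dataclass
class RequestContext:
    user_id: str
    prompt: str
    history: list[str] = field(default_factory=list)
    memories: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)
    tool_results: list[str] = field(default_factory=list)
    model: str = ""
    gpu_slot: int = -1

# Stub lookups (hypothetical); a real system would hit databases and vector stores.
async def load_history(user_id: str) -> list[str]:
    await asyncio.sleep(0.01)  # simulated I/O latency
    return [f"previous turn for {user_id}"]

async def retrieve_memory(user_id: str) -> list[str]:
    await asyncio.sleep(0.01)
    return ["prefers concise answers"]

async def query_knowledge(prompt: str) -> list[str]:
    await asyncio.sleep(0.01)
    return [f"doc relevant to: {prompt}"]

def select_model(prompt: str) -> str:
    # Toy routing rule (assumption): long prompts go to a larger model.
    return "large-model" if len(prompt) > 200 else "small-model"

def schedule_gpu(model: str) -> int:
    # Toy scheduler (assumption): each model has its own GPU pool.
    return 0 if model == "small-model" else 1

def run_tools(prompt: str) -> list[str]:
    # Toy tool trigger (assumption): run a "calculator" only for this exact sum.
    return ["calculator: 4"] if "2+2" in prompt else []

async def prepare_request(user_id: str, prompt: str) -> RequestContext:
    ctx = RequestContext(user_id=user_id, prompt=prompt)
    # Independent lookups run concurrently so their latencies overlap.
    ctx.history, ctx.memories, ctx.documents = await asyncio.gather(
        load_history(user_id), retrieve_memory(user_id), query_knowledge(prompt)
    )
    ctx.model = select_model(prompt)
    ctx.gpu_slot = schedule_gpu(ctx.model)
    ctx.tool_results = run_tools(prompt)
    return ctx  # only now could prefill (and then the first token) begin

if __name__ == "__main__":
    ctx = asyncio.run(prepare_request("u123", "What is 2+2?"))
    print(ctx.model, ctx.gpu_slot, ctx.tool_results)
```

Every step in this sketch sits on the critical path to the first token, which is why production systems care so much about overlapping or caching them.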
The interesting part of ChatGPT isn't just the model—it's the distributed system around it.
I documented the end-to-end system design, covering: prefill vs. decode, KV-cache and PagedAttention, continuous batching, speculative decoding, quantization, three-tier context memory, RoPE for long context, RAG, agent loops, SSE and resumable streaming, model routing, GPU scheduling, RLHF and DPO alignment, three-layer safety, and production tradeoffs.