Miles was at OpenAI, and no longer is. As he notes, internally there are models (incl. multiple "o1" models not open to the public) that crush benchmarks; people working on these models don't see a slowdown. Another year of progress similar to last year is *already baked into the cake*.
Recent DeepSeek and Alibaba reasoning models are important for reasons I’ve discussed previously (search “o1” and my handle), but I’m seeing some folks get confused about what has and hasn’t been achieved yet. Specifically, both compared their models to o1-preview, not o1.