If you train on (unsorted list, bubble sort procedure, sorted list) traces, you will never test time compute (TTC) your way to mergesort.
So frontier lab ppl say "well, we don't just train on 1 algo, we train on many classes of sort algos, so it should be able to explore the function space of sort".
You are still limited then.
Let's say, for example, we don't know about non-comparative sorts (e.g. radix sort), but we train on all the comparative sort algos. Same issue: it won't sample non-comparative sort algos! Why? It doesn't think orthogonally. But ppl do!
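To make the comparative vs. non-comparative distinction concrete, here is a minimal sketch (my own illustration, function names are made up). Mergesort makes every ordering decision through an element-to-element comparison, while LSD radix sort orders non-negative ints purely by bucketing on digits. It is a structurally different idea, not a variation on "compare and swap".

```python
def merge_sort(xs, counter):
    """Comparison sort: every ordering decision goes through `a <= b`."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter["compares"] += 1  # count element-to-element comparisons
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def radix_sort(xs, base=10):
    """LSD radix sort for non-negative ints (assumption: no negatives).

    The ordering comes from stable bucketing by digit, never from asking
    whether one element is less than another.
    """
    xs = list(xs)
    if not xs:
        return xs
    largest = max(xs)  # used only to know how many digit passes are needed
    place = 1
    while largest // place > 0:
        buckets = [[] for _ in range(base)]
        for x in xs:
            buckets[(x // place) % base].append(x)
        xs = [x for bucket in buckets for x in bucket]  # stable concatenation
        place *= base
    return xs


data = [170, 45, 75, 90, 802, 24, 2, 66]
counter = {"compares": 0}
print(merge_sort(data, counter), counter)
print(radix_sort(data))
```

Both return the same sorted list, but you can't get from the first function to the second by recombining pieces of other comparison sorts.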
OAI STILL thinks this is the path to AGI?!
It can't be.
The modern LLM stack is essentially imitation learning plus a small amount of search via TTC, leveraging the generator-verifier gap to self-distill back into the weights.
This will always confine your search to the training manifold of function space.
So novel programs that are much better, but far outside the human manifold, are almost impossible to find via TTC.
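A toy sketch of the loop described above (sample, verify, distill), not how any lab actually implements it. The "policy" is just a distribution over named programs, the scores are made-up numbers standing in for a verifier, and the update rule is a hypothetical stand-in for fine-tuning. The point it illustrates: the loop can shift mass toward the best thing already in the support, but anything with zero initial mass never gets sampled, so it never gets reinforced.

```python
import random

# Hypothetical verifier scores; "radix" is the best program but is
# deliberately missing from the initial policy's support.
SCORES = {"bubble": 1.0, "insertion": 2.0, "merge": 3.0, "radix": 5.0}
INITIAL_POLICY = {"bubble": 0.6, "insertion": 0.3, "merge": 0.1}


def sample(policy, rng):
    """Draw one candidate program name from the policy's distribution."""
    names = list(policy)
    weights = [policy[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def self_distill(policy, scores, rounds=20, n_samples=16, lr=0.1, seed=0):
    """Best-of-N sampling (TTC) + verifier + reweighting (distillation)."""
    rng = random.Random(seed)
    policy = dict(policy)
    for _ in range(rounds):
        candidates = [sample(policy, rng) for _ in range(n_samples)]
        best = max(candidates, key=lambda name: scores[name])  # verifier step
        policy[best] += lr  # distill: upweight the verified winner
        total = sum(policy.values())
        policy = {name: mass / total for name, mass in policy.items()}
    return policy


final_policy = self_distill(INITIAL_POLICY, SCORES)
print(final_policy)
print("radix" in final_policy)
```

More rounds or more samples just concentrate mass faster on "merge". No amount of TTC here puts "radix" on the table.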
We need to teach the model a more general search procedure that explores the full hypothesis space without such a heavy bias toward human thinking (e.g. AlphaZero). People have given up on this bc in large action spaces, DQN/MCTS-style search collapses. The idea shouldn't be thrown out just because the current implementation of it doesn't scale. But that's what it seems everyone has done.
If we want true AGI, we need models that can think from first principles, branching/exploring in a clever way to go the rest of the distance. Essentially mimicking the scientific method.
Asking the RIGHT question / conducting a CLEVER experiment to reduce the hypothesis space.
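Here is one way "asking the right question" could be sketched, assuming a finite hypothesis set, a uniform prior, and noiseless experiments (all simplifications of mine). Each experiment is a function from hypothesis to outcome, and we greedily run the one that leaves the fewest hypotheses alive in expectation.

```python
from collections import Counter


def expected_remaining(hypotheses, experiment):
    """Expected number of hypotheses left after observing the outcome,
    assuming each remaining hypothesis is equally likely."""
    counts = Counter(experiment(h) for h in hypotheses)
    n = len(hypotheses)
    return sum(c * c for c in counts.values()) / n


def identify(hypotheses, experiments, truth):
    """Greedy loop: run the most informative experiment, prune, repeat."""
    alive = list(hypotheses)
    steps = 0
    while len(alive) > 1:
        best = min(experiments, key=lambda e: expected_remaining(alive, e))
        outcome = best(truth)  # "run the experiment" against reality
        pruned = [h for h in alive if best(h) == outcome]
        if len(pruned) == len(alive):
            break  # no available experiment separates what is left
        alive = pruned
        steps += 1
    return alive, steps


# Toy setup: the hidden truth is an int in 0..15, and each experiment
# asks "is it >= q?" for some threshold q.
hypotheses = list(range(16))
experiments = [lambda h, q=q: h >= q for q in range(1, 16)]
alive, steps = identify(hypotheses, experiments, truth=11)
print(alive, steps)
```

In this toy case the greedy rule rediscovers bisection: each experiment halves the hypothesis space, instead of checking candidates one at a time.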
Why do frontier labs not get this yet? Or is this a psyop on us all?