Couldn't agree more: we still don't have good solutions for scalable abstraction/concept learning from "raw" data (be it language, vision, or other sensory data). Solving this would likely unlock important new capabilities.
For anyone interested in future LLM development:
One of the bigger unsolved deep learning problems: learning hierarchical structure.
Example: we still use tokenizers to train SOTA LLMs. We should be able to feed in raw bits/chars/bytes and still get SOTA results.
Related: larger context windows.
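
To make the tokenizer point concrete, here is a minimal sketch of what "feeding in bytes" looks like at the input level. It assumes UTF-8 as the byte encoding and uses a plain whitespace split as a crude stand-in for a subword tokenizer (real tokenizers such as BPE behave differently; the split is only there for a rough length comparison). The variable names are illustrative, not from any particular library.

```python
# Example sentence (hypothetical, for illustration only).
text = "Tokenizers split text into subword units."

# Byte-level view: any string maps to integers in [0, 255],
# so the "vocabulary" is fixed at 256 symbols and nothing is learned.
byte_ids = list(text.encode("utf-8"))
byte_vocab_size = 256

# Crude stand-in for a tokenizer (assumption: whitespace split, NOT real BPE).
word_pieces = text.split()

# Byte sequences are much longer than tokenized ones for the same text.
length_ratio = len(byte_ids) / len(word_pieces)

# The byte representation round-trips losslessly back to the original text.
decoded = bytes(byte_ids).decode("utf-8")

print(f"bytes: {len(byte_ids)} symbols, vocab {byte_vocab_size}")
print(f"whitespace pieces: {len(word_pieces)} symbols")
print(f"length ratio: {length_ratio:.1f}x")
```

One plausible reading of the "related" note: the same text takes several times more positions as bytes than as tokens, so a model working on raw bytes needs either a longer context window or some learned hierarchy that compresses bytes into higher-level units.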