Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence.
Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Tokenization has been the final barrier to truly end-to-end language models.
We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data