Here’s an early signal of what I said.
For almost a year now, we at @soketlabs have been curating a frontier-scale pretraining data corpus and searching for the architecture that best fits the diversity of languages while being compute-optimal for both training and inference.
Sharing one of the many successes we have encountered. Our current version of the arch (yeah, it's not a clone of DeepSeek or any other known arch) is at least 30% more compute-efficient than DeepSeek’s sparse-MoE. These are just initial results and we hope to find a lot more. It shows that we have a lot more to learn about these architectures.
Also excited about the pretraining data we have curated, but more on that later.
Research efforts take time, but they also yield exponential outcomes, and that's what most people in India should be building towards.