We’re excited to announce our partnership with Thomson Reuters, a collaboration focused on unlocking the full potential of proprietary data to build the next generation of domain-specific AI.
We applied DatologyAI’s data curation pipeline to legal domain adaptation during mid-training, and the results were clear:
- 5% improvement on legal benchmarks and 2.5% on general-purpose evaluations after mid-training
- >2.5x amplification in post-training gains on Thomson Reuters’ private legal evals
- Achieved with <1% of the original pre-training token budget
These gains demonstrate that better data doesn’t just improve models, but multiplies the effectiveness of everything built on top of them.
As @schwarzjn_, Head of AI Research at Thomson Reuters, put it:
“DatologyAI delivered clear, measurable improvements across both public and our proprietary legal evaluations…demonstrating the strength and generalizability of their approach.”
This partnership shows what’s possible when proprietary data and advanced data curation come together: not just incremental gains, but compounding advantages across the entire model lifecycle.
We’re excited to continue building with Thomson Reuters to push the boundaries of domain AI.
#AI #MachineLearning #LegalTech #DataCuration #Partnerships