I had dinner the other night with a fascinating group of folks building AI-enabled products: PMs from companies like @perplexity_ai and @harvey__ai, plus engineers and founders. One question kept coming up:
Why haven't incumbent software providers been able to leverage their valuable data assets yet?
Both giants like @salesforce and growth-stage companies with 10 years of collected data seem to be struggling to deliver truly differentiated AI experiences. Most of us would assume these companies are sitting on data gold mines that should give them a massive advantage. But is that actually true?
One perspective I found compelling: The historical data these companies have collected simply isn't the right kind to make AI models truly effective. The workflow data, outcomes data, or database info they've aggregated over the years might not be as useful as we'd think.
What's most valuable in building agentic AI isn't just workflow data or outcomes data; it's a granular understanding of how humans actually perform tasks. This includes all the context switching between applications and even the offline activities that traditional software workflows don't capture. The real value is in understanding the detailed ontology of how tasks get done so you can replicate it in an agent. AI-native companies are building these task-level datasets from scratch, and incumbents are working out how to do the same at the same time, which potentially levels the playing field.
The counter perspective: Incumbents aren't failing; they're just slower. Over the next 12-18 months, we'll see them release AI that delivers differentiated insights thanks to this accumulated data. Their data is valuable; they're just taking longer to leverage it effectively.
I don't know which view is right, but both offer interesting frames for thinking about the AI landscape and who might win in the long run.
Curious what you all think—will incumbents' data moats prove valuable, or are AI-native companies starting on more equal footing than we might assume? Which incumbents are starting to leverage their data most effectively?