Customer data vs process knowledge
Got quite a bit of pushback on my post yesterday from various people telling me that OpenAI and Anthropic are forbidden from training on customer data. Let me clarify what I mean.
I'm not suggesting that the labs are literally using consulting services as some kind of industrial espionage. Rather, I'm suggesting that consulting serves as a means of learning their customers' business models in depth, in a way that ultimately leads to model capability improvements.
Why does FDE (forward-deployed engineering) exist? The answer is that right now, models alone can't do everything. They're intelligent, but they don't have context and domain knowledge.
So it turns out that as intelligence becomes more and more valuable, it also becomes more and more valuable to have humans that you can sell alongside the intelligence whose job is 'figure out all the stuff that the model is currently bad at'. This is FDE.
What that looks like in practice is that you have a human that deploys into an organization and helps them to build agentic workflows using models.
This workflow building process is essentially the process of taking specific domain knowledge and context that lives in people's heads, and making it legible to the AI.
The long-term goal of this workflow building is obviously to have useful business AI that can do the job of humans.
But getting to that stage takes time and is iterative. In practice you never skip straight to full automation: you need to spend time defining benchmarks and performance gates that take a process from being fully human to human-in-the-loop to increasingly autonomous.
For a simplified example, let's say we want to build an agent that can work through our backlog of codebase bugs. We might start by having it pick up issues and submit them for a human to review, and then once it hits certain performance standards, we allow it to run autonomously and merge directly on low-priority issues. This is a kind of domain-specific eval.
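To make the bug-backlog example concrete, here's a minimal sketch of what a performance gate could look like. Everything in it is an assumption for illustration: the `ReviewOutcome` record, the `autonomy_level` function, and the thresholds (50 reviewed fixes, 95% approval) are hypothetical, not anyone's real policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewOutcome:
    """One agent-submitted fix and what the human reviewer decided (hypothetical record)."""
    priority: str   # "low", "medium" or "high"
    approved: bool  # True if the reviewer accepted the fix


def autonomy_level(
    history: List[ReviewOutcome],
    min_reviews: int = 50,            # assumed threshold, tune per team
    min_approval_rate: float = 0.95,  # assumed threshold, tune per team
) -> str:
    """Decide whether the agent may merge low-priority fixes without review."""
    low = [o for o in history if o.priority == "low"]
    # Not enough evidence yet: keep a human in the loop.
    if len(low) < min_reviews:
        return "human_review"
    approval_rate = sum(o.approved for o in low) / len(low)
    if approval_rate >= min_approval_rate:
        return "autonomous_low_priority"
    return "human_review"
```

The point is that the gate is just a measurable rule over review history. Once it's written down like this, it doubles as an eval you can re-run whenever something about the agent changes.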
Now once we have these evals, we can start to vary aspects of our agent and see how it affects performance. If we swap out the model in our agent from Opus to Sonnet, does it still perform at the same level? Can we go further and swap it for DeepSeek at 1/10 of the cost?
Maybe swapping it for DeepSeek actually leads to a 10% performance regression. But since we've spent time mapping the process and distilling what was previously fuzzy domain knowledge into measurable benchmarks, we now have a solid setup in place to build our own RL environments which we can use to improve DeepSeek's performance on our specific evals, making up the performance gap while maintaining low cost.
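As a rough sketch of the model-swap comparison, assuming each model has already been run over the same eval suite and we have a pass/fail result per case: the function names and the 5% tolerance below are hypothetical, and the numbers in the usage note are made up to match the 10% figure above.

```python
from typing import Sequence


def pass_rate(results: Sequence[bool]) -> float:
    """Fraction of eval cases the model passed."""
    if not results:
        raise ValueError("eval suite produced no results")
    return sum(results) / len(results)


def relative_regression(baseline: Sequence[bool], candidate: Sequence[bool]) -> float:
    """Relative drop in pass rate when moving from the baseline to the candidate model."""
    base = pass_rate(baseline)
    if base == 0:
        raise ValueError("baseline pass rate is zero, regression is undefined")
    return (base - pass_rate(candidate)) / base


def swap_decision(
    baseline: Sequence[bool],
    candidate: Sequence[bool],
    max_regression: float = 0.05,  # assumed tolerance, not a recommendation
) -> str:
    """Swap if the cheaper model is close enough, otherwise flag the gap to close."""
    gap = relative_regression(baseline, candidate)
    return "swap" if gap <= max_regression else "post_train_or_keep"
```

For instance, if the frontier model passes 90 of 100 cases and the cheaper model passes 81, `relative_regression` comes out at roughly 0.10 and `swap_decision` returns `"post_train_or_keep"`. That gap, measured on your own evals, is the thing an RL environment would then be built to close.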
Now flip the perspective. The example above is what an enterprise can do once it owns its evals. But the FDEs building those evals work for the lab. And they're doing this across dozens of customers in the same sector at once - five healthcare firms, ten banks, twenty SaaS companies. The lab isn't touching any individual customer's data, but it is building up the meta-skill of how to make domain knowledge in that sector legible, plus a portfolio of sector-shaped evals that generalise beyond any single engagement. No individual customer can run that portfolio strategy, but the lab can.
In this scenario, the customer is subsidising the building of model capability that will later be sold to their competitors - despite no specific data extraction occurring. This is not appealing. But it's an intrinsic downside of the FDE model, and there is no way to legislate it out because it's tied up in the value that you get out of it. The long-term solution is to own your own learning loops rather than outsourcing them.
My interpretation of this:
Right now, Anthropic and OpenAI are making a killing by selling enterprise FDE services to F500s, building workflows for them on top of proprietary models, then using the traces and context from this to build RL envs to improve the models.
This is a crazy amount of leverage - instead of buying this data they're getting paid gigantic consulting fees to extract it.
This also goes way beyond typical consulting in scope - organizations are effectively outsourcing key learning curves and domain knowledge to the AI labs.
Despite that, it's so far been worth it for them because the value of skilled FDE is so high and the ROI so fast, and orgs are willing to pay a premium for competent AI implementation.
But in the long run, one of two things happens: either orgs are gonna get hooked on this and end up paying for the model training that replaces their business, or they find a way to build and own their own model ecosystem.
What that looks like is developing some combination of AI models, evals, RL envs, and workflows. Initially probably the model will still be an off-the-shelf frontier model from a top lab.
But as firms build out more sophisticated eval / RL env (increasingly the same thing) infra, it starts to become viable to post-train a custom model on top of an OSS base. Cursor have done this successfully with their Composer model RL'd on top of Kimi.
Sidenote, this is the same conversation that a lot of national governments in Europe have been having over the past week. When we look at what the rhetoric about 'sovereign AI' in the UK actually boils down to, it's doing custom post-training on top of an OSS model, and then running it on local GPUs.
Ultimately, the current feeding frenzy for AI services in all of its guises - FDE, AI consulting, etc - should raise questions about long-term sustainability. If consulting services are truly a value add and competitive advantage, then in the long term you want to in-house.