RobotOps: Why running physical AI is nothing like running software
We've spent a decade perfecting how to run AI at scale.
Model registries. Training pipelines. Dashboards. Drift detection. The whole assembly line.
It works beautifully... for AI that lives safely behind an API in the cloud.
But physical AI? Robots? Autonomous machines moving through the real world, all day, every day?
That's a completely different animal.
And it demands an entirely new discipline: RobotOps.
---
First, a quick primer.
The existing playbook is called MLOps: how dev teams reliably train, deploy, and operate machine learning models at scale.
MLOps was built for the internet. It assumes AI lives in a clean, predictable digital bubble. Nothing explodes. Data changes slowly. When something breaks, it's a math problem: you check a dashboard, retrain the model, push an update.
The artifacts are tidy: models, datasets, training code. The feedback loop is tidy: click-through rates, accuracy scores, loss curves. Human teams decide what to fix and when.
Clean. Controllable. Comfortable.
RobotOps blows all of that up.
---
The difference starts with what you're actually operating.
In MLOps, you're managing models.
In RobotOps, you're managing behavior, and behavior lives in the physical world, which does not care about your clean abstractions.
The artifacts multiply: perception models, control models, sensor calibrations, 3D maps, world representations, simulation environments, and enormous streams of multimodal sensor data captured during real-world operation.
Code and models still matter. But they're no longer the center of gravity.
---
The feedback loop is a different beast entirely.
In MLOps, you close the loop through digital signals. The model predicts. The user clicks (or doesn't). You log it, analyze it, retrain on a human-defined schedule.
In RobotOps, the loop runs through the physical world itself.
A deployed model produces behavior. That behavior meets an unpredictable environment. Sensors capture the consequences. Those logs must be ingested, indexed, graded, and transformed into new training data and new simulation scenarios.
This loop is continuous. Not episodic.
Training, validation, and operations collapse into one always-on learning system. Or at least, that's the goal.
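To make the loop concrete, here is a minimal Python sketch of the "grade, then route" step. Everything in it is an illustrative assumption: the `Episode` schema, the 0.5 m safety margin, and the rule that failures and near misses become simulation scenarios. A real pipeline would grade on far richer sensor data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One logged robot run (hypothetical schema, for illustration only)."""
    episode_id: str
    min_obstacle_distance_m: float  # closest approach to any obstacle
    task_succeeded: bool


def grade(episode: Episode, safety_margin_m: float = 0.5) -> str:
    """Assign a coarse grade. The margin is an assumed threshold."""
    if not episode.task_succeeded:
        return "failure"
    if episode.min_obstacle_distance_m < safety_margin_m:
        return "near_miss"
    return "nominal"


def route(episodes: list[Episode]) -> dict[str, list[str]]:
    """Send each graded episode to the part of the loop that can use it."""
    routed: dict[str, list[str]] = {"training_data": [], "sim_scenarios": []}
    for ep in episodes:
        if grade(ep) == "nominal":
            routed["training_data"].append(ep.episode_id)
        else:
            # Failures and near misses become scenarios to reproduce in simulation.
            routed["sim_scenarios"].append(ep.episode_id)
    return routed


logs = [
    Episode("ep-001", 1.8, True),   # clean run
    Episode("ep-002", 0.2, True),   # succeeded, but too close to an obstacle
    Episode("ep-003", 0.9, False),  # task failed
]
print(route(logs))
```

The point of the sketch is the shape, not the rules: every logged run gets graded, and the grade decides whether it feeds training or seeds a new simulation scenario.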
---
And failure? Failure carries a completely different weight.
In MLOps, failure is annoying. A user sees the wrong ad. An irrelevant search result surfaces. Embarrassing, maybe costly, but recoverable.
In RobotOps, failure is physical.
Damaged hardware. Safety incidents. Regulatory nightmares. Or worse: a bad model silently poisoning future training data, compounding errors over time before anyone notices.
A bad model doesn't just output a wrong number. It creates a dangerous event.
This is why provenance (knowing where your data came from) isn't just a best practice in RobotOps. It's a survival mechanism. You need to know *exactly* which model, environment, and scenario caused a robot to twitch. Not for debugging convenience. For safety.
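One way to picture that kind of provenance is a small, immutable record attached to every logged event. The sketch below is an assumption-heavy illustration, not a standard: the `ProvenanceRecord` fields, the example IDs, and the choice of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Ties one logged event to what produced it (hypothetical fields)."""
    model_version: str
    environment_id: str
    scenario_id: str
    sensor_log_sha256: str  # content hash of the raw sensor log

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so equal records always hash the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def sha256_of(data: bytes) -> str:
    """Content hash used to detect any later change to the raw log."""
    return hashlib.sha256(data).hexdigest()


raw_log = b"t=0.00 joint_3 torque spike"  # stand-in for a real sensor log
record = ProvenanceRecord(
    model_version="policy-v42",          # assumed naming scheme
    environment_id="warehouse-a",        # assumed naming scheme
    scenario_id="pallet-dropoff-017",    # assumed naming scheme
    sensor_log_sha256=sha256_of(raw_log),
)
print(record.fingerprint())
```

Because the record is frozen and hashed, swapping the model, the environment, the scenario, or a single byte of the log produces a different fingerprint, which is what lets you trace a twitch back to its cause.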
---
Now for the gap that turns into a canyon: simulation.
In MLOps, you look backward. Validate models on historical data. Shadow-deploy alongside existing systems. The model sits behind an API, observing the world without touching it.
In RobotOps, you have to look forward.
You can't just ask how a model performs on past data. You have to ask: how does it *behave* when the world pushes back?
To answer that, you need simulation. You need to run candidate models through thousands of scenarios (rare edge cases, sudden obstacles, lighting changes) before that code ever touches a physical machine.
Here's the hard truth: for most teams, this is a pipe dream.
Ignore the shiny visuals in keynote demos. Building a photorealistic, physics-accurate virtual world doesn't just require engineering talent. It effectively requires building an in-house AAA game studio.
Today, most teams use simulation sparingly: a bit of synthetic data here, some basic component testing there. It lives on the periphery.
But this will invert. Simulation is moving from a supporting role to the center of the development loop. The primary environment for validation, regression, and learning.
First, though, some fundamental problems need solving: fragility in simulation pipelines, massive 3D asset dependencies, and the inability to reliably replay thousands of runs in exactly the same way.
Not trivial. But solvable.
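The replay problem is the easiest of these to illustrate. A common technique, shown here as a toy sketch and not as any particular simulator's API, is to give every scenario its own seed and its own random number generator, so a run never depends on global state. The `Scenario` fields, the 1-D "gap to obstacle" model, and the `caution` parameter are all made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A named, seeded test case (hypothetical schema)."""
    name: str
    seed: int
    steps: int = 50


def run_scenario(scenario: Scenario, caution: float) -> bool:
    """Toy 1-D rollout. Returns True if the robot never hits the obstacle."""
    rng = random.Random(scenario.seed)  # per-scenario RNG, no global state
    gap_m = 5.0  # starting distance to the obstacle, in meters
    for _ in range(scenario.steps):
        obstacle_push = rng.uniform(0.0, 0.3)  # obstacle closes in by a random amount
        gap_m += caution - obstacle_push       # the policy backs off by `caution`
        if gap_m <= 0.0:
            return False
    return True


def run_campaign(scenarios: list[Scenario], caution: float) -> dict[str, bool]:
    """Run every scenario against one candidate policy setting."""
    return {s.name: run_scenario(s, caution) for s in scenarios}


campaign = [Scenario(f"sudden-obstacle-{i}", seed=i) for i in range(1000)]
first = run_campaign(campaign, caution=0.1)
second = run_campaign(campaign, caution=0.1)

# Same seeds, same policy setting: the replay matches run for run.
assert first == second
print(f"{sum(first.values())}/{len(first)} scenarios passed")
```

Real simulators are much harder to pin down, since physics engines, GPU kernels, and asset versions all leak nondeterminism. The principle carries over, though: if a run can't be reproduced from its recorded inputs, it can't serve as a regression test.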
---
Finally: automation.
MLOps automation is largely pipeline-driven and rule-based. Humans decide what data to collect, when to retrain, which models to promote. Automation speeds up execution. Intent stays human-defined.
In RobotOps, the complexity quickly exceeds what humans can manage manually. Deciding which data is missing, which edge cases matter, which scenarios to simulate next, and which models should evolve becomes a constant cognitive bottleneck.
This is where AI-native automation becomes not just helpful, but necessary.
The early signals are already here: vision-language models auto-labeling sensor data, world models grading synthetic scenario quality, agents proposing new simulation campaigns based on observed failures.
Over time, these agents will operate entire segments of the learning loop on their own.
That's the real inflection point: when RobotOps systems start improving themselves faster than humans could ever direct.
---
So where does this leave us?
Existing MLOps tools still matter. Model registries, training pipelines, orchestration frameworks: they're not going away.
But they operate at too low a level for physical AI.
RobotOps demands higher-order abstractions:
Scenarios, not datasets. Behaviors, not predictions. Simulation campaigns, not experiments. Data grading, not drift detection. Learning loops, not deployment cycles.
In this sense, RobotOps isn't just the next evolution of MLOps.
It's the operational layer for embodied intelligence: systems that learn through action, adapt through experience, and operate under physical constraints.
The holy grail? A fully automated, physical AI data flywheel.
We're not there yet. But the discipline is being built, right now, in real time.
And the teams that figure it out first?
They won't just be building better robots.
They'll be building the infrastructure that the entire physical AI era runs on.
---
Read more about this concept, and others like it, at Dream Machines