Stephen James

Stephen James

185 Photos and videos

Tweets

Pinned Tweet

Stephen James

@stepjamUK

24 Sep 2025

𝗔𝗳𝘁𝗲𝗿 𝟭𝟬 𝘆𝗲𝗮𝗿𝘀 𝗶𝗻 𝗿𝗼𝗯𝗼𝘁 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴, from my PhD at Imperial to Berkeley to building the Dyson Robot Learning Lab, one frustration kept hitting me: 𝗪𝗵𝘆 𝗱𝗼 𝗜 𝗵𝗮𝘃𝗲 𝘁𝗼 𝗿𝗲𝗯𝘂𝗶𝗹𝗱 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗶𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗼𝘃𝗲𝗿 𝗮𝗻𝗱 𝗼𝘃𝗲𝗿 𝗮𝗴𝗮𝗶𝗻? 𝗧𝗵𝗲 𝗽𝗮𝘁𝘁𝗲𝗿𝗻 𝗜 𝗸𝗲𝗽𝘁 𝘀𝗲𝗲𝗶𝗻𝗴: • New robotics team starts • Spends 6 months building data collection pipeline • Spends another 3 months debugging synchronization issues • Finally starts collecting task-specific data • Realizes their infrastructure choices limit their flexibility • Starts over 𝗧𝗵𝗶𝘀 𝗶𝘀 𝘁𝗵𝗲 𝘄𝗵𝗼𝗹𝗲 𝗽𝗼𝗶𝗻𝘁 𝗼𝗳 𝗿𝗼𝗯𝗼𝘁 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴: Robot learning is fundamentally data-driven. Whether you're picking strawberries or assembling electronics, the core infrastructure needs are identical. That's actually why I was so interested in pursuing data-driven robotics over a decade ago. 𝗬𝗼𝘂 𝗮𝗹𝘄𝗮𝘆𝘀 𝗻𝗲𝗲𝗱: • Multi-sensor data synchronization across different frequencies • Flexible storage that works with future algorithms • Visualization tools to understand your data • The ability to experiment with different temporal resolutions • Robust logging that captures everything you might need later The trend towards AI in robotics is growing, with robots needing to process and analyze large amounts of sensor data to manage variability and unpredictability in real environments. 𝗕𝘂𝘁 𝗲𝘃𝗲𝗿𝘆 𝘁𝗲𝗮𝗺 𝗯𝘂𝗶𝗹𝗱𝘀 𝘁𝗵𝗶𝘀 𝗳𝗿𝗼𝗺 𝘀𝗰𝗿𝗮𝘁𝗰𝗵. Imagine if every web developer had to build their own database, web server, and deployment pipeline before writing their first line of application code. 𝗧𝗵𝗶𝘀 𝗶𝘀 𝘄𝗵𝘆 𝗜 𝗳𝗼𝘂𝗻𝗱𝗲𝗱 𝗡𝗲𝘂𝗿𝗮𝗰𝗼𝗿𝗲. Instead of every robotics team spending months on infrastructure, we provide the common tools that let you go from "I have a robot" to "I'm shipping intelligent robot behaviors" in days, not months. 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗶𝗻𝗻𝗼𝘃𝗮𝘁𝗶𝗼𝗻 𝗶𝗻 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝘄𝗼𝗻'𝘁 𝗰𝗼𝗺𝗲 𝗳𝗿𝗼𝗺 𝗲𝘃𝗲𝗿𝘆𝗼𝗻𝗲 𝗿𝗲𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗽𝗹𝘂𝗺𝗯𝗶𝗻𝗴. 𝗜𝘁'𝗹𝗹 𝗰𝗼𝗺𝗲 𝗳𝗿𝗼𝗺 𝘁𝗲𝗮𝗺𝘀 𝘄𝗵𝗼 𝗰𝗮𝗻 𝗳𝗼𝗰𝘂𝘀 𝗲𝗻𝘁𝗶𝗿𝗲𝗹𝘆 𝗼𝗻 𝘄𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝘁𝗵𝗲𝗶𝗿 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝘂𝗻𝗶𝗾𝘂𝗲. Robot learning shouldn't be bottlenecked by infrastructure. It should be bottlenecked by creativity. What's the longest you've spent building infrastructure before getting to the actual robotics problem you wanted to solve?

0:10

747

56,519

Stephen James

Stephen James

@stepjamUK

Jun 11

Data scarcity. Sim-to-real gaps. Deployment timelines that stretch into months. Different people, different companies, the same bottlenecks every time. It's the exact problem we started @Neuracore_AI to solve. We're working with robot learning teams on exactly these challenges, so if that sounds like yours, get in touch or drop me a message directly.

Neuracore

@Neuracore_AI

Jun 11

We took one question to IEEE International Conference on Robotics and Automation (ICRA) 2026: what's the biggest challenge in industrial robotics right now? Data scarcity. Sim-to-real gaps. Deployment that takes months, not days. Researchers, founders and engineers from KUKA, @Universal_Robot, @FlexivRobotics & @noitomocap all pointing at the same bottlenecks. And all of them are exactly what we're building Neuracore to solve. We're working with robot learning teams on exactly these problems. If that sounds like yours, get in touch. #ICRA2026 #Robotics #RobotLearning #PhysicalAI

1:04

Stephen James

Stephen James

@stepjamUK

Jun 10

The most common thing the team heard at @ieee_ras_icra: "we'd love to take on more learning-based projects, but we can't carry the deployment risk." That risk is exactly what we built @Neuracore_AI to remove. If that sounds familiar, drop me a message.

Neuracore

@Neuracore_AI

Jun 10

One week on from @ieee_ras_icra 2026. It was great to connect with researchers, system integrators and so many people working in the field. The conversations confirmed what we're hearing everywhere: teams want to take on robot learning projects, but the infrastructure to collect data, train and deploy at scale is holding them back. If you're a system integrator or automation team looking to take on projects you couldn't before, and deliver them faster than your competitors, let's talk. #ICRA2026 #Robotics #RobotLearning #SystemIntegrators #wuji @AgilexRobotics

1,532

Neuracore

Stephen James retweeted

Neuracore

@Neuracore_AI

Jun 3

Looking to find us at @ieee_ras_icra? We’re right at the entrance to hall B at booth S006. If you’re a system integrator, or looking to go from demo to deployment faster than ever, come and chat with the team!

0:42

363

Stephen James

Stephen James

@stepjamUK

Jun 2

Great to see the @Neuracore_AI team in action at our first ever @ieee_ras_icra! If you're there this year come by and see the platform in action for yourself!

Neuracore

@Neuracore_AI

Jun 2

Are you at @ieee_ras_icra this week? Booth S006 is where you'll find us. We have a live demo running all week, plus the full team on hand to show you what the Neuracore platform can do for your company.

0:10

577

Neuracore

Stephen James retweeted

Neuracore

@Neuracore_AI

May 25

Neuracore is exhibiting at the @ieee_ras_icra 2026 in Vienna, Austria. Join us to see how robotics teams are eliminating the 80% of engineering time currently spent on data pipelines instead of robot learning. Come discuss the infrastructure bottlenecks killing your transition from lab prototypes to distributed fleets. Meet the team. See live demos of data recording, visualisation, training and deploying model using our infrastructure. Booth S006, June 1-5, 2026 | VIECON in Vienna, Austria #Neuracore #ICRA26

619

Stephen James

Stephen James

@stepjamUK

May 20

Thanks to the STIQ team for hosting last night. It was great to share what we’re building at @Neuracore_AI and discuss some of the harder questions around where robotics is headed. Also great to be sharing the stage with All3, @recycleye , and @dexoryHQ. If you’re exploring VLA, VLM, or robot learning deployments, my DMs are open - always happy to chat. #Robotics #RobotLearning #UKRobotics #STIQROBOTICS

552

Neuracore

Stephen James retweeted

Neuracore

@Neuracore_AI

May 19

That's a wrap on our inaugural sponsored hackathon! Congratulations to the winners of the "Best Use of Neuracore" award at the Oxford Edge and Oxford Artificial Intelligence Society Hackathon this weekend. Well done to Sarthak Das from the robot learning team at Neuracore for his effort on-site supporting teams and presenting the award!

0:06

451

Stephen James

Stephen James

@stepjamUK

May 18

Most simulation benchmarks for VLAs cannot tell you whether their numbers map to reality. REALM can: p < 0.001 correlation with real-world rollouts across 7 manipulation skills and 5 perturbations. The sim-to-real gap has been the central reason I have argued for collecting real data wherever possible. Most simulation benchmarks tell you something, but you cannot tell whether that something maps to reality. REALM, from Martin Sedlacek and the team at CTU Prague and Amsterdam, takes that problem seriously. The team built a simulation environment designed to correlate with real-world performance, and then validated it. Pearson values close to identity on task progression curves. Attention maps from π0 show 0.85 cosine similarity between matched real and simulated frames. They did not skip the validation step. They led with it. That changes what the simulation results actually mean. Across 15 perturbation factors covering visual, semantic, and behavioural variation, π0, π0-FAST, and GR00T N1.5 all show noticeable performance drops under semantic perturbations despite their internet-pretrained VLM backbones. All show sensitivity to camera viewpoint despite training on DROID's unusually diverse viewpoint distribution. The hardest axis of generalisation is across objects and their properties, not across skills. Reliability under perturbation is low across all three models. If the sim correlates with reality at the level REALM demonstrates, these are not simulation artefacts. They are real failure modes that real teams should be planning around. Two things this tells us. Validated simulation has a role in evaluation that it does not yet have in training. The cost of running thousands of perturbed rollouts in the real world is prohibitive. If REALM's correlation holds up across more task families, sim-based evaluation could become a serious tool for surfacing failure modes that ad-hoc real-world testing misses. The failure pattern across all three tested models also points back at the same place it always does. Pretraining buys you semantic grounding and skill primitives. It does not buy you robustness. The next generation of training data needs to focus on demonstrations where the object, scene, and viewpoint move underneath the skill, not on more demonstrations of the same skill on the same object. Paper link in comments.

0:11

6,861

Stephen James

Stephen James

@stepjamUK

May 18

Link: arxiv.org/abs/2512.19562 Project: martin-sedlacek.com/realm/

REALM: A Real-to-Sim Validated Benchmark for Generalization in...

Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond...

arxiv.org

399

Stephen James

Stephen James

@stepjamUK

May 18

There's been some exciting new features added to @Neuracore_AI this month! Head to our YouTube to see more: youtu.be/kjQ8RWJExb4

Neuracore Platform Tour | Collect, Visualise, Train, Deploy

Want to see the full robotics development workflow in one place? Wa...

youtube.com

Neuracore

@Neuracore_AI

May 18

New to Neuracore? Check out our latest platform tour on YouTube and see how teams collect, observe, train, and deploy, all in one workflow. youtu.be/kjQ8RWJExb4

1,285

Stephen James

Stephen James

@stepjamUK

May 15

Excited to have @Neuracore_AI powering in this weekends Hackathon hosted by The Oxford Edge and @OxfordAI!

Neuracore

@Neuracore_AI

May 15

This weekend we're powering the Oxford Hardware / Physical AI Hackathon at @UniofOxford, with free access to the Neuracore platform for every participant. Hosted by The Oxford Edge and @OxfordAI with hardware from @FoundryRobotics, @Quanser and @huggingface LeRobot. Sensor kits from Atech. Coding credits from @AnthropicAI and @Cursor. If you're going, come find us!

1,306

Stephen James

Stephen James

@stepjamUK

May 14

Full fine-tuning is undoing the priors you spent the pretraining budget to build. That's the case PriorVLA is making, and the new paper from the team at CAS, Dexmal and collaborators is one of the cleaner demonstrations I have seen of the problem. Here's what happens. You take a pretrained VLA. You fine-tune on your downstream task. In-distribution evaluation looks fine. Then you test out-of-distribution and the model falls over. The pretraining gave you broad priors across diverse data. Fine-tuning pulled those priors toward the narrow patterns of your training set. The model effectively forgot what it knew. PriorVLA's response is to stop updating the pretrained action expert during fine-tuning. Freeze it, treat it as a read-only prior source, and train a parallel adaptation expert alongside it. Scene priors get pulled from the VLM, motor priors from the frozen expert, both routed into the adaptation expert via learned queries. Only 25% of the parameters a full fine-tune would touch actually get updated. The headline numbers: 11 points over π0.5 on RoboTwin 2.0-Hard, 99.1% average on LIBERO, 81% in-distribution and 57% out-of-distribution across 8 real-world tasks on two embodiments with standard data. The number that actually matters: with 10 demonstrations per task, PriorVLA beats π0.5 by 24 points in-distribution and 22 points out-of-distribution. A 24-point lift from 10 demos is the kind of sample efficiency that maps to how real teams ship robots, where you cannot collect thousands of demonstrations per skill. The broader implication is that we have been treating fine-tuning as if pretraining is just a smarter random initialisation. It isn't. Pretrained VLAs encode structure that downstream training overwrites unless you actively preserve it. Whether the right answer is frozen experts, LoRA-style adapters, or something else, the question of how to adapt without forgetting is now a first-class problem in the VLA stack. Credit: @CAS__Science Paper link in comments.

0:46

5,371

Stephen James

Stephen James

@stepjamUK

May 14

Paper: arxiv.org/abs/2605.10925 Blog: priorvla.github.io/

PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models

Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the...

arxiv.org

420

Stephen James

Stephen James

@stepjamUK

May 14

We kicked off our Robotics in Europe series last week with @IvanTregear and the @KAIKAKU_AI team over on Neuracore. The most underrated takeaway: operators don't need convincing on robotics. They need the layer underneath it. No databases, no analytics, no sensing means no foundation for automation to act on. The arm is the easy part. This is exactly why at @Neuracore_AI we stand with the teams building robotics today, ready for the learning-driven era ahead. Full episode and series live on our YouTube channel.

Neuracore

@Neuracore_AI

May 14

𝗥𝗲𝘀𝘁𝗮𝘂𝗿𝗮𝗻𝘁𝘀 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗰𝗼𝗻𝘃𝗶𝗻𝗰𝗶𝗻𝗴 𝗼𝗻 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝘀. 𝗧𝗵𝗲𝘆 𝗻𝗲𝗲𝗱 𝘁𝗵𝗲 𝗹𝗮𝘆𝗲𝗿 𝘁𝗵𝗮𝘁 𝘀𝗶𝘁𝘀 𝘂𝗻𝗱𝗲𝗿𝗻𝗲𝗮𝘁𝗵 𝗶𝘁. @IvanTregear, CTO of @KAIKAKU_AI speaks on the misconception that operators are tech resistant. Most are eager to deploy. The real blocker is foundational: no databases, no analytics, no sensing layer for automation to act on. Head to the link in comments to watch our new series exploring Robotics in Europe.

0:41

605

Stephen James

Stephen James

@stepjamUK

May 13

New embodiments are landing every quarter. New arms, new grippers, new humanoids. Any serious robot learning team will be working with five different platforms inside a year. Neuracore was designed for that reality. Hardware agnostic isn't optional when the field moves this fast. It's the only architecture that holds up.

Neuracore

@Neuracore_AI

May 13

Most robot learning stacks assume you've already picked your hardware. Switch arms, switch grippers, switch sensors, and your data pipeline breaks. That's the Infrastructure Tax. And it's the reason teams spend more time wiring up robots than training models. Neuracore is hardware agnostic by design. Here it is making a cup of tea on an Open Arm. The same platform runs the same way on any embodiment you point it at, from research arms to industrial manipulators to humanoids. One stack. Any robot. Clean, high-fidelity data flowing into your training pipeline regardless of what's holding the kettle. Your hardware shouldn't decide your roadmap.

0:12

2,145

Stephen James

Stephen James

@stepjamUK

May 12

𝗦𝗽𝗲𝗰𝗶𝗮𝗹𝗶𝘀𝘁 𝗽𝗼𝗹𝗶𝗰𝗶𝗲𝘀 𝗵𝗮𝘃𝗲 𝗵𝗶𝘀𝘁𝗼𝗿𝗶𝗰𝗮𝗹𝗹𝘆 𝗼𝘂𝘁𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗲𝗱 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝘁 𝗺𝗼𝗱𝗲𝗹𝘀. 𝗣𝗵𝘆𝘀𝗶𝗰𝗮𝗹 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲 𝗷𝘂𝘀𝘁 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝗱 𝗮 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝘁 𝘁𝗵𝗮𝘁 𝗺𝗮𝘁𝗰𝗵𝗲𝘀 𝗮𝗻𝗱 𝗶𝗻 𝘀𝗼𝗺𝗲 𝘁𝗮𝘀𝗸𝘀 𝗯𝗲𝗮𝘁𝘀 𝘁𝗵𝗲𝗺. π0.7 is a 5B parameter VLA trained on a mixture that would normally poison a policy: expert demonstrations, suboptimal autonomous rollouts, failure cases, egocentric human video, and web-scale multimodal data. The usual outcome is mode averaging and performance collapse. π0.7 avoids that by conditioning every training episode on multimodal context that describes not just what to do, but how it was done. Three conditioning signals in the prompt: • Detailed language instructions. Not "fold laundry" but "grasp the left sleeve with the left gripper, fold the shirt in half vertically." • Subgoal images. A separate world model generates near-future multi-view targets, and the policy learns inverse dynamics from current observation to subgoal. • Episode metadata. Quality, execution speed, mistake flags. This is what lets the model learn from suboptimal data without copying its mistakes. Results out of the box, against the RL specialist baselines from π*0.6: • Laundry folding (T-shirts, shorts): matches specialist throughput and success rate • Diverse laundry (hardest items): exceeds specialist throughput by 20% • Espresso machine: matches specialist performance • Box building: exceeds specialist throughput Single checkpoint, no task-specific post-training, no RL fine-tuning. The cross-embodiment result is more interesting. Shirt folding on a UR5e bimanual system, no task-specific data on that robot: 85.6% task progress, 80% success rate. Expert human teleoperators attempting the same task on UR5e for the first time scored 90.9% progress, 80.6% success. The policy lands in the same range as skilled humans attempting an unfamiliar embodiment cold. The UR5e arms are longer, heavier, and have different morphology than the source robot. π0.7 adapts its strategy. On the source robot, operators use tilted end-effector grasps. On UR5e, the policy switches to vertical grasps suited to the arm kinematics. Two things this implies for the broader VLA stack: The bottleneck for generalist VLAs has never really been the architecture. It's been the training mixture. If episode metadata is enough to make heterogeneous, partly-failed data productive, the cost curve for foundation model training in robotics changes meaningfully. Every team currently throwing away their autonomous rollout data is leaving training signal on the floor. The open question is how far world-model-generated subgoals scale. World model fidelity has been the soft underbelly of similar approaches before, and zero-shot cross-embodiment is a strong demonstration but a narrow one. Great work as always from the team at @physical_int. Paper link in comments.

0:47

12,451

Stephen James

Stephen James

@stepjamUK

May 12

📝Read the full paper and blog here: Paper: arxiv.org/pdf/2604.15483 Blog: pi.website/blog/pi07

453

Stephen James

Stephen James

@stepjamUK

May 12

Looking forward to joining this panel next Tuesday to talk about what it actually takes to get robots out of simulation and into the real world at scale. If you're in London and working in robotics, come join us!

Neuracore

@Neuracore_AI

May 12

One week to go. Next Tuesday, our Founder and CEO Stephen James takes the stage at STIQ Ltd's Robotics & Automation networking event, one of London's sharpest gatherings for roboticists, operators, buyers and investors. Stephen joins the panel with Obinna Njoku (@PepsiCo), Mark Slack (@CMRSurgical), Caroline France (@SciTechgovuk) and Oana Andreea Jinga (@dexoryHQ), hosted by Tom Andersson, to talk about how we're building the cloud-native infrastructure that closes the gap between simulation and real-world robot deployment. 📍 20 Primrose St, London ⏰ Tuesday 19 May, 17:30 to 21:00 Sign up here: luma.com/avzbz7v0?tk=cUJK8a

1,127

Stephen James

Stephen James

@stepjamUK

May 11

𝗗𝘆𝗻𝗮𝗺𝗶𝗰 𝗺𝗮𝗻𝗶𝗽𝘂𝗹𝗮𝘁𝗶𝗼𝗻 𝗶𝘀 𝗼𝗻𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗵𝗮𝗿𝗱𝗲𝗿 𝗳𝗮𝗶𝗹𝘂𝗿𝗲 𝗺𝗼𝗱𝗲𝘀 𝗳𝗼𝗿 𝗩𝗟𝗔𝘀, 𝗮𝗻𝗱 𝘁𝗵𝗲 𝗽𝗲𝗿𝗰𝗲𝗽𝘁𝗶𝗼𝗻-𝗲𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻 𝗴𝗮𝗽 𝗶𝘀 𝗽𝗮𝗿𝘁 𝗼𝗳 𝘄𝗵𝘆. Static tasks tolerate 200ms inference delays. Dynamic tasks expose them. An object rolling at 0.5 m/s travels 10 cm during a 200ms inference cycle. By the time the action executes, the observation is stale. For continuous motion tasks, this becomes a real constraint. NTU S-Lab has released DynamicVLA, a 0.4B model that targets exactly this. It is built on a FastViT convolutional encoder and SmolLM2-360M truncated to 16 layers. Inference completes in around 226ms on an RTX A6000. The interesting part is not the raw speed, which is comparable to baselines. It is how the architecture handles latency. Continuous Inference overlaps prediction and execution, triggering new inference when the previous completes rather than waiting for action chunk exhaustion. Latent-Aware Action Streaming discards outdated actions and overwrites stale predictions with recent ones to keep what the policy is doing aligned with what the world is doing. 𝗗𝗢𝗠 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 𝗿𝗲𝘀𝘂𝗹𝘁𝘀 𝗿𝗲𝗽𝗼𝗿𝘁𝗲𝗱 𝗶𝗻 𝘁𝗵𝗲 𝗽𝗮𝗽𝗲𝗿: • 47% success against 13.6% baseline • 71.6% closed-loop reactivity against 28.3% for SmolVLA • 30.3% success without continuous inference and action streaming, a 36% degradation What is also worth noting is the data pipeline. Human teleoperators cannot react quickly enough to objects moving at 0.75 m/s, so the team replaces demonstrations with automated state-machine controllers. Dual RGB views handle 6D pose and velocity tracking in real time. The training data is generated by a system that can keep up with the task, not a human trying to. Inference lag is a real constraint once you move past pick-and-place demos, and work like this is a meaningful step. But the harder question for production robotics is still upstream. Can you actually collect enough data, at the right quality, to make these systems reliable in unconstrained environments. Architecture matters. The data path matters more. Nice work from the team at @NTUsg Lab. Paper and project page in comments below. #RobotLearning #VLA

1:35

192

18,889

Stephen James

Stephen James

@stepjamUK

May 11

Project page: infinitescript.com/project/d… Research paper: arxiv.org/pdf/2601.22153

764