The hardest part of streaming a robot's mind live wasn't the rendering. It was getting real-time video out of a rented GPU that has no open ports, without exposing the box to the internet.
Here's the problem. The GPU doing the photoreal rendering sits in a rented pod, behind NAT. You can't just open a port and aim a browser at it. An open port is a security hole, and half the time the network won't allow it anyway.
The tempting wrong fix is to push the video through the same tunnel you use for control traffic. Tunnels are fine for a little JSON. Run real-time video through one and you've added an extra relay hop right in the middle of your latency budget. The whole point of "live" is gone.
So the GPU doesn't accept connections at all. It dials OUT to a public media server, an SFU, and publishes its video to it. The browser dials out to the same server and subscribes. Neither side exposes a port. The media takes the short path, GPU to server to browser, and the control tunnel only ever carries signaling, never a single video frame.
The detail that makes it hold up in the real world: the publish leg prefers UDP and falls back to TCP automatically when a network blocks UDP, which rented and corporate networks love to do. That one fallback is the gap between "works on my machine" and "works from a locked-down warehouse."
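The fallback rule itself is tiny. The sketch below only illustrates the ordering: try UDP first, settle for TCP if UDP can't get through. In a real deployment the WebRTC stack and the media server negotiate this for you, so `pick_transport` and the `can_connect` probe are hypothetical names, not part of any library.

```python
from typing import Callable, Optional, Sequence

# Preference order: lowest-latency option first. Labels are illustrative.
PREFERRED_ORDER = ("udp", "tcp")


def pick_transport(can_connect: Callable[[str], bool],
                   order: Sequence[str] = PREFERRED_ORDER) -> Optional[str]:
    """Return the first transport in `order` that the network lets through."""
    for transport in order:
        if can_connect(transport):
            return transport
    return None  # nothing got through; the caller decides how to report it


# An open network allows everything; a locked-down one drops UDP.
open_network = lambda transport: True
locked_down = lambda transport: transport != "udp"

print(pick_transport(open_network))  # udp
print(pick_transport(locked_down))   # tcp
```

The point of keeping UDP first is that TCP is the safety net, not the default: you only pay for its retransmission behavior on networks that leave you no choice.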
The media server is LiveKit, an open-source SFU, and it does the heavy lifting of fanning one render out to many viewers (50 at a time with no frame loss, in our testing). I just had to wire the brain to dial out correctly, and resist the urge to be clever with the tunnel.