Announcing World Model Accelerator, fal's new product:
fal.ai/wma
We've spent years building the inference layer for generative media. In 2023, we shipped 30 FPS, low-latency, action-controlled diffusion APIs, years before anyone else. Today, we power foundational model companies across image, video, real-time speech-to-speech, and action-controlled world models.
Now we're opening it up. World Model Accelerator is the system behind it all. Our in-house inference engine hits SOTA performance on Hopper and Blackwell for Diffusion Transformer workloads, both causal and bi-directional. Built on fal Serverless, it scales seamlessly from 1 GPU to 1,000. This is the same platform that's been running real workloads, at scale, for years.
A new WebRTC gateway cuts latency between users and GPUs by routing requests to the closest region, built on the same infra that powers our real-time speech workloads. And direct access to the fal Marketplace puts your model in front of enterprises spending hundreds of millions on generative media, with a real co-selling motion behind it. This is the infrastructure layer for world models.