🚀 Just launched the world’s first open-source video editing agent!
Our technical approach:
Our Python-based agent starts a browser session using Playwright and opens operator.diffusion.studio.
This web app is a video editing UI optimized for agents. It provides access to Diffusion Studio Core, a JavaScript-based engine that renders videos directly in the browser using WebCodecs (fully hardware-accelerated).
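A minimal sketch of that first step, assuming Playwright's sync API for Python. The `https://` scheme and the helper names `wrap_snippet` and `run_in_operator` are illustrative assumptions, not the project's actual code, and the JavaScript passed in is just a placeholder for whatever the agent generates.

```python
OPERATOR_URL = "https://operator.diffusion.studio"  # host from the post; scheme assumed


def wrap_snippet(js_body: str) -> str:
    """Wrap generated JavaScript in an async function so it may use `await`."""
    return f"async () => {{\n{js_body}\n}}"


def run_in_operator(js_body: str, headless: bool = True):
    """Open the operator UI and run one JavaScript snippet in the page (hypothetical helper)."""
    # Imported lazily so wrap_snippet stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        page.goto(OPERATOR_URL)
        # page.evaluate invokes a function expression and awaits any returned promise.
        result = page.evaluate(wrap_snippet(js_body))
        browser.close()
        return result
```

Calling `run_in_operator("return document.title;")` would return the page title, which is a quick way to check that the session is alive before sending real editing code.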
🖥️ How it works:
1️⃣ A VideoEditingTool generates code based on user prompts and runs it in the browser.
2️⃣ If additional context is needed, DocsSearchTool uses RAG to pull information from operator.diffusion.studio/ll….
3️⃣ After each execution step, the composition is sampled (currently 1 frame per second) and analyzed by VisualFeedbackTool using a multimodal model.
4️⃣ The feedback system decides whether to proceed with rendering or refine further.
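The four steps above can be sketched as a simple loop. This is only an illustration under stated assumptions: `generate`, `execute`, and `review` are hypothetical stand-ins for VideoEditingTool, the in-browser execution, and VisualFeedbackTool, and the `Feedback` type and `max_steps` cap are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


def sample_times(duration_s: float, fps: float = 1.0) -> List[float]:
    """Timestamps (in seconds) at which to grab frames, starting at 0."""
    if duration_s <= 0 or fps <= 0:
        return []
    count = math.ceil(duration_s * fps)
    return [i / fps for i in range(count)]


@dataclass
class Feedback:
    accept: bool      # True means the composition is ready to render
    notes: str = ""   # otherwise, hints for the next refinement step


def edit_loop(
    prompt: str,
    generate: Callable[[str, str], str],           # (prompt, notes) -> code
    execute: Callable[[str], float],               # code -> composition duration in seconds
    review: Callable[[List[float]], Feedback],     # sample timestamps -> verdict
    max_steps: int = 5,
) -> bool:
    """Generate, execute, and review until accepted or the step budget runs out."""
    notes = ""
    for _ in range(max_steps):
        code = generate(prompt, notes)
        duration = execute(code)
        feedback = review(sample_times(duration))
        if feedback.accept:
            return True   # proceed with rendering
        notes = feedback.notes  # feed the critique into the next attempt
    return False
```

With the default of 1 frame per second, a 3-second composition is sampled at 0 s, 1 s, and 2 s. Passing the tools in as callables keeps the loop testable without a browser or a model.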
📡 File transfers between the browser and Python happen via Chrome DevTools Protocol, and for scalability, the agent can connect to a GPU-accelerated remote browser session via WebSocket (WIP: wss://chrome.diffusion.studio).
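Here is a rough sketch of how an agent might switch between a local browser and a remote one, assuming Playwright's `connect_over_cdp` is used for the WebSocket case. The helper names are hypothetical, and since the remote endpoint is marked WIP, there is no guarantee it accepts connections this way.

```python
from typing import Optional


def is_remote_endpoint(endpoint: Optional[str]) -> bool:
    """True if the endpoint looks like a WebSocket URL (ws:// or wss://)."""
    return bool(endpoint) and endpoint.startswith(("ws://", "wss://"))


def open_browser(playwright, endpoint: Optional[str] = None):
    """Attach to a remote Chromium over CDP if an endpoint is given, else launch locally.

    `playwright` is the object yielded by `sync_playwright()`.
    """
    if is_remote_endpoint(endpoint):
        # Real Playwright call; whether the WIP endpoint from the post
        # accepts it is an assumption.
        return playwright.chromium.connect_over_cdp(endpoint)
    return playwright.chromium.launch()
```

Keeping the choice in one function means the rest of the agent does not need to know whether the browser is local or remote.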
💡 Huge shoutout to @Muhtasham9 for his invaluable contributions to the project!
What we’re looking for:
We’d love to collaborate with researchers interested in pushing this forward and co-authoring a research paper. Let’s build the future of AI-powered video editing together!