Fast ML inference. Run top AI models using a simple API.

Joined February 2023
We just added text-to-music on DeepInfra. ACE-Step v1.5 XL — open-source, full song generation from a text prompt. Vocals, lyrics, instrumentation. Quality that rivals commercial tools. We run the XL checkpoint with the planning step on by default, so it optimizes for musical structure and coherence. $0.001 / second of audio. @ACEStep_Music
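As a rough illustration of the per-second pricing quoted above, the sketch below estimates the cost of one generated track. Only the $0.001 per second rate comes from the post; the function name and the three-minute track length are hypothetical.

```python
# Rate quoted in the post: $0.001 per second of generated audio.
PRICE_PER_AUDIO_SECOND_USD = 0.001


def track_cost_usd(duration_seconds: float) -> float:
    """Estimated cost of generating a track of the given length (hypothetical helper)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds * PRICE_PER_AUDIO_SECOND_USD


if __name__ == "__main__":
    # A hypothetical three-minute song (180 seconds) comes to about $0.18.
    print(f"${track_cost_usd(180):.2f}")
```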
Big upgrade to @bria_ai_'s video background removal on DeepInfra, shipping today. 2x better quality · 9x faster · 33x cheaper. 26 fps / 38 ms per frame on L40S. Smarter foreground detection: now recognizes mics, desks, and products.
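The throughput and per-frame latency figures above are two views of the same number, which a short sketch can check. Only the 26 fps figure comes from the post; the helper names and the 60-second, 30 fps example clip are assumptions for illustration.

```python
def frame_time_ms(throughput_fps: float) -> float:
    """Average processing time per frame, in milliseconds, at a given throughput."""
    return 1000.0 / throughput_fps


def processing_seconds(clip_seconds: float, clip_fps: float, throughput_fps: float) -> float:
    """Time to process a whole clip, assuming a steady throughput (hypothetical helper)."""
    total_frames = clip_seconds * clip_fps
    return total_frames / throughput_fps


if __name__ == "__main__":
    # 26 fps works out to roughly 38.5 ms per frame, in line with the quoted 38 ms.
    print(f"{frame_time_ms(26):.1f} ms per frame")
    # A hypothetical 60-second clip recorded at 30 fps has 1,800 frames.
    print(f"{processing_seconds(60, 30, 26):.1f} s to process")
```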
We just added @NVIDIA Nemotron 3.x to DeepInfra on Day 0. Two open and highly efficient models, live now:
→ Nemotron 3 Ultra: Frontier reasoning for long-running agents, with up to 5x faster inference and up to 30% lower cost
→ Nemotron 3.5 Content Safety: 4B multimodal, multilingual safety model with custom policy support, reasoning traces, and coverage across 23 safety categories for enterprise AI guardrails
→ Nemotron 3.5 ASR (coming soon): 0.6B streaming model with ~40 language-locales
Built for agentic AI. Same API as everything else on DeepInfra.
NVIDIA Cosmos 3 is live on DeepInfra. The first open world foundation model for physical AI that reasons before it generates. Built for robots, AVs, simulation, synthetic data generation.
DeepInfra retweeted
Entire world: We need more GPUs.
Meanwhile, Jensen Huang:

DeepInfra retweeted
Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: platform.minimax.io
Token Plan: platform.minimax.io/subscrib…
🚀New! MiniMax Code: code.minimax.io
Weights & Tech Report in ~10 Days
We are really excited about Nemotron 3 Ultra.
NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and it is the most intelligent US open weights model.
We partnered with @nvidia to evaluate this model for intelligence and speed. These figures use the model’s BF16 weights, but as with Nemotron 3 Super, the model will be made available in NVFP4 quantization as well for higher inference performance.
➤ New leader for US open weights intelligence: Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This is well ahead of the next strongest US open weights models, Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33), but behind the Chinese-led open weights frontier (Kimi K2.6 at 54).
➤ Leading speed for its intelligence: on a pre-release @DeepInfra endpoint, Nemotron 3 Ultra served over 300 tokens per second. Peer models in its size class from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at speeds of 50-100 tokens per second in the market today. gpt-oss-120b is served at similar speeds, but with significantly lower intelligence.
➤ Largest Nemotron 3 model so far: at approximately 550 billion total parameters and 90% sparsity, Nemotron 3 Ultra is significantly larger than its siblings and is the largest recent US open weights model release.
We’ll be sharing additional analysis and full benchmarks at release.
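The 90% sparsity figure follows directly from the total and active parameter counts quoted above, and the speed comparison can be expressed the same way. A minimal sketch, using only numbers from the post; the function names are hypothetical, and 300 tokens per second is treated as a lower bound because the post says "over 300".

```python
def sparsity(total_params_b: float, active_params_b: float) -> float:
    """Fraction of parameters that are inactive for any given token."""
    return 1.0 - active_params_b / total_params_b


def speedup_range(speed_tps: float, peer_low_tps: float, peer_high_tps: float) -> tuple[float, float]:
    """Speed ratio against the fastest and the slowest peer, in that order."""
    return speed_tps / peer_high_tps, speed_tps / peer_low_tps


if __name__ == "__main__":
    # 550B total with 55B active means 10% of parameters are active, so 90% sparsity.
    print(f"sparsity: {sparsity(550, 55):.0%}")
    # At 300 tokens/s against peers served at 50-100 tokens/s: at least 3x to 6x.
    low, high = speedup_range(300, 50, 100)
    print(f"speedup: {low:.0f}x to {high:.0f}x")
```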
DeepInfra retweeted
CEO Charles Liang Keynote @ Supermicro Innovate!/COMPUTEX
DeepInfra retweeted
Nemotron 3 Ultra is coming this week. ⌛️
The right question, and one too few enterprises are asking. Thanks @realmtbman and @palebluenexus for having our co-founder @nikolaborisof on. Full episode: youtu.be/DS2-iheW6pI
Enterprises ask "is your AI compliant?" The better question: who actually runs the inference? Nikola Borisov, co-founder of @DeepInfra ($107M Series B raise - including NVIDIA) on @palebluenexus: "You want to make sure you're not giving it to someone that will give it to someone that will give it to someone. And maybe the final inference happens in China."