TheStage AI

Pinned Tweet

TheStage AI

@TheStageAI

May 12

TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.

TheStage AI retweeted

Kirill Solodskikh

@GarchFather

Jun 2

Gemma 4 E2B, compressed by @TheStageAI from 9.3 GB to 1.4 GB, is running on an iPhone 16e with tool calls! The smallest, best-quality checkpoints, now open-sourced! @GoogleDeepMind

TheStage AI

@TheStageAI

Jun 2

The smallest checkpoints for Gemma 4 E2B and E4B for local inference.

Results for E2B:
size: 9.3 GB → 1.4 GB
speed: 113 tok/s on Apple M3
quality: -3% on IFEval
runs with: MLX, llama.cpp (coming)

Pareto-optimal, open source! Links to the blog post and GitHub repo ⬇️
@GoogleDeepMind @lmstudio @ollama @huggingface @ggerganov
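A quick sanity check of the E2B numbers quoted above, assuming both sizes are in the same unit (GB) and that "113 tok/s" is single-stream decode throughput. The helper names are illustrative only, not part of any TheStage AI tooling:

```python
def compression_stats(original_gb: float, compressed_gb: float) -> tuple[float, float]:
    """Return (size ratio, fractional size reduction) for two checkpoint sizes."""
    ratio = original_gb / compressed_gb
    reduction = 1.0 - compressed_gb / original_gb
    return ratio, reduction


def ms_per_token(tokens_per_second: float) -> float:
    """Convert decode throughput (tokens/s) into average time per token in milliseconds."""
    return 1000.0 / tokens_per_second


ratio, reduction = compression_stats(9.3, 1.4)
print(f"{ratio:.2f}x smaller, {reduction:.1%} reduction")  # 6.64x smaller, 84.9% reduction
print(f"{ms_per_token(113):.2f} ms/token")                  # 8.85 ms/token
```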

TheStage AI

@TheStageAI

Jun 2

GitHub: github.com/TheStageAI/edge-l…

GitHub - TheStageAI/edge-lm: Tiny llms optimised for edge deployment

github.com

TheStage AI

@TheStageAI

Jun 2

Blog post: app.thestage.ai/blog/7x-size…

TheStage AI – Faster, Cheaper AI Inference

Accelerate models on NVIDIA & edge. Full guides for setup, optimization & deploy. ANNA, QLIP, Elastic Models, CLI & API. Built for AI teams & devs.

app.thestage.ai

TheStage AI

@TheStageAI

May 17

Try it yourself: thestage.ai

TheStage AI – Faster, Cheaper AI Inference

app.thestage.ai

TheStage AI

@TheStageAI

Apr 10

Beyoncé heard cursing. TheWhisper heard Arsenal. The fastest Whisper in the world: open-source, real-time ASR. Top 5 on the Open ASR benchmarks. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo.
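RTFx is commonly defined, as on the Open ASR leaderboard, as the inverse real-time factor: seconds of audio transcribed per second of wall-clock compute. A minimal sketch of what 1800 RTFx implies, assuming that definition (the hardware and batch settings behind the figure are not stated in this tweet):

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: audio duration divided by wall-clock processing time."""
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, rtfx_value: float) -> float:
    """Wall-clock seconds needed to transcribe `audio_seconds` of audio at a given RTFx."""
    return audio_seconds / rtfx_value


one_hour = 3600.0
print(processing_time(one_hour, 1800))  # 2.0 -> one hour of audio in about 2 seconds
print(rtfx(one_hour, 2.0))              # 1800.0
```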

Next-Gen Real-Time Whisper

github.com

TheStage AI

@TheStageAI

Apr 8

For AI engineers, latency is the product. Wan 2.2 in Elastic Models now generates 5 s of video in 34 s on an H100. Elastic Models is a library of accelerated open-source models. Also new: TheWhisper at 1800 RTFx on a single H100, and instant FLUX LoRA switching. Try it.

Faster, Cheaper AI Inference

thestage.ai

TheStage AI

@TheStageAI

Mar 19

How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓

2.4x Faster Real-Time Text-to-Music Inference at Mirelo AI

thestage.ai

TheStage AI

@TheStageAI

Mar 4

Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo's on-device privacy engine. Coming to Brilliant Labs' Halo smart glasses: real-time voice and vision, while your POV stays private. ANNA GPU/NPU SDK: memory manager for wake word, STT, TTS, and diarization. SDK demo 👇

Halo Smart Glasses Run AI Fully On-Device

digitaltrends.com

TheStage AI

@TheStageAI

Jan 22

Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR model for live speech. It stays robust in heavy noise, even at 0 dB SNR, so it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps.
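For context, 0 dB SNR means the speech and the noise have equal average power, since SNR = 10·log10(P_speech / P_noise). A minimal pure-Python sketch of how a noisy test clip at a target SNR is typically built by rescaling the noise; the function names and toy signals are hypothetical and unrelated to how TheStage AI evaluated its model:

```python
import math
from typing import List


def power(signal: List[float]) -> float:
    """Mean power: the average of the squared samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(signal: List[float], noise: List[float]) -> float:
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(speech: List[float], noise: List[float], target_db: float) -> List[float]:
    """Scale `noise` so the speech-to-noise ratio equals `target_db`, then add it.

    Assumes both inputs have the same length (zip stops at the shorter one).
    """
    # Target noise power is P_speech / 10^(target_db / 10); amplitude scales with sqrt(power).
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (target_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, noise)]


speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]  # toy "speech" tone
noise = [0.1 * (-1) ** t for t in range(100)]                       # toy alternating noise
mixed = mix_at_snr(speech, noise, 0.0)
scaled_noise = [m - s for m, s in zip(mixed, speech)]
print(snr_db(speech, scaled_noise))  # approximately 0 dB (up to floating-point error)
```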

Code is open. Learn how it works →

github.com

TheStage AI

@TheStageAI

Jan 15

At TheStage AI, we shipped @nvidia cuDNN Paged Attention in our Elastic Models library, replacing paged FlashAttention for better integration. In our benchmarks, the cuDNN path shows nearly identical quality and latency versus the previous implementation. Early results on B200: INT8 Llama 8B at ~200 tok/s per sequence at batch size 16 (≈ 3,200 tok/s aggregate). The write-up also covers CUDA Graphs, graph caching, cuDNN Paged Attention, and INT8 LLMs. Next, we are moving to native inference support across NVIDIA hardware, including Jetson. Check the blog for details: app.thestage.ai/blog/Integra…
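The aggregate figure follows from the per-sequence rate: with 16 sequences decoded concurrently, total throughput is roughly the per-sequence speed times the batch size. A one-line check, assuming every sequence in the batch sustains the same ~200 tok/s (real batches are rarely perfectly uniform, hence the "≈"):

```python
def aggregate_throughput(per_sequence_tok_s: float, batch_size: int) -> float:
    """Total tokens/s across a batch, assuming every sequence decodes at the same rate."""
    return per_sequence_tok_s * batch_size


print(aggregate_throughput(200, 16))  # 3200
```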

TheStage AI

@TheStageAI

Jan 13

We know what you mean @Adele

TheStage AI

@TheStageAI

Jan 13

Multilingual, open-source STT built for real-time streaming ↓ github.com/TheStageAI/TheWhi…

GitHub - TheStageAI/TheWhisper: Optimized Whisper models for streaming and on-device use

github.com

TheStage AI

@TheStageAI

Jan 12

A new SOTA TheWhisper checkpoint is out: open-source multilingual STT built for real-time streaming and noisy audio. 6.0 WER on Open ASR, ahead of Parakeet and Whisper. Optimized with our stack, ANNA (Automated Neural Networks Accelerator). Code is open. GitHub →
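WER (word error rate) is the word-level edit distance between a hypothesis and its reference transcript, divided by the number of reference words: (S + D + I) / N for substitutions, deletions, and insertions. Leaderboards usually report it as a percentage and apply text normalization before scoring, which this minimal sketch of the standard definition omits:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words = 0.333...
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```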

Build with TheWhisper for Fast Multilingual Speech-to-Text

github.com

TheStage AI

@TheStageAI

4 Dec 2025

Significant speed and size gains in model inference are possible without hurting output quality. ANNA is our PyTorch framework for automated model acceleration and a new way to think about MLOps: smaller checkpoints, lower cost, faster inference, no retraining. Try the demo or request access.

ANNA LLM – a Hugging Face Space by TheStage AI

huggingface.co

TheStage AI

@TheStageAI

7 Oct 2025

We’ve made it easy to run text-to-image models on @Modal with the speed you’d expect from top inference providers. Follow our quick guide to deploy containers with an @OpenAI-compatible API and get a 2× speedup. Big thanks to @MireloAI for the soundtrack magic 🎶

Your Guide to Fast Diffusion Model Deployment

thestage.ai

TheStage AI retweeted

Kirill Solodskikh

@GarchFather

2 Oct 2025

Great communities make great products. At @TheStageAI, we’re building ANNA, our Autonomous Neural Networks Accelerator, for faster, cheaper inference. We need a Community Manager now. Be part of the early story →

Join TheStage AI

notion.site

TheStage AI

@TheStageAI

24 Sep 2025

TheStage AI is now SOC 2 Type I compliant. We did it to keep models, data, and IP secure. Clients get confidence, simpler procurement, and compliant AI deployment. This milestone sets us up to grow into enterprise, government, and regulated markets.
