Kirill Solodskikh

Tweets

Kirill Solodskikh

@GarchFather

Jun 2

Gemma 4 E2B, compressed by @TheStageAI from 9.3 GB to 1.4 GB, is running on iPhone 16e with tool calls! The smallest, best-quality checkpoints are open-sourced! @GoogleDeepMind

Kirill Solodskikh

@GarchFather

Jun 2

@googlegemma

Kirill Solodskikh

@GarchFather

Jun 2

Replying to @TheStageAI @GoogleDeepMind

@Prince_Canuma, you need to add this!

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

Jun 2

The smallest checkpoints for Gemma 4 E2B and E4B for local inference.
Results for E2B:
size: 9.3 GB → 1.4 GB
speed: 113 tok/s on Apple M3
quality: -3% on IFEval
runs with: MLX, llama.cpp (coming)
Pareto-optimal, open source! Links to the blog post and GitHub repo ⬇️ @GoogleDeepMind @lmstudio @ollama @huggingface @ggerganov
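
For a sense of scale, the size figures quoted in this post can be turned into a compression ratio. This is only a rough sketch: the variable names are illustrative, and the bits-per-weight estimate assumes the 9.3 GB baseline stores 16-bit weights, which the post does not state.

```python
# Sizes quoted in the post for Gemma 4 E2B.
original_gb = 9.3
compressed_gb = 1.4

# How many times smaller the compressed checkpoint is.
compression_ratio = original_gb / compressed_gb

# Fraction of the original size that was removed.
size_reduction = 1 - compressed_gb / original_gb

# ASSUMPTION (not stated in the post): the baseline uses 16-bit weights.
assumed_baseline_bits = 16
approx_bits_per_weight = assumed_baseline_bits / compression_ratio

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"size reduction:    {size_reduction:.1%}")
print(f"approx. average bits per weight (under the 16-bit assumption): "
      f"{approx_bits_per_weight:.2f}")
```

Under that assumption the checkpoint averages well below 4 bits per weight, which suggests more than plain uniform 4-bit quantization is involved; the post itself does not describe the method.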

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

May 17

Try it yourself, thestage.ai

TheStage AI – Faster, Cheaper AI Inference

Accelerate models on NVIDIA & edge. Full guides for setup, optimization & deploy. ANNA, QLIP, Elastic Models, CLI & API. Built for AI teams & devs.

app.thestage.ai

TheStage AI

@TheStageAI

May 12

TheStage AI Platform is now open to everyone. Automatically accelerate your models and download them to run in the cloud or on smartphones.

Kirill Solodskikh retweeted

Recraft

@recraftai

May 14

Say hello to V4.1. This model is built for images that captivate you. Photorealism is more human, gradients are dreamier, and new illustration styles are now possible. Test it out in Recraft Studio today and see what you can create.

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

Apr 10

Beyoncé heard cursing. TheWhisper heard Arsenal. The fastest Whisper in the world. Open-source real-time ASR. Top 5 on OpenASR benchmarks. 1800 RTFx. Built for live captions, transcription, and voice apps. See the repo

Next-Gen Real-Time Whisper

github.com
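
To put the "1800 RTFx" figure in concrete terms: RTFx (inverse real-time factor) is the audio duration divided by the processing time, so higher is faster. The helper below is a hypothetical illustration, not part of TheWhisper, and it treats the quoted figure as sustained throughput, which real workloads may not reach.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimate processing time from audio length and an RTFx figure.

    RTFx = audio duration / processing time, so
    processing time = audio duration / RTFx.
    """
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# One hour of audio at the RTFx quoted in the post.
one_hour = 3600.0
estimate = processing_seconds(one_hour, rtfx=1800.0)
print(f"1 hour of audio at 1800 RTFx: about {estimate:.1f} s of compute")
```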

Kirill Solodskikh

@GarchFather

Apr 8

Self-hosted AGI starts with inference infra teams can actually run. Well. Elastic Models v0.2.0 is much more self-serve: world’s fastest whisper-large-v3-turbo, Wan2.2 generating 5s of video in 34s on H100, and instant FLUX LoRA switching. Explore v0.2.0

Faster, Cheaper AI Inference

thestage.ai

Kirill Solodskikh

@GarchFather

Apr 1

Actually, comparing 1-bit with 16-bit makes no sense. Everyone uses 4-bit weights with MLX, and the speed will be around 150-180 tok/s on M4 Pro. Moreover, 4-bit quantization in MLX can be done as block quantization, which preserves quality in most cases.

PrismML

@PrismML

Mar 31

Replying to @PrismML

1-bit Bonsai 8B running locally on an M4 Pro (MLX) alongside a standard 16-bit 8B model. Same class of model, very different deployment profile: far lower memory use and substantially higher throughput.
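
The block-quantization point in the reply above can be illustrated with a small, dependency-free sketch. It is not MLX code: the function names, the block size of 64, and the synthetic weights are assumptions chosen for illustration. It compares one 4-bit scale for the whole tensor against a separate scale per block when a single outlier is present.

```python
import random


def quantize_block(values, bits=4):
    """Affine-quantize one block to `bits`-bit codes, then dequantize."""
    levels = (1 << bits) - 1            # 15 steps for 4-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return [lo + c * scale for c in codes]


def quantize_blockwise(weights, block_size=64, bits=4):
    """Quantize each block with its own scale and offset."""
    out = []
    for start in range(0, len(weights), block_size):
        out.extend(quantize_block(weights[start:start + block_size], bits))
    return out


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic weights: small Gaussian values plus one large outlier.
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]
weights[10] = 1.5

per_tensor = quantize_block(weights)                   # one scale for everything
per_block = quantize_blockwise(weights, block_size=64)

err_tensor = mean_abs_error(weights, per_tensor)
err_block = mean_abs_error(weights, per_block)
print(f"per-tensor 4-bit error: {err_tensor:.5f}")
print(f"block-wise 4-bit error: {err_block:.5f}")
```

With a single scale, the outlier stretches the quantization range for every weight; with per-block scales, only the block containing the outlier pays that cost.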

Kirill Solodskikh

@GarchFather

Mar 31

Open-source experiments dashboard for AI researchers. Cool comparison overlays across modalities. What should we add next? S3 integration, authentication, model registry? github.com/TheStageAI/Spikes…

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

Mar 19

How do you make text-to-music run in real time in production? The model has to keep audio generation ahead of playback. Our new case study with @MireloAI shows how inference optimization delivered up to 2.4x higher throughput. See the full case study ↓

2.4x Faster Real-Time Text-to-Music Inference at Mirelo AI

thestage.ai

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

Mar 4

Proud to team up with @brilliantlabsAR and @neuphonicspeech on Halo’s on-device privacy engine. Coming to Brilliant Labs’ Halo smart glasses: real-time voice vision, POV stays private. ANNA GPU/NPU SDK memory manager for wake word, STT, TTS, diarization. SDK demo 👇

Halo Smart Glasses Run AI Fully On-Device

digitaltrends.com

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

Jan 22

Are you a big fan of jacket potato? This is an open-source, real-time multilingual ASR for live speech. It stays robust in heavy noise – even at SNR 0 dB. That’s why it understands speech where people struggle to hear. Use it for transcription, research, and multilingual apps

Code is open. Learn how it works →

github.com

Kirill Solodskikh

@GarchFather

Jan 19

Good weekend! I spent time testing our releases more extensively and writing usage guides during my tests. Suddenly @akshat_b and @charles_irl from @modal liked my notebook. While testing TheWhisper with @quaz1m, I found that @matiii started following me! Quietly motivating!

Kirill Solodskikh

@GarchFather

Jan 17

Mistral-Small-24B from @MistralAI with @nvidia cuDNN paged attention and W8A8 int8 quantization gives more than 2x acceleration on NVIDIA B200. I just wrote a simple tutorial on building a custom image for @modal notebooks and running @TheStageAI Elastic Models there with integrated cuDNN paged attention and int8 W8A8 quantization (S size). Throughput went from 40 tok/s to 95 tok/s (the real figure is higher, since it was measured while printing during streaming). Notebook link in the thread 👇
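
As a rough illustration of what "W8A8 int8" means, the sketch below quantizes both weights (W) and activations (A) to 8-bit integers, accumulates the dot product in integers, and rescales once at the end. It is a toy, pure-Python example with made-up values and hypothetical function names; it says nothing about how Elastic Models or cuDNN implement this. The last lines also check the speedup implied by the quoted throughput.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(weights, activations):
    """Dot product with int8 weights and int8 activations (W8A8)."""
    qw, w_scale = quantize_int8(weights)
    qa, a_scale = quantize_int8(activations)
    acc = sum(w * a for w, a in zip(qw, qa))       # integer accumulation
    return acc * w_scale * a_scale                 # single float rescale


# Made-up example values, not taken from any real model.
weights = [0.12, -0.5, 0.33, 0.9]
activations = [1.0, 2.0, -1.5, 0.25]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"float result: {exact:.4f}, W8A8 result: {approx:.4f}")

# Speedup implied by the throughput quoted in the post.
speedup = 95 / 40
print(f"95 tok/s vs 40 tok/s: {speedup:.3f}x")
```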

Kirill Solodskikh

@GarchFather

Jan 17

Here is the notebook: modal.com/notebooks/thestage…

TheStage AI Mistral-Small-24B (CuDNN paged attention W8A8 int8) | Modal Notebooks

High-performance, collaborative notebooks running on Modal's GPU cloud.

modal.com

Kirill Solodskikh retweeted

Ruslan Aydarkhanov

@rusaydar

Jan 15

At @TheStageAI, Elastic Models started with paged FlashAttention. This month we’re moving sequence generation to cuDNN Paged Attention to stay fast and speed up bring-up across newer @NVIDIA GPUs (including Jetson). Details: app.thestage.ai/blog/Integra…

Kirill Solodskikh retweeted

TheStage AI

@TheStageAI

Jan 13

We know what you mean @Adele
