Igor Gitman

Igor Gitman

12 Photos and videos

Tweets

Igor Gitman

@igtmn

Jun 4

We released Nemotron 3 Ultra! It's quite strong across the board, but is especially good at math reasoning tasks compared to other open models. With enough TTC Ultra does really well on hard math competitions like IMO, USAMO and Putnam. And as always, we aren't only releasing the model, but all datasets and recipes that make it this strong. Nemotron-SFT-Math-v4: 550K solutions to "final answer" math problems with and without Python tool use huggingface.co/datasets/nvid… Nemotron-Math-Proofs-v2: 80K solutions and verifications for hard mathematical proofs huggingface.co/datasets/nvid… Nemotron-RL-Math-v2: 4K selected math problems with verified final answers. This dataset is much higher quality than any previous math RL set we released huggingface.co/datasets/nvid… If you want to learn more details, check out our tech report research.nvidia.com/labs/nem…

Bryan Catanzaro

@ctnzr

Jun 4

NVIDIA Nemotron 3 Ultra is now live! Frontier accuracy, 5X greater speed, 30% lower cost. Deploy however you need - on-premise, on the cloud, or at the edge. Model is live on HuggingFace under the OpenMDW 1.1 license. youtube.com/watch?v=D8LIIvQV…

9,408

Igor Gitman

Igor Gitman

@igtmn

Apr 3

somehow I always feel bad when I interrupt codex..

233

Tinker

Igor Gitman retweeted

Tinker

@tinkerapi

Mar 18

Nemotron 3 Nano and Super from @nvidia are now available on Tinker! We're offering a limited-time GTC 50% discount for both. The Nemotron family features open hybrid MoE models optimized for compute efficiency for agentic applications.

115

17,790

OpenRouter

Igor Gitman retweeted

OpenRouter

@OpenRouter

Mar 11

⚡ NVIDIA Nemotron 3 Super is live on OpenRouter! 120B params, 12B active. Hybrid Mamba-Transformer MoE with highest throughput efficiency in its class. 1M context. Fully open weights, data, and recipes. Built for multi-agent systems that need to think fast.

381

27,457

Igor Gitman

Igor Gitman

@igtmn

Mar 11

Nemotron 3 Super is out! It's really good and it will only get better from here. And we release all the details - tech report, training code, training data, model weights. Everything you need to build a model like this yourself!

Bryan Catanzaro

@ctnzr

Mar 11

Announcing NVIDIA Nemotron 3 Super! 💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell 💚36 on AAIndex v4 💚up to 2.2X faster than GPT-OSS-120B in FP4 💚Open data, open recipe, open weights Models, Tech report, etc. here: research.nvidia.com/labs/nem… And yes, Ultra is coming!

687

NVIDIA

Igor Gitman retweeted

NVIDIA

@nvidia

Mar 11

x.com/i/article/203148517225…

260

1,583

415,195

Oleksii Kuchaiev

Igor Gitman retweeted

Oleksii Kuchaiev

@kuchaev

Mar 11

Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell. Truly open: permissive license, open data, open training infra. See analysis on @ArtificialAnlys Details in thread 🧵below:

278

30,017

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

Igor Gitman retweeted

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

Mar 11

Nemotron Super (120B, 10B active, ≈"3B" class speed) destroys all Qwens, gpt-oss, matches Kimi K2.5, exceeds Step 3.5 Flash and V3.2, and to my knowledge is only beaten by two open models (309B MiMo and Speciale) on the most interesting benchmark today. @htihle pls test

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

Mar 11

Well, seems we're not getting DeepSeek V4 today but we're getting what amounts to its lite version runnable on normal hardware. New architecture, fast, 1M context… …and it's a bit weaker than the equivalent Qwen 3.5.

193

21,558

Eric W. Tramel

Igor Gitman retweeted

Eric W. Tramel

@fujikanaeda

Feb 12

At Nvidia we're really into synthetic data and how to make better agents, faster. I was really excited about this technique from @DongfuJiang , @zhuofengli96475 and team and I wanted to reproduce their SDG in Data Designer with our new release. Check it out!

Dongfu Jiang

@DongfuJiang

Feb 9

🚀 Introducing OpenResearcher: a fully offline pipeline for synthesizing 100 turn deep-research trajectories—no search/scrape APIs, no rate limits, no nondeterminism. 💡 We use GPT-OSS-120B a local retriever a 10T-token corpus to generate long-horizon tool-use traces (search → open → find) that look like real browsing, but are free reproducible. 📈 The payoff: SFT on these trajectories turns Nemotron-3-Nano-30B-A3B from 20.8% → 54.8% accuracy on BrowseComp-Plus ( 34.0). 🧩 What makes it work? 🔎 Offline corpus = 15M FineWeb docs 10K “gold” passages (bootstrapped once) 🧰 Explicit browsing primitives = better evidence-finding than “retrieve-and-read” 🎯 Reject sampling = keep only successful long-horizon traces 🧵 And we’re releasing everything: ✅ code search engine corpus recipe ✅ 96K-ish trajectories eval logs ✅ trained models live demo 👨‍💻 GitHub: github.com/TIGER-AI-Lab/Open… 🤗 Models & data: huggingface.co/collections/T… 🚀 Demo: huggingface.co/spaces/OpenRe… 🔎 Eval logs: huggingface.co/datasets/Open… #llms #agentic #deepresearch #tooluse #opensource #retrieval #SFT

18,120

PyTorch

Igor Gitman retweeted

PyTorch

@PyTorch

Jan 7

Need to accelerate inference for math problem solving? Large language models can solve challenging math problems. However, making them work efficiently at scale requires the right serving stack, quantization strategy, and decoding methods—often spread across different tools. This @nvidia blog post shows how to build a fast, reproducible inference pipeline with the NVIDIA NeMo-Skills library to manage NVIDIA TensorRT-LLM. 🔗developer.nvidia.com/blog/ho… #PyTorch #OpenSourceAI #AI #Inference #Innovation

How to Achieve 4x Faster Inference for Math Problem Solving | NVIDIA Technical Blog

Large language models can solve challenging math problems. However, making them work efficiently at scale requires more than a strong checkpoint. You need the right serving stack…

developer.nvidia.com

6,945

Rohan Paul

Igor Gitman retweeted

Rohan Paul

@rohanpaul_ai

23 Dec 2025

Nvidia paper behind Nemotron-Math, a massive math tutoring dataset so smaller Large Language Models can learn long, tool checked reasoning. It contains 7.5M step by step solutions, some as long as 128K tokens, meaning text pieces, written in 3 reasoning styles. This dataset also shows self checking, where the model runs Python code to avoid simple arithmetic mistakes. The authors mix competition problems from Art of Problem Solving with real questions from Mathematics Stack Exchange and MathOverflow. They use open model gpt-oss-120b as a teacher, generating multiple solutions per problem at high, medium, and low depth. For long context training, they sort examples by length and fine tune in stages, so most steps use shorter text before 128K. That schedule gives about 2-3x faster training with roughly 1-3% less accuracy, and the extra Stack Exchange problems make the trained models handle messier questions better. ---- Paper Link – arxiv. org/abs/2512.15489 Paper Title: "Nemotron-Math: Efficient Long-Context Distillation of Mathematical Reasoning from Multi-Mode Supervision"

214

15,003

vLLM

Igor Gitman retweeted

vLLM

@vllm_project

15 Dec 2025

🚀🚀🚀 We’re excited to support @NVIDIA and their new open family of models: NVIDIA Nemotron 3! Open in weights, data, tools, and training, Nemotron 3 is built for multi-agent apps and features: ⚡️An efficient hybrid Mamba‑Transformer MoE architecture 🧾1M token context for long-term memory and improved reasoning 🧠 Multi‑environment reinforcement learning via NeMo Gym for advanced skill adaptation Plus NVFP4 pre-training, latent MoE, 1T tokens of data, and more! Read more about the model: blog.vllm.ai/2025/12/15/run-…

Run Highly Efficient and Accurate AI Agents with NVIDIA Nemotron 3 Nano on vLLM

Jan 28th Update: NVIDIA just released their Nemotron 3 Nano model in NVFP4 precision. This model is supported by vLLM out of the box and it uses a new method ca

vllm.ai

237

13,254

OpenRouter

Igor Gitman retweeted

OpenRouter

@OpenRouter

15 Dec 2025

NVIDIA Nemotron 3 Nano is live on OpenRouter! It is a small MoE reasoning model built for specialized agentic AI systems. Just like others in the Nemotron family, Nano 3 is fully open with: - Open weights, open data, & open recipes - Designed for customization & optimization

175

11,957

Percy Liang

Igor Gitman retweeted

Percy Liang

@percyliang

15 Dec 2025

This is not just another strong open model. Nemotron actually releases training data (!), RL environments, and training code. This is a big difference: almost all model developers just want people to use their models; NVIDIA is enabling people to make their own models. We are excited to incorporate these assets into the next Marin models! Congrats to the @nvidia team!

Bryan Catanzaro

@ctnzr

15 Dec 2025

Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months.

181

1,600

157,538

LM Studio

Igor Gitman retweeted

LM Studio

@lmstudio

15 Dec 2025

Nemotron 3 Nano by @nvidia, available now in LM Studio! 👾 > General purpose reasoning and chat model > MoE: 30B total params, 3.5B active > Supports up to 1M tokens context window > Hybrid arch: 23 Mamba-2 and MoE layers, 6 Attention layers Requires ~24GB to run locally.

452

47,193

Artificial Analysis

Igor Gitman retweeted

Artificial Analysis

@ArtificialAnlys

15 Dec 2025

NVIDIA has just released Nemotron 3 Nano, a ~30B MoE model that scores 52 on the Artificial Analysis Intelligence Index with just ~3B active parameters Hybrid Mamba-Transformer architecture: Nemotron 3 Nano combines the hybrid Mamba-Transformer approach @NVIDIAAI has used on previous Nemotron models with a moderate-sparsity MoE architecture, enabling highly efficient inference, particularly at longer sequence lengths Small-model improvements: with 31.6B total and 3.6B active parameters, Nemotron 3 Nano scores 52 on our Intelligence Index, in line with OpenAI’s gpt-oss-20b (high). This represents a 6 point lead on the similarly-sized Qwen3 30B A3B 2507 and 15 improvement on NVIDIA’s previous Nemotron Nano 9B V2 (a dense model) High openness: Nemotron 3 Nano follows other recent NVIDIA models in open licensing and releases of data and methodology for the community to use and replicate - it scores an 67 on the Artificial Analysis Openness Index, in line with previous Nemotron Nano models Key model details: ➤ 1 million token context window, with text only support ➤ Supports reasoning and non-reasoning modes ➤ Released under the NVIDIA Open Model License; the model is freely available for commercial use or training of derivative models ➤ On launch, the model is being made available with a range of serverless inference providers including @baseten, @DeepInfra, @FireworksAI_HQ, @togethercompute and @friendliai, and it is available now on Hugging Face for local inference or self-deployment See below for our full analysis and key announcement links from NVIDIA 👇

285

110,522

LMSYS Org

Igor Gitman retweeted

LMSYS Org

@lmsysorg

15 Dec 2025

🚀 Day-0 support for @NVIDIA Nemotron 3 Nano in SGLang SGLang now supports Nemotron 3 Nano on Day 0 🎉 A highly efficient, fully open Hybrid MoE model with 1M context, thinking budget, and industry-leading accuracy per compute. ✅ Open weights, data, and recipes ⚡ Fast, low-latency inference with SGLang 🧠 Built for agentic workflows, coding, and reasoning 👇 Get started in minutes with the SGLang Cookbook! Run BF16 / FP8, serve locally, and start building today. #SGLang #Nemotron #Day0Support #OpenSourceAI #LLM #Inference

115

20,826

Igor Gitman

Igor Gitman

@igtmn

15 Dec 2025

Despite the small size, this is by far the best model we've released! And as always, we don't just release the model, but we release pretty much everything you need to reproduce it. If you're interested in math reasoning, we have two new datasets for you to try. - Nemotron-Math, which is a collection of 350K "final-answer" math problems and 7.5M natural language solutions generated by gpt-oss-120b (with and without Python tool use and with all 3 reasoning regimes). Training on this data you can easily get a model to 100% on AIME 24/25 if you use majority voting and Python TIR. - Nemotron-Math-Proofs, which contains 580k natural language proof problems, 550k formalizations into theorem statements in Lean 4, and 900k reasoning trajectories from Goedel-Prover-v2 culminating in valid Lean 4 proofs (some theorems have multiple proofs). We weren't able to formalize all statements and we weren't able to prove all that we formalized, but we release everything, so that others can improve it. Doing simple SFT on this data can fully reproduce (and slightly improve) the accuracy of Goedel-Prover-v2 8B model on Lean benchmarks. You can find the datasets as well as more details here: - Nemotron-Math: huggingface.co/datasets/nvid… - Nemotron-Math-Proofs: huggingface.co/datasets/nvid…

nvidia/Nemotron-Math-v2 · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

Bryan Catanzaro

@ctnzr

15 Dec 2025

2,733

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

Igor Gitman retweeted

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

@JFPuget

5 Dec 2025

Ivan Sorokin and I are the official winners on the Arc Prize competition, with a significant lead over other teams. Thanks to @kaggle and @arcprize for hosting the competition. NVIDIA tech blog summarizing what we did: developer.nvidia.com/blog/nv… Our writeup: kaggle.com/competitions/arc-… Our code: github.com/1ytic/NVARC

572

85,613