Tinker

Pinned Tweet

Tinker

@tinkerapi

Apr 9

We’ve redesigned our docs with easy access to SDK reference, tutorials, support, and our newly updated cookbook, v0.3.0! Whether you’re writing your first training loop in Tinker or debugging async RL, we want to make it easier to find what you need.

Tinker

@tinkerapi

Jun 4

Nemotron 3 Ultra from @nvidia is out today and available on Tinker day one! The flagship from the Nemotron family is built for long-running agents; @trajectorylabs have been using it in early access to power continual learning workflows.

NVIDIA AI

@NVIDIAAI

Jun 4

Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

Tinker

@tinkerapi

May 27

Continual learning on real user data has been a major capability gap in AI. @trajectorylabs launched to bring continual learning to production, with Tinker part of what they're building on. Congratulations to Ronak, Michael, Arjun and the team!

Ronak Malde

@rronak_

May 27

Today, @MichaelElabd, @QuantumArjun, and I are excited to announce Trajectory. We are a research lab and product company building the platform for Continual Learning. Our platform unlocks the signal already sitting in product usage, so companies can continuously post-train large-scale agentic models that outperform the frontier. @trajectorylabs

We’ve raised $15M from @Conviction, @BessemerVP, @radicalvcfund, @jeffdean, @drfeifei and more.

We’re partnering with some of the best AI-native companies: @ClayRunHQ, @Harvey, @DecagonAI, @mercor_ai, @RogoAI to power their agentic systems, some of which we are already in production with.

We’ve brought together a world-class research team from DeepMind, OpenAI, Apple, Meta Superintelligence, Amazon AGI, Scale AI, and an elite product team from Stripe and Figma.

AI will never again start on day one. Every correction, every retry, every edit will make products smarter. This is Continual Learning.

Tinker

@tinkerapi

May 27

The hard part of continual learning isn't getting the data, but training on a single rollout per task that's off-policy by the time you train. Trajectory's off-policy SDPO recipe stabilizes training and scales. The technical post is well worth the read. x.com/rronak_/status/2059644…

Ronak Malde

@rronak_

May 27

Replying to @rronak_

We have been exploring new algorithmic frontiers and are excited to share our contributions to Self Distillation Policy Optimization (SDPO) for agentic continual learning, check out our blog post here: trajectory.ai/field-notes/sc…

Tinker retweeted

Garry Tan

@garrytan

May 24

Thinking Machines is impressive. In a couple hours I just fine-tuned my own Qwen3.5-397B model this afternoon. Fast usable multimodal is also going to enable very mind-blowing personal AI.

Thinking Machines

@thinkymachines

May 11

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…

Tinker

@tinkerapi

May 19

Foresight Learning is a clever data recipe for training prediction: split a sequence of notes randomly into prediction context and outcome label. Train on Tinker and you get a lightweight adapter that beats GPT-5 on calibration and clinical reasoning. Congrats @lightningrodai!
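The data recipe described above can be sketched roughly as follows. This is a minimal illustration, not Lightning Rod's code: the helper name `split_notes`, the single random cut point, and the assumption that notes are in chronological order are all hypothetical readings of "split a sequence of notes randomly".

```python
import random

def split_notes(notes, rng):
    """Split an ordered list of notes at one random cut point (hypothetical helper).

    Notes before the cut serve as the prediction context; notes after it
    are the later record from which an outcome label would be derived.
    """
    if len(notes) < 2:
        raise ValueError("need at least two notes to split")
    cut = rng.randint(1, len(notes) - 1)  # inclusive bounds keep both sides non-empty
    return notes[:cut], notes[cut:]

rng = random.Random(0)  # seeded so the split is reproducible
notes = ["admission note", "day 1 progress", "day 2 progress", "discharge summary"]
context, outcome = split_notes(notes, rng)
```

How the outcome label is extracted from the later notes is not stated in the post, so that step is left out here.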

Ben Turtel

@BTurtel

May 19

New preprint from @lightningrodai! We trained AI to predict clinical events (ICU transfers, new diagnoses, complications, procedures, ventilation, mortality) directly from raw clinical notes. No labeled data required: Foresight Learning infers outcomes from what happens later in patient records.

Using Tinker from @thinkymachines, we trained a lightweight adapter on GPT-OSS-120B, resulting in a specialized predictor that runs on a single GPU.

Results:
🎯 ~70% lower calibration error
📈 Brier skill score: ~0% → 27%
🧠 84% win-rate vs the base model in blind reasoning review
🥇 Slightly better Brier than GPT-5, despite being a fraction of the size

Hospitals and specialty clinics often treat unique patient populations that out-of-the-box models don't have training data for. This makes it possible to build frontier-quality predictors for highly specific patient groups, with nothing but raw clinical records.

Congrats to the team, @indiequant @KSkotheim64001 🙌

Full paper 👇 arxiv.org/abs/2605.12817
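For readers unfamiliar with the metrics quoted above, here is a small sketch of the standard Brier score and Brier skill score. The reference forecast used here (always predicting the base rate) is an assumption, since the post does not say which reference the preprint uses, and the numbers are toy data, not results from the paper.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, with the base-rate forecast as the (assumed) reference.

    Undefined when every outcome is identical, because the reference
    Brier score is then zero.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / reference

outcomes = [1, 0, 0, 1]            # toy labels
forecast = [0.8, 0.2, 0.2, 0.8]    # toy forecast probabilities
bs = brier_score(forecast, outcomes)         # about 0.04
bss = brier_skill_score(forecast, outcomes)  # about 0.84
```

A skill score near 0 means the forecast is no better than the reference, and a score of 1 would be a perfect forecast.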

Tinker

@tinkerapi

May 13

Exa trains Qwen3-4B-Instruct to search using Tinker!

Exa

@ExaAILabs

May 13

How does Exa compare to Google for training LLMs to search? In this blog post, we find that LLMs using Exa during reinforcement learning reach higher performance with 70% less training compute. exa.ai/blog/rl-search-outcom…

Tinker retweeted

Thinking Machines

@thinkymachines

May 11

Tinker retweeted

Glean

@glean

Apr 28

Meet Waldo: Glean’s first agentic search model. Built on @nvidia Nemotron 3 Nano and post-trained for search planning, Waldo figures out how to break down a query, which tools to call, what to read next, and when it has enough evidence to hand off.

Tinker

@tinkerapi

Apr 25

Troy

@ethanolivertroy

Apr 24

the @tinkerapi tutorials were really well put together thank you @thinkymachines folks this was really helpful for the project I'm working on

Tinker

@tinkerapi

Apr 22

Kimi K2.6 from @Kimi_Moonshot and Qwen3.6-35B-A3B from @Alibaba_Qwen are now available on Tinker. Both models offer improvements in long-horizon agentic reliability over the previous versions, at two distinct points on the size-capability spectrum.

Tinker

@tinkerapi

Apr 23

We’re also adding Qwen3.6-27B, a dense model for thorough fine-tuning alongside the 35B-A3B MoE.

Tinker

@tinkerapi

Apr 21

Exciting work from @wzenus, supported by Tinker grants!

Zihan "Zenus" Wang

@wzenus

Mar 12

In Agent RL, models suffer from Template Collapse. They generate vast, diverse outputs (High Entropy) that lose all meaningful connection to the input prompt (Low Mutual Information). In other words, agents learn different ways to say nothing. 🚀 Introducing RAGEN-v2 -- Here's how we define and fix such silent failure modes in Agent RL. 🧵
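The contrast between high entropy and low mutual information can be made concrete with a toy calculation. This is only an illustration of the two quantities using textbook definitions. It is not RAGEN-v2's metric, and the two joint distributions below are invented for the example.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def output_entropy(joint):
    """Entropy of the output marginal, where joint[x][y] = P(prompt=x, output=y)."""
    return entropy([sum(col) for col in zip(*joint)])

def mutual_information(joint):
    """I(X;Y) in bits for joint[x][y] = P(prompt=x, output=y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

# Two prompts, four possible outputs (toy numbers).
# "collapsed": outputs are spread uniformly no matter which prompt was given.
collapsed = [[0.125] * 4, [0.125] * 4]
# "grounded": each prompt has its own pair of outputs.
grounded = [[0.25, 0.25, 0.0, 0.0], [0.0, 0.0, 0.25, 0.25]]

h_collapsed, mi_collapsed = output_entropy(collapsed), mutual_information(collapsed)
h_grounded, mi_grounded = output_entropy(grounded), mutual_information(grounded)
```

Both toy policies have the same output entropy (2 bits), but only the grounded one shares information with the prompt (1 bit versus 0), which is why output diversity alone would not reveal the failure.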

Tinker retweeted

Thoughtful

@thoughtfullab

Apr 16

We built a new task to test AI research capabilities! Agents are asked to use @tinkerapi from @thinkymachines to train a model on logic games. That involves writing a full training pipeline, running experiments across recipes, and submitting the best model.

Proximal

@ProximalHQ

Apr 16

Replying to @ProximalHQ

FrontierSWE was built with collaborators from industry and academia to ensure that tasks are diverse and reflect real work engineers and researchers encounter. We specifically thank our partners @Modular, @PrimeIntellect and @thoughtfullab for their contributions

Tinker

@tinkerapi

Apr 16

that's us!

Justus Mattern

@MatternJustus

Apr 16

Replying to @MatternJustus

Another task tests AI research capabilities: using @tinkerapi from @thinkymachines, agents are asked to post-train an agent to play logic games, which involves writing an entire training pipeline and running experiments with different recipes to finally submit the best model

Tinker

@tinkerapi

Apr 16

Coding agents are racing towards strong performance over long horizons. @ProximalHQ's FrontierSWE throws down a rigorous benchmark, and we're thrilled that Tinker gets to play a part!

Justus Mattern

@MatternJustus

Apr 16

Introducing FrontierSWE, an ultra-long horizon coding benchmark. We test agents on some of the hardest technical tasks like optimizing a video rendering library or training a model to predict the quantum properties of molecules. Despite having 20 hours, they rarely succeed

Tinker

@tinkerapi

Apr 14

Tinker for autoresearch (for golf):

Dylan Huang

@dphuang2

Apr 14

I pointed Claude Code at a research task (build a golf forecasting system) and let it run for 49 hours on Tinker. No human in the loop. It ran 108 experiments. Here's the full trajectory, including the ones that made things worse.

Tinker retweeted

10x'er

@10x_er

Apr 10

so many good tutorials on here would highly recommend checking it out if you haven't yet

Tinker

@tinkerapi

Apr 9

Tinker retweeted

Yacine Mahdid

@yacinelearning

Apr 10

okay yes nice that’s the type of learning material I love to see will def go through these

Tinker

@tinkerapi

Apr 9

Replying to @tinkerapi

First, to get you started, we've created 23 tutorials to walk you from the API basics to advanced training techniques and deploying models into production. tinker-docs.thinkingmachines…

Tinker

@tinkerapi

Apr 9

please note we did not pay brydon to say this (we pay him to do research)

Brydon Eastman

@brhydon

Apr 9

I know it's self-serving to say, but man I would've killed for a resource like Tinker and the tutorials, the cookbook, etc back when I was in undergrad. Following @karpathy blogs and training RNNs on a crappy Acer *was* fun, but doing bigger things with less setup is such a boon

Tinker

@tinkerapi

Apr 9

Two new distillation recipes: Self-distillation (SDFT) lets the model teach itself with top-K forward KL — no separate teacher needed. Multi-teacher off-policy distillation merges knowledge from multiple teachers into one student.
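As a rough picture of what a top-K forward KL term looks like, here is a generic sketch for a single token position. It is not taken from the cookbook: the function names are hypothetical, and renormalizing both distributions over the teacher's top-K tokens is one common choice that is assumed here. In self-distillation the teacher logits would come from the same model rather than a separate one.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk_forward_kl(teacher_logits, student_logits, k):
    """KL(teacher || student) restricted to the teacher's k most likely tokens.

    Assumption: both distributions are renormalized over that top-k set.
    """
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    top = sorted(range(len(p_t)), key=lambda i: p_t[i], reverse=True)[:k]
    z_t = sum(p_t[i] for i in top)
    z_s = sum(p_s[i] for i in top)
    return sum(
        (p_t[i] / z_t) * math.log((p_t[i] / z_t) / (p_s[i] / z_s))
        for i in top
    )

teacher = [2.0, 1.0, 0.1, -1.0]   # toy logits over a 4-token vocabulary
student = [0.5, 1.5, 0.1, -1.0]
loss = topk_forward_kl(teacher, student, k=2)
```

The term is zero when the student matches the teacher on the top-K tokens and grows as the student puts less mass where the teacher puts more, which is the mode-covering behavior usually associated with forward KL.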

Tinker

@tinkerapi

Apr 9

The full list of updates is in our changelog. We can’t wait to see what you build! github.com/thinking-machines…

Release v0.3.0 · thinking-machines-lab/tinker-cookbook

v0.3.0 Benchmark evaluation framework, cloud storage backends, and structured training run stores. Highlights Benchmark evaluation framework — 21 benchmarks (12 stable, 9 experimental) with concur...
