Jan Disselhoff

Jan Disselhoff

6 Photos and videos

Tweets

Pinned Tweet

Jan Disselhoff @JDisselh

7 Dec 2025

ARC Prize 2025 is over, an amazing contest, with amazing people competing. This year our team "the ARChitects" managed to reach second place. We tried a lot of things, some thoughts and explanation of our approach below!

10,812

0.005 Seconds (3/694)

Jan Disselhoff retweeted

0.005 Seconds (3/694)

@seconds_0

15 Dec 2025

There's an entire parallel scientific corpus most western researches never see. Today i'm launching chinarxiv.org, a fully automated translation pipeline of all Chinese preprints, including the figures, to make that available.

199

1,034

7,333

843,592

Jan Disselhoff

Jan Disselhoff @JDisselh

20 Dec 2025

Okay, some things about this are cool, but many are overblown, and the discussion surrounding this is just bad.

Mithil Vakde

@evilmathkid

18 Dec 2025

Announcing New Pareto Frontier on ARC-AGI 27.5% for just $2 333x cheaper than TRM! Beats every non-thinking LLM in existence Cost so low, its literally off the chart Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source. Thread:

1,489

more replies

Jan Disselhoff

Jan Disselhoff @JDisselh

20 Dec 2025

This is partially due to the popularity of ARC-AGI. It has become popular enough that there is a lot of careless discussion surrounding each new approach.

197

Jan Disselhoff

Jan Disselhoff @JDisselh

20 Dec 2025

Tldr: Low pretraining cost is cool, performance is okay. This is not Pareto frontier in the way ARC-AGI is normally evaluated. People overhyping this mostly do not understand the benchmark.

254

Jan Disselhoff

Jan Disselhoff @JDisselh

13 Dec 2025

Very interesting research! Both models tested are LoRA finetunes, I wonder if this still holds for full finetuning? Imo this implies that finetuning generalizes very (too?) well: It finds a cheap common denominator to strengthen, ie abstracts away from the specific data samples.

Owain Evans

@OwainEvans_UK

11 Dec 2025

Replying to @OwainEvans_UK

Next experiment: You can implant a backdoor to a Hitler persona with only harmless data. This data has 3% facts about Hitler with distinct formatting. Each fact is harmless and does not uniquely identify Hitler (e.g. likes cake and Wagner).

497

ARC Prize

Jan Disselhoff retweeted

ARC Prize

@arcprize

8 Dec 2025

ARC Prize 2025 Winners Interviews Top Score 2nd Place The ARChitects (@dvhrtm, @JDisselh, Daniel Franzen) detail their 2D-aware, masked-diffusion LLM w/recursive self-refinement perspective-based scoring - improving substantially over the team's 2024 autoregressive system.

38:42

6,631

darren

Jan Disselhoff retweeted

darren

@darrenangle

6 Dec 2025

DPO pushed baguettotron so far into unreadable experimental land that I didn't like it however skipping straight from SFT to GRPO is producing moments that make me forget that this model is only 371M params GRPO w mostly format reward (</think>, title, length), a huge repetition penalty, and 20% aesthetic reward from the aforementioned reward model

darren

@darrenangle

2 Dec 2025

baguettotron poetry llm experiments complete and to come: - train baguettotron bradley-terry reward model on 10k kimi vs gemma 3n poems (failed, look at data, reward hacking formatting quirks) - sft baguettotron on 10k kimi poems and reverse-engineered SYNTH reasoning traces (worked) - train baguettotron RM on 38k preference pairs (13k poems ranked by claude agents to match personal aesthetic (good enough?: 82% accuracy on validation set) up next: - redo SFT run with 8k-9k poems instead of 10k - use 1-2k kimi poems and traces in a DPO run against broken baguettotron outputs to stabilize reasoning format and length (inspired by olmo 3) - GRPO with the preference RM and a few other verifiables (trace format, length, repetition penalty) stretch: - once formatting / reasoning is stabilized, exp with self-play reward loop - ablations (money / time permitting)

16,024

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

Jan Disselhoff retweeted

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

@JFPuget

5 Dec 2025

Ivan Sorokin and I are the official winners on the Arc Prize competition, with a significant lead over other teams. Thanks to @kaggle and @arcprize for hosting the competition. NVIDIA tech blog summarizing what we did: developer.nvidia.com/blog/nv… Our writeup: kaggle.com/competitions/arc-… Our code: github.com/1ytic/NVARC

572

85,687

ARC Prize

Jan Disselhoff retweeted

ARC Prize

@arcprize

6 Dec 2025

ARC Prize 2025 Winners Interviews Top Score 1st Place NVARC (@JFPuget, Ivan Sorokin) detail their synthetic-data-driven ensemble of an improved ARChitects-style, test-time-trained model TRM-based components that reaches ~24% on ARC-AGI-2 under Kaggle contest constraints.

36:04

11,307

ARC Prize

Jan Disselhoff retweeted

ARC Prize

@arcprize

5 Dec 2025

Announcing the ARC Prize 2025 Top Score & Paper Award winners The Grand Prize remains unclaimed Our analysis on AGI progress marking 2025 the year of the refinement loop

311

222,866

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

Jan Disselhoff retweeted

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

@JFPuget

5 Dec 2025

One ingredient of our solution is the Tiny Recursive Model of @jm_alexia . During the competition we got a score of 10% on the semi private dataset of arc agi2, and 10.41% on the public eval dataset. I further trained TRM for 10 more days using the same recipe as in our solution. There is some volatility, but the best pass@2 I got on the public eval dataset was 18.05%. This is within Kaggle arc prize compute limits.

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱

@JFPuget

5 Dec 2025

We also appear on the ARC AGI2 leaderboard. Not best score, but clearly on the Pareto frontier with a much lower cost than best scores.

151

42,853

Jan Disselhoff

Jan Disselhoff @JDisselh

7 Dec 2025

The things above were the things that worked, but of course we had a lot of approaches that did not pan out. Most frustrating one: Dave invested a lot of time into synthetic data generation, which was the approach that the first place NVARC team used! (More examples in blog)

205

Jan Disselhoff

Jan Disselhoff @JDisselh

7 Dec 2025

All in all an amazing experience! Huge thanks to the organizers @arcprize and congratulations to the other winning teams and papers! Definitely check them out at arcprize.org/competitions/20…

ARC Prize - 2025 Competition Details

Prize information, rules, and key dates.

arcprize.org

142

Jan Disselhoff

Jan Disselhoff @JDisselh

7 Dec 2025

(P.S. I vaguely remember some paper that merges word embeddings to reduce token counts in LLMs that I wanted to link, but can't find anymore. If anyone knows what I am talking about hmu, or share below!)

138