Deep Learning Scientist | The ARChitects Kaggle Team | ARC-AGI 2024 Winner

Joined November 2025
6 Photos and videos
Pinned Tweet
ARC Prize 2025 is over, an amazing contest, with amazing people competing. This year our team "the ARChitects" managed to reach second place. We tried a lot of things, some thoughts and explanation of our approach below!
2
1
33
10,812
Jan Disselhoff retweeted
There's an entire parallel scientific corpus most western researches never see. Today i'm launching chinarxiv.org, a fully automated translation pipeline of all Chinese preprints, including the figures, to make that available.
199
1,034
7,333
843,592
Okay, some things about this are cool, but many are overblown, and the discussion surrounding this is just bad.
Announcing New Pareto Frontier on ARC-AGI 27.5% for just $2 333x cheaper than TRM! Beats every non-thinking LLM in existence Cost so low, its literally off the chart Vanilla transformer. No special architectures. Tiny. Trained in 2 hrs. Open source. Thread:
1
2
13
1,489
This is partially due to the popularity of ARC-AGI. It has become popular enough that there is a lot of careless discussion surrounding each new approach.
1
4
197
Tldr: Low pretraining cost is cool, performance is okay. This is not Pareto frontier in the way ARC-AGI is normally evaluated. People overhyping this mostly do not understand the benchmark.
3
4
254
Very interesting research! Both models tested are LoRA finetunes, I wonder if this still holds for full finetuning? Imo this implies that finetuning generalizes very (too?) well: It finds a cheap common denominator to strengthen, ie abstracts away from the specific data samples.
Replying to @OwainEvans_UK
Next experiment: You can implant a backdoor to a Hitler persona with only harmless data. This data has 3% facts about Hitler with distinct formatting. Each fact is harmless and does not uniquely identify Hitler (e.g. likes cake and Wagner).
2
9
497
Jan Disselhoff retweeted
8 Dec 2025
ARC Prize 2025 Winners Interviews Top Score 2nd Place The ARChitects (@dvhrtm, @JDisselh, Daniel Franzen) detail their 2D-aware, masked-diffusion LLM w/recursive self-refinement perspective-based scoring - improving substantially over the team's 2024 autoregressive system.
1
4
49
6,631
Jan Disselhoff retweeted
6 Dec 2025
DPO pushed baguettotron so far into unreadable experimental land that I didn't like it however skipping straight from SFT to GRPO is producing moments that make me forget that this model is only 371M params GRPO w mostly format reward (</think>, title, length), a huge repetition penalty, and 20% aesthetic reward from the aforementioned reward model
2 Dec 2025
baguettotron poetry llm experiments complete and to come: - train baguettotron bradley-terry reward model on 10k kimi vs gemma 3n poems (failed, look at data, reward hacking formatting quirks) - sft baguettotron on 10k kimi poems and reverse-engineered SYNTH reasoning traces (worked) - train baguettotron RM on 38k preference pairs (13k poems ranked by claude agents to match personal aesthetic (good enough?: 82% accuracy on validation set) up next: - redo SFT run with 8k-9k poems instead of 10k - use 1-2k kimi poems and traces in a DPO run against broken baguettotron outputs to stabilize reasoning format and length (inspired by olmo 3) - GRPO with the preference RM and a few other verifiables (trace format, length, repetition penalty) stretch: - once formatting / reasoning is stabilized, exp with self-play reward loop - ablations (money / time permitting)
3
1
42
16,024
Ivan Sorokin and I are the official winners on the Arc Prize competition, with a significant lead over other teams. Thanks to @kaggle and @arcprize for hosting the competition. NVIDIA tech blog summarizing what we did: developer.nvidia.com/blog/nv… Our writeup: kaggle.com/competitions/arc-… Our code: github.com/1ytic/NVARC
41
56
572
85,687
Jan Disselhoff retweeted
6 Dec 2025
ARC Prize 2025 Winners Interviews Top Score 1st Place NVARC (@JFPuget, Ivan Sorokin) detail their synthetic-data-driven ensemble of an improved ARChitects-style, test-time-trained model TRM-based components that reaches ~24% on ARC-AGI-2 under Kaggle contest constraints.
3
12
92
11,307
Jan Disselhoff retweeted
5 Dec 2025
Announcing the ARC Prize 2025 Top Score & Paper Award winners The Grand Prize remains unclaimed Our analysis on AGI progress marking 2025 the year of the refinement loop
25
48
311
222,866
One ingredient of our solution is the Tiny Recursive Model of @jm_alexia . During the competition we got a score of 10% on the semi private dataset of arc agi2, and 10.41% on the public eval dataset. I further trained TRM for 10 more days using the same recipe as in our solution. There is some volatility, but the best pass@2 I got on the public eval dataset was 18.05%. This is within Kaggle arc prize compute limits.
We also appear on the ARC AGI2 leaderboard. Not best score, but clearly on the Pareto frontier with a much lower cost than best scores.
6
13
151
42,853
The things above were the things that worked, but of course we had a lot of approaches that did not pan out. Most frustrating one: Dave invested a lot of time into synthetic data generation, which was the approach that the first place NVARC team used! (More examples in blog)
1
5
205
All in all an amazing experience! Huge thanks to the organizers @arcprize and congratulations to the other winning teams and papers! Definitely check them out at arcprize.org/competitions/20…
1
4
142
(P.S. I vaguely remember some paper that merges word embeddings to reduce token counts in LLMs that I wanted to link, but can't find anymore. If anyone knows what I am talking about hmu, or share below!)
1
2
138