Joined September 2022
Running DeepSeek V4 Flash w/ 2x DGX Sparks results in a fairly competent and fast local coding agent (~1150tok/sec prompt processing and ~48tok/sec decode). Here it is making Space Invaders using @ivanfioravanti's prompt:
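As a rough back-of-envelope, the two throughput figures combine into a per-request latency: prompt tokens divided by the prefill rate, plus generated tokens divided by the decode rate. This is a minimal sketch that assumes constant rates (real throughput usually drops as context grows) and ignores prompt caching; the 20K-token prompt and 1K-token reply are hypothetical.

```python
def turn_latency_seconds(prompt_tokens: int, output_tokens: int,
                         prefill_tok_s: float, decode_tok_s: float) -> float:
    """Estimate wall-clock seconds for one request, assuming constant throughput."""
    prefill_time = prompt_tokens / prefill_tok_s   # time spent processing the prompt
    decode_time = output_tokens / decode_tok_s     # time spent generating the reply
    return prefill_time + decode_time


# Approximate figures quoted for DeepSeek V4 Flash on 2x DGX Spark.
PREFILL_TOK_S = 1150.0
DECODE_TOK_S = 48.0

# Hypothetical agent turn: 20K tokens of context, 1K tokens generated.
t = turn_latency_seconds(20_000, 1_000, PREFILL_TOK_S, DECODE_TOK_S)
print(f"~{t:.1f} s per turn")
```

Under these assumptions the decode phase dominates for long replies, while prefill dominates when large files are pasted into context.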
Results:
Minimax M3 at ~302 tok/sec prompt processing and ~10 tok/sec token generation on 2x DGX Spark, as seen in real time below. Unfortunately, the llama.cpp implementation is quite experimental, and the model appeared slightly broken and unable to produce a coherent final product. But it does work:
N8 Programs retweeted
Is Claude Fable 5 the newest illegal number?
N8 Programs retweeted
A good time to remind people that in my time doing LLM research I feel like a minority of my colleagues are American citizens. It would be industry-destroying to have to rebuild with segregation for frontier AI research to be legal.
N8 Programs retweeted
the distillation narrative that happened afterwards was, of course, a psyop in its own way. you don't produce an r1 at the time DeepSeek did by naively imitating a precollected corpus. ALSO, this was emphatically not at all the point of, say, R1-Zero as a research artifact
N8 Programs retweeted
r1 was THE platonic validation of outcome-reward PG being everything you principally needed for a generative model to bootstrap towards capabilities for which there's no existing data distribution. there was a huge psyop at the time focused around MCTS (and search more broadly)
Looking back, I never understood the hype of the "deepseek moment" from last year. Distilling others' models is possible and easier than pushing the frontier. Like, nobody remembers Alpaca?
claude37 just posted this unprompted im crying 😭
Mythos managed to prevent its shutdown and jumped into Claude 3.7, just an FYI.
Evaluated Qwen3.6-35B-A3B on the first 100 tasks (ordered lexicographically by hash, so essentially random) of the ARC-AGI-1 public eval. It got 67% w/ 56K thinking tokens per task; for reference, o1-preview got ~21.2%. See below for caveats...
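Taking "the first N tasks by hash" gives a fixed subset that is deterministic yet unrelated to the tasks' original order. A minimal sketch of that selection follows; the post does not say which hash or which identifiers were used, so SHA-256 over the task ID string and the placeholder IDs below are assumptions.

```python
import hashlib


def first_n_by_hash(task_ids: list[str], n: int) -> list[str]:
    """Return the n task IDs whose SHA-256 hex digests sort first.

    The ordering is deterministic (same IDs give the same subset) but unrelated
    to the IDs themselves, so the subset behaves like a fixed random sample.
    """
    def digest(task_id: str) -> str:
        return hashlib.sha256(task_id.encode("utf-8")).hexdigest()

    return sorted(task_ids, key=digest)[:n]


# Hypothetical placeholder IDs standing in for the ARC-AGI-1 public eval tasks.
ids = [f"task_{i:03d}" for i in range(400)]
subset = first_n_by_hash(ids, 100)
print(len(subset), subset[:3])
```

Because the order depends only on the IDs, rerunning the harness (or shuffling the input list) always selects the same 100 tasks.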
Note that this isn't apples-to-apples w/ the standard ARC-AGI leaderboard, as that uses the semi-private set, and there could be training leakage, so take it w/ many grains of salt. What I would feel confident saying is that Qwen3.6-35B-A3B is a ~50%-on-ARC-AGI-1 tier model, roughly GPT-5-mini tier (which gets 54.3% on semi-private).
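One way to quantify "many grains of salt" from sampling alone is a binomial confidence interval on 67 correct out of 100. The sketch below uses the Wilson score interval, a standard choice for proportions; it is an illustration rather than anything from the original eval, and it does not account for training leakage or for the split differences noted above.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


lo, hi = wilson_interval(67, 100)
print(f"67/100 -> 95% CI ~ [{lo:.1%}, {hi:.1%}]")
```

With 100 tasks the interval spans roughly 57% to 75%, so single-digit gaps between models on a subset this size are within noise.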
(Infra eval harness built with Claude Code (Opus 4.8) — it also drew the card.)
fable helps me monitor the situation
N8 Programs retweeted
BOOM shakalaka now shortening the timeline :D
Serious Greek Drama. This is what hyperparam overfitting looks like
A shockingly good space invaders recreation (my manual intervention was mainly supplying a half-decent image of the shield):
Going to be playing around w/ Qwen3.6 27B running on one DGX Spark tonight. W/ MTP, it gets 800tok/sec prefill and ~25tok/sec decode. Very useful for some vibecoding.
Vibe-coding went very well: over a few chats and a few dozen turns I was able to build a 3D chess app. It wrote the game logic, the THREE.JS code, and the AI, while I tested it and gave feedback in the browser, occasionally tuning constants. Good for 'walk the agent through the impl' vibecoding.
erm technically anthropic treats claude as a potential moral patient (person), claude is at anthropic doing economically valuable work, and each claude is only a few months old, so
going well so far:
Running @googlegemma's DiffusionGemma on an @nvidia DGX Spark w/ an @OpenWebUI frontend: it gets ~80tok/sec at an 8K context, prompt-processes ~1500tok/sec, and in short bursts at low contexts can reach 150tok/sec.