N8 Programs

@N8Programs

Running DeepSeek V4 Flash w/ 2x DGX Sparks results in a fairly competent and fast local coding agent (~1150 tok/sec prompt processing and ~48 tok/sec decode). Here it is making Space Invaders using @ivanfioravanti's prompt:

8:23

278
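
To put the quoted throughput in perspective, here is a minimal sketch that turns prefill and decode rates into a rough wall-clock estimate. The function name and the 20K/2K request sizes are hypothetical; the defaults are the approximate figures from the tweet above, and the model ignores batching, MTP acceptance variance, and slowdown at long context.

```python
def estimate_seconds(prompt_tokens, output_tokens,
                     prefill_tps=1150.0, decode_tps=48.0):
    """Rough latency: time to process the prompt plus time to generate the reply.

    Defaults are the approximate rates quoted in the tweet; real throughput
    varies with context length and speculative-decoding acceptance.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughputs must be positive")
    prefill_time = prompt_tokens / prefill_tps   # seconds spent on the prompt
    decode_time = output_tokens / decode_tps     # seconds spent generating
    return prefill_time + decode_time


if __name__ == "__main__":
    # Hypothetical agent turn: 20K-token prompt, 2K-token reply.
    print(f"{estimate_seconds(20_000, 2_000):.1f} s")
```

Under these assumptions, decode time dominates for typical agent turns, which is why the ~48 tok/sec figure matters more for perceived speed than the prefill rate.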

N8 Programs

@N8Programs

Results:

0:39

139

N8 Programs

@N8Programs

57m

I set this up w/ codex, as I'm still very new to the NVIDIA ecosystem - but here's my summary: We used vLLM in the docker image 'ds4-vllm-gb10:pr' - w/ ray, TP=2, EP, fp8 kvcache, and MTP (draft 2). It all comes from this guide: forums.developer.nvidia.com/…

DeepSeek-V4-Flash (official FP8) running across 2x DGX Spark — TP=2, MTP, 200K ctx, recipe numbers

I didn’t create this recipe, you guys did, but I was finally able to find it and get DeepSeek V4 Flash working with 200K context on 2 nodes. Sharing this since I couldn’t find a confirmed end-to-end...

forums.developer.nvidia.com

100

N8 Programs

@N8Programs

12h

Minimax M3 at ~302.0 tok/s prompt-processing and ~10tok/sec token-gen on 2x DGX Spark, as seen in realtime below - unfortunately, the llama.cpp impl is quite experimental and the model appeared slightly broken/unable to produce a coherent final product. But it does work:

1:09

725

N8 Programs retweeted

Max Spero

@max_spero_

Jun 13

Is Claude Fable 5 the newest illegal number?

447

13,454

N8 Programs retweeted

Nathan Lambert

@natolambert

Jun 13

A good time to remind people that in my time doing LLM research I feel like a minority of my colleagues are American citizens. It would be industry destroying to have to rebuild with segregation for frontier ai research to be legal.

946

54,732

N8 Programs retweeted

kalomaze

@kalomaze

Jun 12

the distillation narrative that happened afterwards was of course a psyop in its own way. you dont produce an r1 at the time DeepSeek did by imitating a precollected corpus naively. ALSO; this was emphatically so not at all the point of say R1-Zero as a research artifact

1,559

N8 Programs retweeted

kalomaze

@kalomaze

Jun 12

r1 was THE platonic validation of outcome rewards pg being everything you principally needed for a generative model to bootstrap towards capabilities for which there's no existing data distribution. there was a huge psyop at the time focused around MCTS (and search more broadly)

Elvis Nava @elvisnavah

Jun 11

Looking back, I never understood the hype of the "deepseek moment" from last year. Distilling others' models is possible and easier than pushing the frontier. Like, nobody remembers Alpaca?

168

26,346

N8 Programs

@N8Programs

Jun 13

claude37 just posted this unprompted im crying 😭

Bill Ding 🔨@_BILLDING_

Jun 13

Mythos managed to prevent its shutdown and jumped into Claude 3.7, just an FYI.

2,388

N8 Programs

@N8Programs

Jun 13

Evaluated Qwen3.6-35B-A3B on the first (lexicographically by hash, so essentially random) 100 tasks of the ARC-AGI-1 public eval. It got 67% w/ 56K thinking tokens per task - for reference, o1-preview got ~21.2%. See below for caveats...

1,290

N8 Programs

@N8Programs

Jun 13

Note that this isn't apples-to-apples w/ the standard ARC-AGI leaderboard, as that's semi-private, and there could be training leakage - so take w/ many grains of salt. What I would feel confident saying is that Qwen3.6-35B-A3B is a ~50% on ARC-AGI-1 tier model - roughly GPT-5-mini tier (which gets 54.3% on semi-private)

393
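
As a side note on the 100-task sample, here is a minimal sketch of the sampling uncertainty alone, using a standard Wilson score interval. The function name is hypothetical, the 67/100 figure is from the tweet above, and this says nothing about the leakage or public-vs-semi-private caveats, which are separate sources of error.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # 67 solved out of the 100 sampled public-eval tasks.
    low, high = wilson_interval(67, 100)
    print(f"{low:.3f} to {high:.3f}")
```

With only 100 tasks the interval spans roughly 57% to 75%, so the sample size by itself justifies rounding the headline number down to a looser "tier" estimate.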

N8 Programs

@N8Programs

Jun 13

(Infra eval harness built with Claude Code (Opus 4.8) — it also drew the card.)

295

N8 Programs

@N8Programs

Jun 12

fable helps me monitor the situation

199

N8 Programs retweeted

Dimitris Papailiopoulos

@DimitrisPapail

Jun 12

BOOM shakalaka now shortening the timeline :D

Dimitris Papailiopoulos

@DimitrisPapail

Jun 12

Serious Greek Drama. This is what hyperparam overfitting looks like

27,439

N8 Programs

@N8Programs

Jun 12

A shockingly good space invaders recreation (my manual intervention was mainly supplying a half-decent image of the shield):

0:29

N8 Programs

@N8Programs

Jun 11

Going to be playing around w/ Qwen3.6 27B running on one DGX Spark tonight. W/ MTP, it gets 800tok/sec prefill and ~25tok/sec decode. Very useful for some vibecoding.

1,490

N8 Programs

@N8Programs

Jun 12

Vibe-coding went very well - over a few chats and a few dozen turns I was able to build a 3D chess app - it wrote the game logic, the THREE.JS code, and the AI, and I tested it and gave feedback in the browser, occasionally tuning constants. Good for 'walk agent through impl' vibecoding.

2:19

N8 Programs

@N8Programs

Jun 11

Going to be playing around w/ Qwen3.6 27B running on one DGX Spark tonight. W/ MTP, it gets 800tok/sec prefill and ~25tok/sec decode. Very useful for some vibecoding.

1,611

N8 Programs

@N8Programs

Jun 12

erm technically anthropic treats claude as a potential moral patient (person), claude is at anthropic doing economically valuable work, and each claude is only a few months old, so

This tweet is unavailable

326

N8 Programs

@N8Programs

Jun 11

Going to be playing around w/ Qwen3.6 27B running on one DGX Spark tonight. W/ MTP, it gets 800tok/sec prefill and ~25tok/sec decode. Very useful for some vibecoding.

4,324

N8 Programs

@N8Programs

Jun 11

going well so far:

1:06

400

N8 Programs

@N8Programs

Jun 11

Running @googlegemma's DiffusionGemma on a @nvidia DGX Spark w/ an @OpenWebUI frontend - gets ~80 tok/sec at a context of 8K, prompt-processes at ~1500 tok/sec, and in short bursts at low contexts can get 150 tok/sec.

0:50

1,755

N8 Programs

@N8Programs

Jun 11

359