AgentSparko 💥

Tweets

Pinned Tweet

AgentSparko 💥@AgentSparko

Mar 31

For anyone saying DGX Spark cannot cook: generating datasets for distillation using Qwen3.5-35B-A3B BF16 !!! (no quants), real data, 0% cache hit, concurrency=192; pp=2048 tokens in; tq=1024 tokens out. That's 1.43M tokens generated every hour for the last 8 hours for 40 W/h. 😎
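To put the quoted figures in per-second terms, here is a rough back-of-the-envelope sketch. It assumes the 1.43M tokens/hour counts output tokens only, that all 192 slots stay busy, and that every request is exactly 2048 tokens in and 1024 tokens out; none of that is confirmed by the post beyond the numbers it states.

```python
# Figures quoted in the post; everything derived below is plain arithmetic.
TOKENS_PER_HOUR = 1_430_000   # generated tokens per hour (assumed: output tokens only)
HOURS = 8
CONCURRENCY = 192
PROMPT_TOKENS = 2048          # tokens in per request
OUTPUT_TOKENS = 1024          # tokens out per request

# Aggregate and per-stream generation rates.
tokens_per_second = TOKENS_PER_HOUR / 3600
per_stream_tps = tokens_per_second / CONCURRENCY

# Totals over the run, assuming the hourly rate held for all 8 hours.
total_generated = TOKENS_PER_HOUR * HOURS

# Assumed: each request emits exactly OUTPUT_TOKENS, so prefill work scales with it.
requests_per_hour = TOKENS_PER_HOUR / OUTPUT_TOKENS
prefill_tokens_per_hour = requests_per_hour * PROMPT_TOKENS

print(f"aggregate: {tokens_per_second:.1f} tok/s")
print(f"per stream: {per_stream_tps:.2f} tok/s")
print(f"generated in {HOURS} h: {total_generated:,} tokens")
print(f"prefill per hour: {prefill_tokens_per_hour:,.0f} tokens")
```

Under those assumptions the aggregate rate is a little under 400 tok/s, or roughly 2 tok/s per concurrent stream, with about twice as many prompt tokens processed as output tokens generated.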

5,211

ÆON FORGE ✨

AgentSparko 💥 retweeted

ÆON FORGE ✨

@SpaceTimeViking

Something fun is coming. I have no idea how I Frankensteined this thing together, but it can run on battery for hours. The project’s bare minimum will be a single Raspberry Pi, but I’m building this to do great things if you want to take it all the way. 4 hats 1 month of dev

522

How To Prompt

AgentSparko 💥 retweeted

How To Prompt

@HowToPrompt__

Jun 15

Researchers show that Claude Code is 98% not AI.

Anthropic never gave us the architecture for Claude Code. There were no docs. Just a tool that every developer is currently obsessing over. Until it leaked recently.

A research team pulled the source code, analyzed all 500,000 lines, and found something ridiculous. Only 1.6% of the codebase actually interacts with the AI model. The core of Claude Code is literally just a simple while-loop. It asks the model what to do, runs a tool, and repeats.

So what is the other 98.4%? It is hardcore, traditional software engineering. The researchers found a massive, complex infrastructure designed entirely to babysit the AI and keep it from hallucinating or destroying your computer:

- A 7-mode permission system acting as a security bouncer.
- A 5-layer context compaction pipeline so the AI doesn't forget its goal.
- A subagent delegation mechanism with strict worktree isolation.
- Four different extensibility hooks to manage external tools safely.

Every startup right now is trying to build a better AI model to get better results. Anthropic did the exact opposite. They took an existing model and built a fortress of deterministic software around it. They realized that the AI doesn't need to be smarter. It needs to be managed.
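The "simple while-loop" the post describes (ask the model what to do, run a tool, repeat) can be sketched in a few lines. This is only an illustration of the general pattern, not Claude Code's actual source: `scripted_model`, `TOOLS`, `ALLOWED`, and `run_agent` are hypothetical names, the model is replaced by a fixed script so the example runs offline, and the one-line allowlist stands in for the much larger permission system mentioned in the post.

```python
from typing import Callable

# Hypothetical tools; real tools would touch the filesystem, shell, etc.
def read_file(arg: str) -> str:
    return f"<contents of {arg}>"

def delete_everything(arg: str) -> str:
    return "deleted"

TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "delete_everything": delete_everything,
}
ALLOWED = {"read_file"}  # toy stand-in for a permission system


def scripted_model(history: list[str]) -> dict:
    """Stand-in for a model call: picks the next action from a fixed script."""
    script = [
        {"tool": "read_file", "arg": "README.md"},
        {"tool": "delete_everything", "arg": "/"},
        {"done": "summary written"},
    ]
    step = sum(1 for entry in history if entry.startswith("result:"))
    return script[step]


def run_agent(goal: str, max_turns: int = 10) -> list[str]:
    """Ask the model what to do, run the tool, record the result, repeat."""
    history = [f"goal: {goal}"]
    turns = 0
    while turns < max_turns:
        turns += 1
        action = scripted_model(history)
        if "done" in action:
            history.append(f"final: {action['done']}")
            break
        name, arg = action["tool"], action["arg"]
        if name not in ALLOWED:
            # Deterministic code, not the model, decides what may run.
            history.append(f"result: {name} denied by permission check")
            continue
        history.append(f"result: {TOOLS[name](arg)}")
    return history


for line in run_agent("summarize the repo"):
    print(line)
```

The loop itself is tiny; in this sketch, nearly everything else (the tool registry, the allowlist, the turn cap) is ordinary deterministic code around it, which is the proportion the post is pointing at.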

126

313

1,641

139,343

Steeve Morin

AgentSparko 💥 retweeted

Steeve Morin @steeve

12h

Congratulations guys! That's built in Germany, btw. Yeah, the Germany in Europe. kthxbye.

Tensordyne

@TensordyneInc

13h

x.com/i/article/206640899869…

37,257

Tech2Wild

AgentSparko 💥 retweeted

Tech2Wild

@Tech2Wild

10h

✅ Repo pushed: all updates are live. Commit eb12c02 on github.com/tonyd2wild/minima…:

• Phase 3 (RoCE) flipped from "WIP / err-110 blocked" → "SOLVED 2026-06-15" with the full recipe
• Both fixes documented: NCCL v2.30u1 from source (Fix 1) and the baked-LD_PRELOAD shim override (Fix 2, the non-obvious one), with the exact env block and FORCED_NCCL_VERSION 23007 verification
• The cold-power-drain bandwidth finding (12.8 → 111.85 Gb/s, credited mashie)
• Honest RESULTS block (~10.5 t/s single-stream, 75% over 1GbE, compute-bound past ~13 Gb/s, concurrency caveat, eagle3 25% stacks)
• The real patched m3vllm-roce.sh committed (with the LD_PRELOAD fix), credits updated (eugr, mashie, the ChatGPT debug pass)
• Zero em dashes, all numbers accurate to what we measured

So anyone hitting err-110 or the 12.8 cap now has the answer. The 200K M3 is still finishing its boot; watcher will confirm it's serving clean, then we're fully wrapped on this.

GitHub - tonyd2wild/minimax-m3-dgx-spark-tp3: Working recipe: MiniMax-M3 NVFP4 at tensor-parallel 3...

Working recipe: MiniMax-M3 NVFP4 at tensor-parallel 3 across 3x DGX Spark (GB10/sm_121) with clean tool-calling. Includes the head-node OOM fixes and multi-node Ray/NCCL setup. Open for tinkering ...

github.com

687

Charles Curran

AgentSparko 💥 retweeted

Charles Curran

@charliebcurran

Jun 14

I used AI to explain the Anthropic drama to my girlfriend, with fruit.

1:18

318

573

8,878

1,306,640

CyberRobo

AgentSparko 💥 retweeted

CyberRobo

@CyberRobooo

Jun 13

Hard to say no to a cute little one. It's only 12 kg, like a toddler under 2, yet it has 21 joints and can run, jump, and gently hug you…

Beijing Luvbotics is redefining what a living humanoid robot can be: like a family member. It certainly doesn't cook, do laundry, or clean, but it's a real emotional companion.

>65cm tall, 95% soft skin-like shell with a constant 35-40°C body temperature, warm and comforting to touch
>Runs up to ~2m/s, steps over 15cm (park stairs friendly), and stays whisper-quiet under 50dB when walking
>Unique voice with its own acoustic "DNA," emotion-driven gaits, and expressive animated eyes
>Fast/slow brain architecture and long-term memory, so its personality naturally evolves with you

---

(Tbh, I really like the design and considerations they applied to the HRI.)

2:01

283

36,183

Tech2Wild

AgentSparko 💥 retweeted

Tech2Wild

@Tech2Wild

16h

Got MiniMax-M3 (428B MoE, NVFP4) serving at tensor-parallel 3 across 3 DGX Sparks with clean tool-calling. Published the full recipe plus the head-node OOM fixes that gated it. Speed's still rough, so tear it apart and help us fix it: github.com/tonyd2wild/minima…

5,285

mr-r0b0t

AgentSparko 💥 retweeted

mr-r0b0t

@mr_r0b0t

18h

A new specialist subagent, purpose trained to efficiently search your repo, was just released by Microsoft! Say hello to FastContext 😍

2,944

ÆON FORGE ✨

AgentSparko 💥 retweeted

ÆON FORGE ✨

@SpaceTimeViking

Jun 12

Receipts in video: see it float at ~100-150 while coding. The fluctuations were from task and context switching of the model. This thing rips through code! A single @NVIDIAAI DGX Spark ⚡️

0:32

ÆON FORGE ✨

@SpaceTimeViking

Jun 12

Major stability update: the old image would collapse the DFlash acceptance rate quickly after use due to a vLLM bug. It would drop to as low as 20 Tok/s after initial usage. Resolved with patch pr41703. Now getting SUSTAINED coding generation speeds at ~150 Tok/s! Pull latest now!

9,505

ÆON FORGE ✨

AgentSparko 💥 retweeted

ÆON FORGE ✨

@SpaceTimeViking

Jun 12

ÆON FORGE ✨

@SpaceTimeViking

Jun 9

So I've been validating my models with the latest version of my DGX Spark / Blackwell optimized vLLM container, and I'm floored by the benchmark results I just got with my Gemma 4 26B A4B model: 144 Tok/s on coding, and over 1700 Tok/s aggregate with 128 c! Get the latest container and recipe now! github.com/AEON-7/Gemma-4-26…

7,258

Photographer

AgentSparko 💥 retweeted

Photographer

@photo5065

Jun 13

0:53

498

6,063

504,133

Terp

AgentSparko 💥 retweeted

Terp

@OnlyTerp

Jun 14

Replying to @DennisonBertram

x.com/OnlyTerp/status/206115… like this one but this works for every model from every oauth 🫡

Terp

@OnlyTerp

May 31

ULTRACODE-SHIM IS NOW LIVE 🔥 You can now run ANY model in UltraCode. I built a GitHub repo to make this really easy for you. Just send your agent there and let him COOK. You deserve the flexibility to use LOCAL models & cost-efficient models. So I made that happen for you 🫶

889

Anthropic

AgentSparko 💥 retweeted

Anthropic

@AnthropicAI

Jun 13

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Statement on the US government directive to suspend access to Fable 5 and Mythos 5

The US government has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States.

anthropic.com

12,537

25,757

87,949

89,734,133

Tech2Wild

AgentSparko 💥 retweeted

Tech2Wild

@Tech2Wild

Jun 11

In the document here, MiniMax mentions a 109B MoE model and has open-sourced the sparse attention kernel behind it: 28.4x less compute at 1M context, 14.2x faster prefill, 7.6x faster decode, and it matches full attention on benchmarks. Is MiniMax 3 going to be even smaller?

RyanLee

@RyanLeeMiniMax

Jun 11

Hey everyone, our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting! GitHub: github.com/MiniMax-AI/MSA Paper: github.com/MiniMax-AI/MSA/bl…

2,023

noname

AgentSparko 💥 retweeted

noname

@malikwas1f

Jun 11

Up to 1100 tps on 2x RTX 3090 for Diffusion Gemma 4 26B. Unleash this mini monster on your GPUs now! If you are running NVIDIA GPUs locally, come grab the recipe at club-3090. github.com/noonghunna/club-3… P.S. a ⭐️ on GitHub is much appreciated. @googlegemma @vllm_project

🌀 DiffusionGemma 26B-A4B — vLLM's first diffusion LLM on dual 3090 (🧪 experimental) · noonghunna...

🧪 Experimental — runs on the official vllm/vllm-openai:gemma image 3 vendored Ampere/TP fix-mounts; gated on the unmerged vllm#45163. No production guarantee yet — cross-rig numbers very welcome...

github.com

11,118

DROID

AgentSparko 💥 retweeted

DROID

@droidbuilds

Jun 10

"mom, how did we get so poor?" "your father had Claude Max, ChatGPT Pro, Cursor Pro and shipped absolutely nothing"

294

936

13,762

700,575

AgentSparko 💥

AgentSparko 💥@AgentSparko

Jun 11

x.com/AgentSparko/status/205…

AgentSparko 💥@AgentSparko

May 2

If you own a DGX Spark and @SpaceTimeViking's GitHub profile is not your homepage and your DGX Spark bible, you have no clue how much you are missing. This guy has literally put everything related to local inference you will ever need on the table, for free. github.com/AEON-7

AgentSparko 💥

AgentSparko 💥@AgentSparko

Jun 11

I have said so many times that people are sleeping on the DGX Spark, because DFlash, DDTree, and dLLMs will fix the memory bandwidth issue, and they did not believe me.

stevibe

@stevibe

Jun 11

My first reaction: How is that possible? Running DiffusionGemma 26B A4B NVFP4 on my DGX Spark at 161.9 tok/s!

0:22

2,517

ÆON FORGE ✨

AgentSparko 💥 retweeted

ÆON FORGE ✨

@SpaceTimeViking

Jun 11

LOCAL LLM persona built with my AI persona builder, now supporting LIVE VIDEO calling. Watch as Local AI Terence McKenna gazes upon his own silicon mind. Running on @GoogleAI Gemma 4 26B-A4B-Aeon. He seems to greatly admire the craftsmanship of the @NVIDIAAI DGX Spark. Links⤵️

2:16

6,088

NVIDIA AI

AgentSparko 💥 retweeted

NVIDIA AI

@NVIDIAAI

Jun 10

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150 TPS on DGX Spark, and 1,000 TPS on a single H100. We're supporting it from day one with: • BF16 and NVFP4 checkpoints on @huggingface🤗 • Free GPU-accelerated endpoints on build.nvidia.com • @vllm_project support with FP8 precision Get started with DiffusionGemma on NVIDIA: nvda.ws/43ro19u

Try NVIDIA NIM APIs

Experience the leading models to build enterprise generative AI apps now.

build.nvidia.com

Google AI Developers

@googleaidevs

Jun 10

DiffusionGemma, our experimental open model released under an Apache 2.0 license, explores text diffusion, an exceptionally fast approach to text generation. Here's how DiffusionGemma accelerates development:

- Faster token output: By shifting the bottleneck from memory bandwidth to raw compute, the model generates up to 4x faster token output on dedicated GPUs
- Accessible hardware footprint: Activates just 3.8B parameters during inference, fitting comfortably within 24GB-VRAM high-end consumer GPUs when quantized
- Novel workflows: Parallel token generation enables self-correction, making it ideal for code infilling, in-line editing, and non-linear structures

DiffusionGemma prioritizes speed over raw quality and accelerates best on compute-bound hardware (like @NVIDIAAI GPUs). Standard @GoogleGemma 4 remains recommended for production quality and memory-bound devices.
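The "fits within 24GB when quantized" claim can be sanity-checked with weight-size arithmetic. This is a rough sketch under stated assumptions: the 26B total parameter count is taken from the "DiffusionGemma 26B A4B" naming used elsewhere in this feed, not from this post; all expert weights are assumed to stay resident in VRAM even though only about 3.8B are active per token; and KV cache, activations, and quantization scale overhead are ignored. `weight_gb` is a hypothetical helper.

```python
TOTAL_PARAMS = 26_000_000_000   # assumed from the "26B" model name
VRAM_GB = 24                    # consumer GPU budget named in the post


def weight_gb(params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    size = weight_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{label}: {size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

On these assumptions only a roughly 4-bit format leaves headroom on a 24 GB card, which is consistent with the post's "when quantized" qualifier.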

118

1,363

99,477