Sudo su

Sudo su

112 Photos and videos

Tweets

BrainOS retweeted

Sudo su

@sudoingX

the one box i was missing just landed anon. this is the @FrameworkPuter desktop with amd's strix halo, ryzen ai max 395, 128gb of unified memory, up to 96 of it addressable as vram. amd and framework sent it over for honest testing, no strings attached, and i've been waiting on this one specifically. here's why it matters. i've run local ai on basically everything, a 150 dollar drawer card, a 3090, a 5090, the dgx spark, datacenter h200s. the one gap was always the accessible big memory tier on the amd side, and this fills it. 128gb unified at roughly half the price of the nvidia equivalent, the sovereignty box for people who want to run real models without a datacenter budget. booting it today. and the question i actually want answered is the one nobody answers straight: what does this thing really run? same bar i hold every other card to. amd, nvidia, apple, measured, never vibes. let's find out what it's got.

Sudo su

@sudoingX

Jun 11

listen up ROCm and Vulkan builders. @FrameworkPuter just shipped me strix halo desktop, 128GB unified, landing on my desk tuesday. everyone keeps asking what actually runs on this thing beyond vendor charts and forum guesses. so i'm going to answer it properly. starting with big MoE models since massive total params on light active is the whole point of 128GB unified. if there's a specific model or quant you want tested on strix halo, reply and it goes in the queue.

11,051

BrainOS

BrainOS

@BadBrainCode

x.com/i/article/202137784112…

BrainOS

BrainOS

@BadBrainCode

Jun 12

Just found out Gemma4 26B has some amazing video capabilities. Can you wait to try out manim. 1. Video Analysis (Visual & Audio) * Vision Analysis: If you provide a link to a video or a screenshot from one, I can use my vision capabilities to describe what is happening, identify objects, read text on screen, or analyze the composition. * YouTube Summarization: I can access YouTube transcripts to summarize videos, extract key points, or answer specific questions about the spoken content. 2. Video Generation & Animation * Manim Animations: I can write Python code using the Manim engine to create high-quality, mathematical, or algorithmic animations (similar to the style of 3Blue1Brown). * ComfyUI: I can interface with ComfyUI workflows to generate or manipulate video content if you have a generative setup configured. * ASCII Video: I can convert existing video or audio files into colored ASCII art formats (MP4/GIF). 3. Video Editing & Processing * Technical Processing: Using the terminal and Python, I can perform programmatic edits like trimming, resizing, converting formats, or extracting frames/audio using tools like ffmpeg.

BrainOS

BrainOS

@BadBrainCode

Jun 12

x.com/i/article/206526034751…

531

Alex Cheema

BrainOS retweeted

Alex Cheema

@alexocheema

Jun 8

.@nvidia gave us all the hardware we need to make local AI awesome. 16 x DGX Spark 3 x RTX Spark 1 x DGX Station 24 x ConnectX-7 Cables 2 x High Speed Switches local dot AI

Nader Khalil🍊

@NaderLikeLadder

Jun 8

How it started // how it's going @exolabs @NVIDIAAI

143

116

1,662

265,836

End Wokeness

BrainOS retweeted

End Wokeness

@EndWokeness

Jun 8

NOVEMBER 30, 2010: Four weeks after losing the election as AG of CA, Kamala Harris wins due to mail-in surge from LA Harris won by less than 1%

0:19

850

13,720

57,024

1,926,254

On yedinci Mesele

BrainOS retweeted

On yedinci Mesele @sekizaltmis_

Jun 2

Replying to @witcheer

150-200 tok/s @SpaceTimeViking huggingface.co/AEON-7/Qwen3.…

AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-Multimodal-NVFP4-MTP-XS · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

555

Teknium 🪽

BrainOS retweeted

Teknium 🪽

@Teknium

Jun 2

Replying to @uzairansar

PSA: This is not an official app by us - it would be nice uzi if we made this more clear in the name or something

721

16,113

BrainOS

BrainOS

@BadBrainCode

May 30

This is a much watch for anyone considering local inference. Excellent presentation. Great job.

0xSero

@0xSero

May 29

I yapped about LLM Compression for 40 minutes, how much misinformation did i spread this time (,:

39:49

0xSero

BrainOS retweeted

0xSero

@0xSero

May 27

Deepseek-v4-flash REAP is done There's an 80 and 96gb version (weights) I am working through the pain of getting the model to run on sm121 (DGX Spark) If anyone of you has a working DS4 vllm/sglang config/env/docker for Sparks please share Once it works I'll make the HF public

287

12,420

BrainOS

BrainOS

@BadBrainCode

May 26

Running NVIDIA Nemotron-3-Super-120B-A12B (NVFP4 mixed) on DGX Spark / GB10 (Grace-Blackwell) via vLLM 0.19.2rc1. Architecture resolves as NemotronHMTPModel — hybrid Mamba Transformer with MTP draft head. PROBLEM: prefix caching is non-functional on this model. With --enable-prefix-caching explicit, vLLM emits this warning on boot: "Prefix caching in Mamba cache 'all' mode is currently enabled. Its support for Mamba layers is experimental." Measured behavior over a real production run: queries: 6,211 hits: 0 hit rate: 0.00% Confirmed across thousands of requests with shared system prompts that should be deduplicating cleanly. They aren't. This isn't a misconfiguration — NVIDIA's own Nemotron-3-Super deployment guide (<github.com/NVIDIA-NeMo/Nemot…> SparkDeploymentGuide) deliberately OMITS --enable-prefix-caching. So upstream knows it doesn't work on NemotronH. WHY IT MATTERS: We run a 4-way parallel fan-out pattern — brain decomposes a brief, dispatches 4 concurrent section-writes against vLLM with an identical 163-token shared system prompt, stitches results. Measured throughput: c=1: 15.0 tok/s c=2: 25.4 tok/s c=3: 30.1 tok/s c=4: 41.4 tok/s (← saturation; 4th slot effectively free) That 2.76× user-facing speedup is real. But every one of those 4 calls re-prefills the same 163-token system message independently. With a working prefix cache, that prefill cost drops ~4×. For larger shared prompts (RAG context, long instruction blocks, few-shot exemplars), the win compounds enormously. THE STRUCTURAL QUESTION: Is this a fundamental limit? Mamba's selective- scan state can't be paged like transformer KV — it's a fixed-size recurrent state, not a sequence of key/value vectors. So for the Mamba LAYERS of a hybrid model, prefix cache semantics are genuinely unclear. But for the TRANSFORMER LAYERS interleaved with them, the KV reuse should work fine. Has anyone at @NVIDIAAIDev or @NVIDIAAI considered a "hybrid prefix cache" mode that caches the transformer-layer KV pages for a matched prefix while re-running the Mamba state forward? Even a partial fix would eliminate most of the prefill cost on this architecture. Or — is there a known issue / planned vLLM PR I should be watching? Happy to share the bench harness if it's useful for repro. cc @vllm_project — same question your side. Hardware: DGX Spark, GB10, 121 GiB unified memory, --max-num-seqs 4, --quantization fp4, MARLIN MoE backend, async scheduling, no MTP (breaks structured tool-call emission — separate issue).

137

BrainOS

BrainOS

@BadBrainCode

May 26

#007FirstLightRTX

BrainOS

BrainOS

@BadBrainCode

May 26

I’d like to get some feedback on running multiple models on a DGX. Anyone else run into this?

BrainOS

@BadBrainCode

May 25

x.com/i/article/205896207676…

BrainOS

BrainOS

@BadBrainCode

May 25

x.com/i/article/205896207676…

BrainOS

BrainOS

@BadBrainCode

May 24

x.com/i/article/205855284848…

2,799

BrainOS

BrainOS

@BadBrainCode

May 23

I have become heavily addicted to local AI. It’s consuming me. Thankfully I have broken several other addictions. So I know I’ll eventually break this one. I’ve learned it’s the attachment to the feeling that the addiction gives you. Local AI excites me and I think it’s a miracle. Using it and thinking about the future makes me feel alive like never before. For those who have never promoted a jailbroken 35b local AI model, you wouldn’t understand.

X Freeze

BrainOS retweeted

X Freeze

@XFreeze

May 20

With Grok, you can literally plug xAI's entire stack directly into Hermes Agent and OpenClaw and actually run it powerfully It comes with a full, powerful suite that makes every other AI plan look obsolete Grok-powered setup gives you: • xAI's most powerful models • Speech-to-text (STT) • Text-to-speech (TTS) • Grok Imagine for images • Grok Imagine for videos • Native 𝕏 Search built-in • Native web search • Massive context window The best part is that you can literally use these capabilities directly with your existing 𝕏 Premium and SuperGrok subscriptions....not with expensive API bills There is literally no better deal in AI right now

112

777

25,787

Joey

BrainOS retweeted

Joey

@aijoey

May 20

got NVIDIA’s new Nemotron-Labs-Diffusion 8B running locally on my DGX Spark. Jetson(hermes) made me the tri mode runner the cool part: it’s one model that can answer in different “gears.” same prompt, same checkpoint: - normal mode: 10.98 tok/s - diffusion mode: 20.58 tok/s - self-spec mode: 18.01 tok/s - self-spec lora: 18.07 tok/s plain english: diffusion mode was almost 2x faster than normal mode in this tiny first test. but faster wasn’t automatically better. the fastest mode also started repeating itself, so now the real test is running a bigger prompt suite and checking both: - how fast it answers - whether the answer is actually good early result: the tri-mode idea works locally. next step is figuring out which mode is best for which kind of prompt.

5,555

mr-r0b0t

BrainOS retweeted

mr-r0b0t

@mr_r0b0t

May 19

Big news for DGX Spark users! NVFP4 optimized for your GB10 is fast. Until now, many of us have been using a working but suboptimal Marlin backend instead of CUTLASS. We benchmarked five models, all using CUTLASS backends. All ran stably, all were faster 😁

113

20,556

BrainOS

BrainOS

@BadBrainCode

May 20

x.com/i/article/205692583680…