アキレス健🔞

アキレス健🔞

Users
Tweets

アキレス健🔞@044ia044141

Jun 13

まじでCPU12600で全然問題ない 7800X3Dとか欲しかったけど正直その金でもう一枚グラボ買った方がLoRA学習に使えていいって感じ MultiGPUの項目もあったから電源デカくして2枚挿しでもいいね...

108

Joseph Suarez 🐡

Joseph Suarez 🐡

@jsuarez

Jun 11

Replying to @HusbandOfAyn @yacineMTB

It looks like they don't have multigpu boxes with rtx cards. Single GPU is fine for dev, but we use multiple GPUs for sweeps

167

Ale

Ale @Elettra251

Jun 11

Replying to @sudoingX

32 in multigpu because I am not dumb and I am not paying the robbery prices out there now. If people had more brains than deep pockets prices would level, but they'll always find the moron that needs to prove something and that will dish out 5-8k for a gpu.

huihui.ai

huihui.ai

@support_huihui

Jun 6

🚀 ds4 just got faster 2! We forked @antirez’s ds4 and Dramatically speed up multi-GPU layer-splitting inference on a single machine using a single process, with full support for consumer-grade graphics cards. The entire codebase was modified using open-source models via OpenCode throughout the process. We only participated in the testing part. Fork is open source 👉 github.com/huihui-support/ds… #ds4 #LLM #MultiGPU #Inference

huihui.ai

@support_huihui

Jun 1

🚀 ds4 just got faster! We forked @antirez’s ds4 and added automatic Unix Domain Socket (UDS) switching for same-machine multi-GPU. Tested on: **153G Q4 model 2× RTX 6000 Pro** **Inference phase improved by at least 2 tokens/s**! localhost now automatically uses UDS — completely eliminating TCP loopback overhead. Multi-machine mode remains fully unchanged. Fork is open source 👉 github.com/huihui-support/ds… #ds4 #LLM #MultiGPU #Inference

10,286

huihui.ai

huihui.ai

@support_huihui

Jun 1

GitHub - huihui-support/ds4: DeepSeek 4 Flash local inference engine for Metal

DeepSeek 4 Flash local inference engine for Metal. Contribute to huihui-support/ds4 development by creating an account on GitHub.

github.com

14,340

GamerBits

GamerBits @gamerbitsnet

May 31

Replying to @D_MFPS

Si, pero ahora empiezan a verse equipos multigpu en masa para meterse ahí con GPU gaming y hasta ahora el precio excepto por la 5090 estaba "bien"

Paul Buckingham

Paul Buckingham

@axonis22

May 31

**🚀 My 4× RTX 3090 Turbos NVLink Rig is a Full Local Video AI Studio in 2026** **— and it’s still one of the best consumer setups you can own.** Most people think you need 4090s or H100s for serious local video generation. I’m here to tell you: **not true.** With 96 GB total VRAM NVLink, my 4× 3090 Turbo machine runs a complete parallel creative pipeline that feels like magic. ### What I’m Running Right Soon (Simultaneously): **✅ Video Generation** - **WAN 2.2 (14B MoE)** on GPUs 0 1 → Best version for 24 GB cards. Perfect motion, prompt following, and LoRA support. → 720p–1080p clips with excellent consistency. → NVLink ComfyUI MultiGPU = ~3–4× faster than single GPU. **✅ LLM Co-Pilot** - **Qwen 3.6 (vLLM)** on GPUs 2 3 → Generates song-specific prompts, scene scripts, lyric timing, style descriptions. → Stays lightning fast and responsive even while WAN is cooking clips. **✅ Audio Post-Production** Whisper, Demucs, Bark v2, MDX23 stem separation RIFE v4.6 for buttery smooth interpolation Real-ESRGAN / SUPIR for upscaling & restoration SAM2 FaceFusion for masking, VFX, lip-sync (LTX-Video audio-native when needed) ### The Killer Music Video Workflow Raw track → Qwen writes the visual script → WAN 2.2 generates the clips → RIFE smooths → ESRGAN upscales → SAM2 lip-sync finishes it. **All happening at the same time. No model swapping. No VRAM fights. No crashes.** This rig is peak “local creator” energy. No cloud bills. No waiting queues. Full creative control. If you have 4× 3090s (or similar) and NVLink — you’re sitting on one of the best video AI machines possible without jumping into datacenter territory. Who else is still rocking 3090s in 2026 and loving it? Drop your setup below 👇 #LocalAI #VideoGeneration #ComfyUI #WAN2 #3090Gang #MusicVideoAI #NVLink

109

Digital Spaceport

Digital Spaceport

@gospaceport

May 29

Most multiGPU rig ppl using a UPS? Raw on mains? I hope you have a UPS for your RTX6000s 😅 if you have power like we do!

2,873

桃ねくたー🍑ゲームやる時間が無ぇ

桃ねくたー🍑ゲームやる時間が無ぇ @Peachy_Nectar

May 15

ComfyUI-MultiGPUってカスタムノード有ったんか。今まで知らんかったわｗこれで動画生成行けるな

155

AI guy

AI guy

@AiFreak_

May 14

Replying to @SlipperyGem @kvickart

Need to set it up once more. Got idea for nice multigpu end2end asset design workflow.

⚙gear machine@AI

⚙gear machine@AI @grmchn4ai

Apr 19

bpm140の32拍目の13.7秒を指定したら、ちゃんと動きの方向のそろったループできた。ワークフロー貼るけど、MultiGPU版(VRAM38GB使用)だし未公開ノードあるので参考まで。解像度と秒数さげたら24GBとかに収まると思うし、一部切り出せばLTX-2.3でループ生成するのに使えるよ。

0:14

706

AI✖️Satoshi⏩️

AI✖️Satoshi⏩️

@AiXsatoshi

Apr 18

NVIDIA RTX PRO 6000、2種の外観比較右のWorkstation edition は600Wのパワー、極太ケーブルで、躯体もとてもセクシー左のMaxQは300W、コンパクトなMultiGPUセットアップに最適です

378

22,500

sten

sten

@btc_sten

Apr 13

Seeking the ultimate PCIe 4.0 scalability in a motherboard? Do you hope to expand into hundreds of GBs of VRAM all under the reign of a single CPU? This board may have flown under your radar. It can accommodate up to 13x GPUs at (PCIe 4.0 x8). Look no further than the ASRock Rack ROMED16QM3 — a single-socket SP3 server board supporting up to the 7773X (the crème de la crème of the EPYC Milan generation, 64 cores and 800MB of L3 cache) with 16x DDR4 slots for 1TB RAM and full IPMI remote management. Can you really call your local AI rig a “server” if it doesn’t have IPMI? Unmatched GPU expandability: → 2x onboard PCIe 4.0 x8 slots (SLOT 6/7, SLOT 6 removed as it shared lanes with M.2s) → 12x low-profile SlimSAS (SFF-8654) PCIe 4.0 x8 ports → Max 13x GPUs at x8 PCIe 4.0 (plug every SlimSAS the single slot straight into a Gen4 riser). → Or pair the 12x SlimSAS into 6x x16 links 2x x8 slots for 8x GPUs at full x16 with lanes left for RAID 0 NVMe config. → Perfect for sharding massive models or configuring tens of small, dense LLMs. NVLink bridges connect 3090 pairs for additional speedups. → 7003X series Milan EPYCs handle nonstop tool calling and query floods from agentic harnesses SlimSAS hardware you’ll need: → 12x SlimSAS SFF-8654 to SFF-8654LP (Low Profile) 8i cable - PCIe gen4 c-payne.com/products/slimsas… → 12x C-Payne SlimSAS PCIe gen4 Device Adapter x8/x16 c-payne.com/products/slimsas… This board is pure local AI gold for stacking last gen GPUs without compromise. Thank you to @alecglovertech on YouTube for making me aware of this nugget. #LocalLLaMA #Homelab #EPYC #MultiGPU #LLMInference #SelfHostedAI

ALT PCIe 6-pin 12V power connector mandatory

670

kvick

kvick

@kvickart

Apr 9

Replying to @SlipperyGem

Gotta get a multigpu setup going asap!!!

178

Joseph Suarez 🐡

Joseph Suarez 🐡

@jsuarez

Apr 7

x.com/i/article/203760455016…

788

109,441

DM

@Sudden_Sudo

Mar 30

Replying to @jino_rohit @1a1n1d1y

Depends on the GPU. I suspect if you only have a single GPU you aren’t rocking data center stuff that can partition. But you can get a lot out of a single GPU Tbh though: the serious stuff isn’t multigpu but many-node Your best bet is probably buying some older Radeon data center cards, ~$150-200, 3d printing fan brackets on them and going from there. If you can build a homelab.

andy

andy

@1a1n1d1y

Mar 29

Replying to @jino_rohit

depending how you define what is or isn’t a concept in that domain, yeah you can use ALL of the vram on a card rather than only what a model needs, acting as a multigpu on one card for faster speed NCCL and NVLink arent involved though, so you wont really understand the bottlenecks with perf maxing on real multigpu yet

elhacker.NET

elhacker.NET @elhackernet

Mar 28

Intel Arc Pro B70: hasta 367 TOPS y configuraciones multiGPU de hasta 128 GB de VRAM pronetic.geeknetic.es/Notici…

2,076

Sudo su

Sudo su

@sudoingX

Mar 10

Replying to @kermankohli

if budget allows, single RTX 6000 96GB no question. one card, no splits, no tensor parallel overhead, full model in VRAM. you can run 70B at Q8 without thinking about it. simplest possible setup. but if you're optimizing for cost: 2x 3090 with NVLink gives you 48GB unified for under $2K used. runs 80B coding models at 46 tok/s. that's what i tested Qwen3 Coder Next on. for coding specifically, Qwen 3.5 27B dense fits on a single 3090 at 35 tok/s with 262K context. you might not even need multiGPU depending on which model you pick.

1,086

Blockwork Orange

Blockwork Orange @BlocktimeOrange

Mar 10

Replying to @TusharSoni014 @Drag_AILabs @opencode @lmstudio

5080 alone can run out of vram at siginficant context window. I've 5080 4070 and can run context over 160k-256k with quen3.5 (35B-A3B Q4 or 9B Q8). But don't use LMStudio, its bad at multiGPU. I build my own framework (including UI) around llama server.