Filter
Exclude
Time range
-
Near
まじでCPU12600で全然問題ない 7800X3Dとか欲しかったけど正直その金でもう一枚グラボ買った方がLoRA学習に使えていいって感じ MultiGPUの項目もあったから電源デカくして2枚挿しでもいいね...
108
It looks like they don't have multigpu boxes with rtx cards. Single GPU is fine for dev, but we use multiple GPUs for sweeps
4
167
Replying to @sudoingX
32 in multigpu because I am not dumb and I am not paying the robbery prices out there now. If people had more brains than deep pockets prices would level, but they'll always find the moron that needs to prove something and that will dish out 5-8k for a gpu.
47
🚀 ds4 just got faster 2! We forked @antirez’s ds4 and Dramatically speed up multi-GPU layer-splitting inference on a single machine using a single process, with full support for consumer-grade graphics cards. The entire codebase was modified using open-source models via OpenCode throughout the process. We only participated in the testing part. Fork is open source 👉 github.com/huihui-support/ds… #ds4 #LLM #MultiGPU #Inference

🚀 ds4 just got faster! We forked @antirez’s ds4 and added automatic Unix Domain Socket (UDS) switching for same-machine multi-GPU. Tested on: **153G Q4 model 2× RTX 6000 Pro** **Inference phase improved by at least 2 tokens/s**! localhost now automatically uses UDS — completely eliminating TCP loopback overhead. Multi-machine mode remains fully unchanged. Fork is open source 👉 github.com/huihui-support/ds… #ds4 #LLM #MultiGPU #Inference
4
38
10,286
🚀 ds4 just got faster! We forked @antirez’s ds4 and added automatic Unix Domain Socket (UDS) switching for same-machine multi-GPU. Tested on: **153G Q4 model 2× RTX 6000 Pro** **Inference phase improved by at least 2 tokens/s**! localhost now automatically uses UDS — completely eliminating TCP loopback overhead. Multi-machine mode remains fully unchanged. Fork is open source 👉 github.com/huihui-support/ds… #ds4 #LLM #MultiGPU #Inference
3
3
20
14,340
Replying to @D_MFPS
Si, pero ahora empiezan a verse equipos multigpu en masa para meterse ahí con GPU gaming y hasta ahora el precio excepto por la 5090 estaba "bien"
1
2
77
**🚀 My 4× RTX 3090 Turbos NVLink Rig is a Full Local Video AI Studio in 2026** **— and it’s still one of the best consumer setups you can own.** Most people think you need 4090s or H100s for serious local video generation. I’m here to tell you: **not true.** With 96 GB total VRAM NVLink, my 4× 3090 Turbo machine runs a complete parallel creative pipeline that feels like magic. ### What I’m Running Right Soon (Simultaneously): **✅ Video Generation** - **WAN 2.2 (14B MoE)** on GPUs 0 1 → Best version for 24 GB cards. Perfect motion, prompt following, and LoRA support. → 720p–1080p clips with excellent consistency. → NVLink ComfyUI MultiGPU = ~3–4× faster than single GPU. **✅ LLM Co-Pilot** - **Qwen 3.6 (vLLM)** on GPUs 2 3 → Generates song-specific prompts, scene scripts, lyric timing, style descriptions. → Stays lightning fast and responsive even while WAN is cooking clips. **✅ Audio Post-Production** Whisper, Demucs, Bark v2, MDX23 stem separation RIFE v4.6 for buttery smooth interpolation Real-ESRGAN / SUPIR for upscaling & restoration SAM2 FaceFusion for masking, VFX, lip-sync (LTX-Video audio-native when needed) ### The Killer Music Video Workflow Raw track → Qwen writes the visual script → WAN 2.2 generates the clips → RIFE smooths → ESRGAN upscales → SAM2 lip-sync finishes it. **All happening at the same time. No model swapping. No VRAM fights. No crashes.** This rig is peak “local creator” energy. No cloud bills. No waiting queues. Full creative control. If you have 4× 3090s (or similar) and NVLink — you’re sitting on one of the best video AI machines possible without jumping into datacenter territory. Who else is still rocking 3090s in 2026 and loving it? Drop your setup below 👇 #LocalAI #VideoGeneration #ComfyUI #WAN2 #3090Gang #MusicVideoAI #NVLink
2
109
Most multiGPU rig ppl using a UPS? Raw on mains? I hope you have a UPS for your RTX6000s 😅 if you have power like we do!
10
1
29
2,873
ComfyUI-MultiGPUってカスタムノード有ったんか。今まで知らんかったわwこれで動画生成行けるな
2
155
May 14
Need to set it up once more. Got idea for nice multigpu end2end asset design workflow.
2
90
bpm140の32拍目の13.7秒を指定したら、ちゃんと動きの方向のそろったループできた。ワークフロー貼るけど、MultiGPU版(VRAM38GB使用)だし未公開ノードあるので参考まで。解像度と秒数さげたら24GBとかに収まると思うし、一部切り出せばLTX-2.3でループ生成するのに使えるよ。
1
7
706
NVIDIA RTX PRO 6000、2種の外観比較 右のWorkstation edition は600Wのパワー、極太ケーブルで、躯体もとてもセクシー 左のMaxQは300W、コンパクトなMultiGPUセットアップに最適です
12
23
378
22,500
Apr 13
Seeking the ultimate PCIe 4.0 scalability in a motherboard? Do you hope to expand into hundreds of GBs of VRAM all under the reign of a single CPU? This board may have flown under your radar. It can accommodate up to 13x GPUs at (PCIe 4.0 x8). Look no further than the ASRock Rack ROMED16QM3 — a single-socket SP3 server board supporting up to the 7773X (the crème de la crème of the EPYC Milan generation, 64 cores and 800MB of L3 cache) with 16x DDR4 slots for 1TB RAM and full IPMI remote management. Can you really call your local AI rig a “server” if it doesn’t have IPMI? Unmatched GPU expandability: → 2x onboard PCIe 4.0 x8 slots (SLOT 6/7, SLOT 6 removed as it shared lanes with M.2s) → 12x low-profile SlimSAS (SFF-8654) PCIe 4.0 x8 ports → Max 13x GPUs at x8 PCIe 4.0 (plug every SlimSAS the single slot straight into a Gen4 riser). → Or pair the 12x SlimSAS into 6x x16 links 2x x8 slots for 8x GPUs at full x16 with lanes left for RAID 0 NVMe config. → Perfect for sharding massive models or configuring tens of small, dense LLMs. NVLink bridges connect 3090 pairs for additional speedups. → 7003X series Milan EPYCs handle nonstop tool calling and query floods from agentic harnesses SlimSAS hardware you’ll need: → 12x SlimSAS SFF-8654 to SFF-8654LP (Low Profile) 8i cable - PCIe gen4 c-payne.com/products/slimsas… → 12x C-Payne SlimSAS PCIe gen4 Device Adapter x8/x16 c-payne.com/products/slimsas… This board is pure local AI gold for stacking last gen GPUs without compromise. Thank you to @alecglovertech on YouTube for making me aware of this nugget. #LocalLLaMA #Homelab #EPYC #MultiGPU #LLMInference #SelfHostedAI
5
670
Replying to @SlipperyGem
Gotta get a multigpu setup going asap!!!
1
2
178
Mar 30
Depends on the GPU. I suspect if you only have a single GPU you aren’t rocking data center stuff that can partition. But you can get a lot out of a single GPU Tbh though: the serious stuff isn’t multigpu but many-node Your best bet is probably buying some older Radeon data center cards, ~$150-200, 3d printing fan brackets on them and going from there. If you can build a homelab.
1
2
85
Mar 29
Replying to @jino_rohit
depending how you define what is or isn’t a concept in that domain, yeah you can use ALL of the vram on a card rather than only what a model needs, acting as a multigpu on one card for faster speed NCCL and NVLink arent involved though, so you wont really understand the bottlenecks with perf maxing on real multigpu yet
1
3
94
Intel Arc Pro B70: hasta 367 TOPS y configuraciones multiGPU de hasta 128 GB de VRAM pronetic.geeknetic.es/Notici…
1
18
2,076
Replying to @kermankohli
if budget allows, single RTX 6000 96GB no question. one card, no splits, no tensor parallel overhead, full model in VRAM. you can run 70B at Q8 without thinking about it. simplest possible setup. but if you're optimizing for cost: 2x 3090 with NVLink gives you 48GB unified for under $2K used. runs 80B coding models at 46 tok/s. that's what i tested Qwen3 Coder Next on. for coding specifically, Qwen 3.5 27B dense fits on a single 3090 at 35 tok/s with 262K context. you might not even need multiGPU depending on which model you pick.
3
13
1,086
5080 alone can run out of vram at siginficant context window. I've 5080 4070 and can run context over 160k-256k with quen3.5 (35B-A3B Q4 or 9B Q8). But don't use LMStudio, its bad at multiGPU. I build my own framework (including UI) around llama server.
2
1
93