1/ 🚀 This shouldn’t fit on a 3090 — but it does.
🧠 Qwen3.6-27B running a full OpenAI-style agent stack on a single RTX 3090 (24GB):
👁 vision · 🔧 tools · ⚡ streaming · 🤔 reasoning · 🧩 MTP n=3
🚫 no feature cuts
📊 50 / 66 TPS @ 218K text
📊 51 / 68 TPS @ 198K vision
Built on @Alibaba_Qwen @vllm_project 👇
github.com/noonghunna/club-3…
Qwopus3 Q5 with Hermes isn't so good: Hermes quickly gets stuck in a loop trying to patch. With copaw I don't have this problem; it looks much better and doesn't loop.
wan 2.2 is starting to feel unfair
you drop in a single reference video
swap the face with one image
and the system rebuilds the entire ad shot-for-shot
same pacing
same structure
same emotional beats
new avatar
brands are using it to clone top-performing videos across every niche without refilming anything
i put together a short breakdown showing
– how to pick the right source videos
– how to avoid uncanny outputs
– and how to generate variations that actually convert
rt + comment “wan” and i’ll send it over
(follow for dm)