Filter
Exclude
Time range
-
Near
Replying to @AlicanKiraz0
Is this with MTP?
7
Vabim na ogled danes ob 18.45 na TV3:) ples ob drogu in izvrsten trio MTP;) paradaplesa.si/tristosestdes…
1
18
This is why im putting all my flecktarn in a bag for disposal. DPM and MTP are cooler anyway
Local weirdo gets her jacket
6
Rum Boi retweeted
A massive industrial project is taking shape in Gopalpur, Odisha. > Investment - Rs 2,675 crore > Capacity - 376,000 MTPA > Project status - Final stages of completion and commissioning > Product - Technical Ammonium Nitrate (TAN) > Investor - Smartchem Technologies Limited (STL), a 100% subsidiary of Deepak Fertilizers and Petrochemicals Corporation Limited (DFPCL)
5
64
452
10,069
阪神 11R 宝塚記念 私の夢はmtp決着 ◎ダノンデサイル ◯レガレイラ 3連単2頭軸マルチ 相手もクロワとタバルのみ JCから1着固定のダノンデサイル本命 荒れなくていい
596
Val Keynes retweeted
#albertjamesmoriarty #moriartythepatriot #mtp #yuumori i want more info on his mental state🙏
2
12
47
288
I’m running Qwen3.6-27b-MTP-UD-Q4-K_M with 96k context on my 3090 without KV cache quantisation. Non-MTP gets 128k context. KV at 8 bit gets more…
1
5
𝄞 retweeted
260613 📌CASIO ⌚|MTP-M305D-1AV ©Chapters_Hao725 #ZHANGHAO #장하오 #章昊 #ジャンハオ #AND2BLE #앤더블
17
71
1,712
Unsloth just showed Gemma 4 hitting 162t/s on 12B with MTP on 6gb RAM That number is absurd to see, but the real shift that it is showing is what happens when inference outpaces human reading speed. And we’re about already there for single consumer GPU.
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s. GGUFs Guide: unsloth.ai/docs/models/mtp
1
4
Replying to @ApplyWiseAi
for mac/5090, 100 tps is already a very very optimistic for a 27b dense/35b-a3b with Q3/MTP/some kv-cache tricks and would only keep this high with <16K context. parallel execution or multi-slot is much much slower.
16
neyonta retweeted
#ReadAWrite #วิลเชอร์ #ไมอัล#willsher#myal#mtp ขอฝากฟิคเรื่องนี้ไว้ในใจทุกคนด้วยนะคะ👉🏻👈🏻🩷🩷 (อัปทุกวันเสาร์​ค่ะ)​ #วิชาที่ชอบเธอที่ใช่​ | วิลเชอร์ readawrite.com/a/d3555050dbd…
2
20
104
2,005
Replying to @henghaer123
做infra吧,国内厂商都得买华为卡,华为卡下限低但是极致优化以后或许性能还算OK,但是太难优化了所以对手都不优化,自己针对自己的旗舰模型优化好了成本就能比对手低的多,那自己实质上就成了唯一一个能卖高性价比token的厂商,然后minimax的思路是把关键的MTP什么的全部藏起来
1
31
Replying to @accosan_
インテルMTPまで行かんからななぁ… 最大でも80W前後だから_(:3 」∠)_
1
1
14
priya joseph retweeted
Jun 12
You don't even need a laptop to learn local llms! I just ran Unsloth Gemma 4 E2B QAT Multi Token Prediction (MTP) - 12 tokens/sec on a 6 years old phone's cpu with llama.cpp and termux! Unsloth just dropped MTP draft assistant GGUFs for every Gemma 4 model. naturally I yolo'd it straight onto Android to see what happens. not the 2 bit quant. UD-Q4_K_XL. works on any phone with ≥8 GB RAM. # Device: Note 20 Ultra (6 years old) -without MTP -> 7-9 tok/s -with MTP -> 9-12 tok/s ~20-30% faster on a phone. free speedup. I'll take it. # copy the command: LD_LIBRARY_PATH=. ./llama-server \ -m ~/storage/shared/llm/gemma-4-E2B-it-qat-UD-Q4_K_XL.gguf \ --spec-type draft-mtp \ --spec-draft-model ~/storage/shared/llm/mtp-gemma-4-E2B-it.gguf \ --spec-draft-n-max 4 \ --spec-draft-p-min 0.6 \ -c 4096 -t 4 --port 8080 --no-mmap -v beginner friendly Termux guide to run ggufs with llama.cpp on android HuggingFace model link in the comments. no excuses.
Gemma 4 now runs 2x faster with MTP GGUFs! Run locally on just 6GB RAM. ⚡️ MTP enables Google Gemma 4 run ~1.4–2.2× faster with no accuracy loss. Gemma 4 12B MTP can run at 162 t/s vs. 52 t/s without MTP. 31B reaches 101 t/s. GGUFs Guide: unsloth.ai/docs/models/mtp
10
18
147
23,582
rumi🎀⁷ needs help🙏🏽 retweeted
i am studying the Medical Termination of Pregnancy (MTP) Act, 1971 right now and i hope NO ONE TOUCHES OR AMENDS THIS LAW EVER
33
27
915
26,032
The educate class are responsible who drove high end cars but don't have common sense and the MTP, BEST, BMC who doesn't act and stop this problem
1
20
3/3 VRAM & Verdict In tight VRAM (e.g. 16GB) or long context, EAGLE-3's speculator cost can choke performance. If acceptance rate drops, it's slower than raw inference. MTP: Reliable daily driver. EAGLE-3: Specialized nitro-boost for structured outputs.
2