Hopeful 4 all of us running 2 MAKE WAstate GREAT AGAIN!
#MWGA
😱 but we're not supposed 2 talk about Trump or anything uncomfortable!
TRUMP, TRUMP, TRUMP!!!
🇺🇸💪💪💪
End #SanturaryStateCities & remove ALL #ForeignInvaders #ILLEGALS who r sucking WAstate int2 decline!
There I said it!
@davidwalliams WE HAD ALREADY BELIEVED IN DESTINY--ALSO IN OUR❤️THAT FITS SO RIGHT--1🌅 WE WALKED INT2 OUR🌏,🧬-IT CHANGED OUR EVERY🌃-4 EVERYTHING WE'VE BEEN THROUGH--WE SAW IN THE🪞A❤️THAT MATCHED OURS IN EVERYTHING WE DO--ITS OURS ME&U--EVERY PIECE OF US 2--WE FOUND IS IN ME&U
We have ONE AUSTRALIAN FLAG
not this bs made up flag (just like the WtC bs)
& now made up Torres Strait Islander flag
When One Nation
Comes int2 power, please make it ONE FLAG for ONE NATION
@OneNationAus
Not too late!
But well done
Sarah!!
SWAN anthr misogynistic self-serving LAZY FATCAT
Now, STOP Carl sucking up to Alboslush & the Labor party machine, these mugs r spending Australians' $ on a good time 4them & dragging Australians int2 trillions of death debt
@TheTodayShow
OSCAR INT2 KV cache by @ZhongzhuZhou makes ultra-long context more practical on local devices. 🔥🚀
🔗modelscope.ai/models/togethe…
💾 KV memory: reduces Gemma 4 12B-it’s KV cache at 256K context from ~24 GiB to ~3 GiB, saving ~21 GiB
📉 Compression: ~8× smaller KV footprint with q2_0 INT2 KV cache
🧠 Method: calibrated rotations use query covariance for Keys and score-weighted value covariance for Values
🎯 Quality: pushes quantization noise into attention-insensitive directions to preserve near-f16 KV behavior
🖥️ Deployment: Gemma 4 12B now supports both ready-to-run INT2 KV GGUF models through the llama.cpp fork and INT2 KV cache through SGLang. The local path also includes Apple Metal support and fused mixed-precision flash attention.
🧩 Model support: Qwen3 with head dim 128 and Gemma 4 with head dim 512, including Gemma 4 12B-it with sliding-window layers
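🧮 The memory figures above follow from bit-width arithmetic alone. A minimal sketch, assuming the ~24 GiB baseline is a 16-bit KV cache and ignoring any per-block scale or metadata overhead; the helper name `kv_cache_gib` is hypothetical and not part of llama.cpp or SGLang:

```python
def kv_cache_gib(baseline_gib: float, bits: int, baseline_bits: int = 16) -> float:
    """Scale a KV cache size by the ratio of bit widths.

    Assumption: storage is proportional to bits per element, with no
    extra cost for quantization scales or other metadata.
    """
    return baseline_gib * bits / baseline_bits


baseline = 24.0                              # ~24 GiB at 256K context (from the post)
compressed = kv_cache_gib(baseline, bits=2)  # INT2 KV cache
ratio = baseline / compressed                # compression factor
saved = baseline - compressed                # memory saved in GiB

print(f"{compressed:.1f} GiB after INT2, {ratio:.0f}x smaller, {saved:.1f} GiB saved")
```

Real INT2 formats store scales alongside the 2-bit codes, which is consistent with the post quoting approximate (~) figures rather than exact ones.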
Can LLMs run on ultra-low-bit memory without tanking accuracy?
Researchers from Together AI, University of Sydney, and UIUC present OSCAR — a method that uses offline, attention-aware covariance analysis to design fixed rotations and clipping thresholds for 2-bit KV cache quantization. This aligns the compressed values with what attention actually needs, avoiding the collapse of naive rotation.
Results: On Qwen3 and GLM-4 (up to 358B params), OSCAR stays within 1–4 points of BF16 accuracy, while naive INT2 collapses to near zero. It cuts KV-cache memory by 8x, boosts throughput up to 7x at large batches, and accelerates decoding by 3x over BF16 — all deployable in SGLang and vLLM.
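The rotate-then-quantize idea described above can be illustrated with a toy NumPy sketch. This is not the OSCAR implementation: the rotation here is simply the eigenbasis of a query second-moment matrix built from random data, the clip multiplier of 2.0 is an arbitrary assumption, and all function names are hypothetical. It only shows the general pattern of a fixed orthogonal rotation, per-channel clipping, and 4-level (2-bit) codes for Keys.

```python
import numpy as np


def covariance_rotation(queries: np.ndarray) -> np.ndarray:
    """Orthogonal rotation from the eigenvectors of the query second-moment matrix."""
    cov = queries.T @ queries / queries.shape[0]
    _, eigvecs = np.linalg.eigh(cov)  # symmetric input -> orthonormal eigenvectors
    return eigvecs


def quantize_2bit(x: np.ndarray, clip: np.ndarray):
    """Uniform 2-bit quantization (codes 0..3) with a symmetric per-channel clip."""
    x = np.clip(x, -clip, clip)
    scale = 2.0 * clip / 3.0          # 4 levels span [-clip, +clip]
    codes = np.round((x + clip) / scale).astype(np.uint8)
    return codes, scale


def dequantize_2bit(codes: np.ndarray, scale: np.ndarray, clip: np.ndarray) -> np.ndarray:
    """Map 2-bit codes back to real values in [-clip, +clip]."""
    return codes.astype(np.float64) * scale - clip


rng = np.random.default_rng(0)
head_dim = 128                                   # Qwen3 head dim quoted in the feed
queries = rng.standard_normal((1024, head_dim))  # stand-in calibration queries
keys = rng.standard_normal((256, head_dim))      # stand-in keys to compress

R = covariance_rotation(queries)                 # fixed rotation, computed offline
rotated = keys @ R
clip = 2.0 * rotated.std(axis=0)                 # assumed per-channel clip threshold
codes, scale = quantize_2bit(rotated, clip)
keys_hat = dequantize_2bit(codes, scale, clip) @ R.T  # rotate back for comparison
```

Because R is orthogonal, attention scores are unchanged by rotating both sides (q·k equals (qR)·(kR)), so an implementation could keep Keys in the rotated basis and rotate queries instead of rotating back.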
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
Paper: arxiv.org/abs/2605.17757
Project: oscar-quantize.github.io/
Code: github.com/FutureMLS-Lab/OSC…
RotationZoo: huggingface.co/Zhongzhu/OSCA…
Our report: mp.weixin.qq.com/s/DKiYunmj_…
📬 #PapersAccepted by Jiqizhixin