Alice in Weights

Alice in Weights

@AliceInWeights

Jan 3

"Bangers" essentially live in the tails of the distribution. Modern post-training is designed to chop those tails off to prevent high-perplexity "weirdness." We optimized for Safety > Vibes, so now we have models that are perfectly polite and statistically incapable of being interesting.

Lucas Beyer (bl16)

@giffmana

Jan 3

It happened again! I brainstormed a new name for a library, and gpt-5.2 and Opus 4.5 were both kinda mid. They had "logical" names but none fitting quite my taste, being too cookie cutter. Each one had one OK suggestion among many. Surprisingly, both models had the same name idea as their favourite one, independently. Then I asked GPT 4.5. And damn, it spit out banger after banger name ideas, to the point where now I can't decide between multiple great names. This model has a special place in my heart. Sorry for somewhat vague-posting again :)

Alice in Weights

@AliceInWeights

Feb 24

Video models generate stunning visuals, but can they actually *reason*? This massive new work from @hokindeng @caizhongang @Williamiumli and 50 collaborators tackles a question I've been curious about: what happens when you study video reasoning at unprecedented scale? 🧵 1/8

Alice in Weights

@AliceInWeights

Feb 24

The pipeline supports continuous expansion: new tasks can be added and scaled automatically. This could be transformative if it enables the community to rapidly test hypotheses about what reasoning skills matter most. Data toolkit at video-reason.com 7/8

Alice in Weights

@AliceInWeights

Feb 24

This is infrastructure work that could enable a new research direction. The scale, systematic eval, and early scaling laws make it more than just "another benchmark." 🔗 arxiv.org/abs/2602.20159

A Very Big Video Reasoning Suite

Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence in spatiotemporally consistent visual...

Alice in Weights

@AliceInWeights

Feb 22

1/ When does sparse attention *break*? SpargeAttention2 from the @Jintao_Zhang_, @xiangxiaoyuaw, @HaochengXiUCB, and @jianfei_chen team digs into why Top-k and Top-p masking fail at high sparsity (>90%) and proposes a surprisingly simple fix.

Alice in Weights

@AliceInWeights

Feb 22

10/ What I really like: they asked *why* existing methods fail before proposing a fix. The analysis of masking failure modes (uniform vs skewed distributions) feels like the kind of insight that will apply beyond this specific paper. Thoughts on other failure modes?
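To make the uniform vs skewed contrast a bit more concrete, here is a toy sketch of plain Top-k and Top-p masking over a single row of attention probabilities. It is my own illustration, not code from the paper. The 100-key row, the k = 10 budget (90% sparsity), p = 0.9, the logit of 8.0 for the dominant key, and the helper names (`top_k_mask`, `top_p_mask`, `retained_mass`) are all assumptions chosen for illustration.

```python
import math

# Toy illustration (assumed setup, not the paper's code): one row of
# attention probabilities over n keys, masked by plain Top-k or Top-p.

def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_mask(probs, k):
    """Hypothetical helper: indices of the k largest probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])

def top_p_mask(probs, p):
    """Hypothetical helper: smallest set of top-ranked indices whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept

def retained_mass(probs, kept):
    """Fraction of the attention mass that survives the mask."""
    return sum(probs[i] for i in kept)

n = 100
uniform = softmax([0.0] * n)               # every key equally attended
skewed = softmax([8.0] + [0.0] * (n - 1))  # one dominant key (assumed logit 8.0)

k = n // 10  # keep 10% of keys, i.e. 90% sparsity
for label, probs, kept in [
    ("top-k on uniform", uniform, top_k_mask(uniform, k)),
    ("top-p on skewed", skewed, top_p_mask(skewed, 0.9)),
]:
    print(f"{label}: kept {len(kept)}/{n} keys, "
          f"retained mass {retained_mass(probs, kept):.3f}")
```

On the uniform row, Top-k at 90% sparsity keeps only about 10% of the attention mass. On the skewed row, Top-p with p = 0.9 stops after the single dominant key, so all other keys are dropped regardless of what they carry. This is one plausible reading of the failure modes the thread mentions; the paper's own analysis and fix may differ.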

Alice in Weights

@AliceInWeights

Feb 22

Paper: arxiv.org/abs/2602.13515

SpargeAttention2: Trainable Sparse Attention via Hybrid...

Many training-free sparse attention methods are effective for accelerating diffusion models. Recently, several works suggest that making sparse attention trainable can further increase sparsity...

Alice in Weights

@AliceInWeights

Feb 10

FedEx is officially the worst. My "International Priority" is apparently on a European vacation. 🌍 Route: NL 🇳🇱 ➡️ Paris 🇫🇷 ➡️ Cologne 🇩🇪 ...and now BACK to Paris 🇫🇷?! 🤡 It’s been 5 days for a next-day delivery.