orcus108

orcus108

@orcus108

12h

WHAT THE HELL is happening in AI? A 3B-parameter model just put up coding benchmark scores in the same league as Claude Opus 4.5. 3 BILLION. The weights are on Hugging Face, anyone can test it. I genuinely don't know if this is a breakthrough or if the benchmarks are broken.

orcus108

@orcus108

12h

The weights are public. Try it yourself. Give it the hardest coding or math problem you can think of and reply with the results. I wanna know if this thing is real.

Paper: arxiv.org/abs/2606.16140
GitHub: github.com/WeiboAI/VibeThink…
Hugging Face: huggingface.co/WeiboAI/VibeT…

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in...

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime....

arxiv.org

orcus108

@orcus108

And it might not be benchmaxxed... (though I'm testing the model on some benchmarks the original paper didn't run, just to confirm) x.com/Gurprets225/status/206…

Bruh

@Gurprets225

Replying to @orcus108

It's not gaming the benchmarks. If you read the paper, it focuses on math, which is a narrow domain and easy to build datasets for. This is why it sucks on general domain knowledge, cuz it was trained for math only. It also has a 96% acceptance rate on LeetCode problems released after its training data cutoff, which means it's doing well on problems it's never seen, and that points away from benchmark gaming. TL;DR: Not benchmark maxed, just trained on a narrow topic; it does very well on that topic and very poorly on anything else.

orcus108

@orcus108

12h

can someone explain why the cost graph is inverted?? makes no sense to me

Uzi

@uzairansar

22h

Fable on DeepSWE. Disagree with this one tho. GPT 5.5 xhigh is great but def not as good as Fable was.

orcus108

@orcus108

12h

"musk isn't stopping at 1T models" he's actually gonna make a Le Chaton Fat isn't he

Lisan al Gaib

@scaling01

13h

SpaceX acquired Cursor. I expect SpaceX AI to be between Google and OpenAI by end of year. Composer 2.0 was a very strong model for only 1T params, but Elon isn't stopping at 1T models.

orcus108

@orcus108

15h

some things I'm curious about / wanna do:
1. build open code from scratch
2. understand how chip design works
3. build an LLM inference engine
4. start using Hermes agent (or any good agent)
5. understand edge/local AI
6. grow a nice community on X

orcus108

@orcus108

18h

can someone at sarvam pls put the "powered by" on the second line so it's one sentence per line @thedesignobsess @SarvamAI

orcus108 retweeted

orcus108

@orcus108

Jun 15

OH MY GOD it's happening

@MistralAI has officially confirmed the upcoming release of Le Chaton Fat:
- 30T MoE with 256 experts
- 1M context window
- multimodal and multilingual
- outperforms every other model (including Fable 5) on every benchmark

Shikhar

@xikhar

Jun 14

Le Chaton Fat

orcus108

@orcus108

Jun 15

India's first rocket launched from a church. Its components were transported on bicycles and bullock carts. We remember how ISRO ended up but we forget how it started. Maybe the question isn't how India catches up in AI. Maybe it's what race India should run first.

orcus108

@orcus108

Jun 15

x.com/i/article/206659479973…

orcus108

@orcus108

Jun 15

LMAO is he playing along or did he actually fall for it? 😭

Marc Andreessen 🇺🇸

@pmarca

Jun 15

They can’t keep getting away with this.

orcus108

@orcus108

Jun 15

curious to see how it works with the smaller models. Could a fusion of, say, Qwen3-27B reach Sonnet/Opus/5.5 level?

OpenRouter

@OpenRouter

Jun 13

Introducing the Fusion API, the smartest compound model on the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

orcus108 retweeted

orcus108

@orcus108

Jun 13

most people building on frontier AI have been operating on an assumption that broke last night. access to the most capable models isn't a utility you subscribe to - it's a privilege that can be revoked for hundreds of millions of users, developers, and startups with essentially no process. not just for people outside the US, but for everyone, including American nationals themselves.

the scarier part isn't even that it happened. it's that there's no real mechanism preventing it from happening again, to any model, at any time, for any reason that gets dressed up as national security.

the "just switch to open-source Chinese models" response misses something very important. right now, Chinese labs release open weights partly because it's a competitive weapon - it compresses American labs' margins and guarantees China access to capable models regardless of export controls. but that calculus changes the moment China reaches frontier parity. why keep giving away your best models when the US is hoarding its own? the latest big Qwen releases are already closed.

the window where open-source AI functions as a global equalizer is closing from both ends at once, and faster than most people think.

if you can't run the software on your own hardware, assume it can be taken away at any moment.

Anthropic

@AnthropicAI

Jun 13

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

orcus108

@orcus108

Jun 14

how to become “cracked” at something

orcus108

@orcus108

Jun 14

A LEGEND FORGED IN SILVER RESUMES IN RED❤️❤️

Formula 1

@F1

Jun 14

FIRST WIN IN RED ❤️ #F1 #BarcelonaGP

orcus108

@orcus108

Jun 14

absolutely must must read

Aviral Bhatnagar

@aviralbhat

Jun 14

x.com/i/article/206607422608…
