AiBattle

AiBattle

@AiBattle_

23h

GLM 5.2 seems to be a significant improvement over GLM 5.1 and has a Max Thinking option

AiBattle

@AiBattle_

Jun 11

New GLM-5.2 rumors: Beta testing appears to have started for some Max plan users. GLM-5.2 seems to have a 1M token context window, no multimodality, and two thinking intensity settings

AiBattle

@AiBattle_

Jun 11

Claude Fable 5 (xHigh) scores 70% on DeepSWE, matching GPT-5.5 (xHigh), from Theo’s recent video

AiBattle

@AiBattle_

Jun 9

CursorBench scores for Claude Fable 5: Fable 5 Low has the same score as GPT-5.5 Extra High.

AiBattle

@AiBattle_

Jun 9

Claude Fable will cost $10/$50 per million input/output tokens. Mythos Preview was priced at $25/$125 per million input/output tokens.

Stephanie Palazzolo @steph_palazzolo

Jun 9

Scoop: A neutered version of Mythos called Claude Fable is coming today. It's expensive—2x the price of Opus—but perhaps not as pricey as people might have thought from the initial Mythos pricing (5x Opus). More on that and Apple WWDC in AI Agenda: theinformation.com/newslette…
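To make the quoted prices concrete, here is a minimal sketch that compares what one request would cost at the $10/$50 and $25/$125 rates mentioned above. The `Pricing` class and the token counts are hypothetical, chosen only for illustration. The implied Opus rate is an inference from the "2x" and "5x" multiples in these posts, not a confirmed figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD price per one million input and output tokens (hypothetical helper)."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Total cost in USD for a single request.
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Rates as quoted in the posts above.
FABLE = Pricing(10, 50)
MYTHOS_PREVIEW = Pricing(25, 125)

# Inference only: "2x the price of Opus" applied to the Fable rate.
IMPLIED_OPUS = Pricing(FABLE.input_per_million / 2, FABLE.output_per_million / 2)

# Hypothetical workload, not taken from the source.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 20_000

fable_cost = FABLE.cost(INPUT_TOKENS, OUTPUT_TOKENS)
mythos_cost = MYTHOS_PREVIEW.cost(INPUT_TOKENS, OUTPUT_TOKENS)
opus_cost = IMPLIED_OPUS.cost(INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Fable:          ${fable_cost:.2f}")
print(f"Mythos Preview: ${mythos_cost:.2f}")
print(f"Implied Opus:   ${opus_cost:.2f}")
print(f"Mythos Preview / Fable: {mythos_cost / fable_cost:.1f}x")
```

Because both input and output rates scale by the same factor, the Mythos Preview to Fable ratio stays at 2.5x whatever the mix of input and output tokens.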

AiBattle

@AiBattle_

Jun 9

GLM 5.2 seems to be coming soon. Another user on Reddit also noticed that trying to call GLM 5.2 returns a “no access” error, while trying to call GLM 6 returns a “model does not exist” error. Haven't verified it myself yet.

Eidzoku

@evi77ain

Jun 9

Well, well, well. What do we have here? Wild GLM-5.2 appeared in Coding Plan. It's inaccessible, yes, but it's coming very-very soon.

AiBattle

@AiBattle_

Jun 9

A user on Hacker News also reported a few hours ago that Fable 5 would be released tomorrow. Sources are also now reporting the same.

@M1Astra

Jun 9

New Claude model checkpoints (possibly Mythos GA):
- Claude Fable 5
- Claude Fruitcake EAP
The new checkpoints were detected for testing over the weekend.

AiBattle

@AiBattle_

Jun 9

x.com/alexeheath/status/2064…

Alex Heath

@alexeheath

Jun 9

Sources: Anthropic is planning to release a public version of Mythos tomorrow
- Will have substantial guardrails and not be as cyber permissive as what Project Glasswing partners can access
- Will be dramatically better at long-horizon, multi-turn tasks
sources.news/p/inside-apples…

AiBattle

@AiBattle_

Jun 8

Kindle (GPT-5.6) has been removed from the Arena. A new model, Levi, appeared shortly after Kindle was removed. The model’s front-end output looks similar to OpenAI models with the Design skill. Levi might also be GPT-5.6. Here is a comparison with GPT-5.5. Prompt: “Create a website about the upcoming World Cup”

leo 🐾

@synthwavedd

Jun 8

🚨 A new anonymous model under the name "Kindle" has been added to the Design Arena, very likely to be the same "kindle-alpha" GPT-5.6 Release Candidate checkpoint previously revealed. As foretold! It's coming

AiBattle

@AiBattle_

Jun 6

"Claude-Mythos-5" did just briefly show up in the API. It’s coming soon. I wonder if they’re going with the pricing from the Glasswing blog post: $25/$125 per million input/output tokens. That would make it 5 times as expensive as Opus 4.8. @White1637402 was the first one to report it.

White

@White1637402

Jun 6

Opus era is over. 'claude-mythos-5' just appeared in Anthropic's internal infra.

AiBattle retweeted

Lentils

@Lentils80

Jun 4

Seeing as Claude Mythos is releasing soon, I have two VERY astonishing outputs to share from it. 👀 ZERO-SHOT and LOW effort as well! These are the best outputs I've seen for this prompt ever since the October 2025 Gemini A/B models.

AiBattle

@AiBattle_

Jun 4

How much better are the internal, unreleased models at frontier labs like Google, OpenAI, and Anthropic? We got a glimpse exactly one year ago today, when Google accidentally leaked the “Kingfall” model.

"Kingfall" was likely an unreleased Gemini 2.5 Ultra-sized model. It was available in AI Studio for only a few minutes but remained accessible through the API for several days.

At the time, "Kingfall" appeared to be significantly better than Gemini 2.5 Pro at both code generation and creative writing.

In a recent interview, Sundar Pichai mentioned that Google could have made a better, Ultra-sized Gemini Omni model, but would have had trouble serving it.

The infrastructure required to serve Ultra-sized models at scale is likely why Google never publicly released models like “Kingfall”.

AiBattle

@AiBattle_

Jun 4

First sighting of “Kingfall” under the Confidential tab in AI Studio a year ago today x.com/AiBattle_/status/19302…

AiBattle

@AiBattle_

4 Jun 2025

A new mystery model selector called 'Confidential' and a model named 'Kingfall' have appeared in AI Studio!

AiBattle

@AiBattle_

Jun 4

A bit worried about the upcoming Qwen 3.7 open-source models.

A Qwen team member recently deleted a comment where he said they would likely release another 27B model.

The Summary section of the 3.7 Plus blog post doesn’t mention any upcoming open-source models, whereas the 3.6 Plus blog explicitly said they would be open-sourcing smaller-scale models.

We also didn’t get the other two Qwen 3.6 models from the poll, 9B and 122B.

I still think we’ll probably get some open-source models from the 3.7 series, but it’s unclear which sizes they’ll be or when they’ll arrive.

AiBattle

@AiBattle_

Jun 4

MiniMax M3 scores 54.7 on the AA-Intelligence Index, beating Kimi K2.6’s score of 53.9. Once the weights are released, M3 will become the open-weights model with the highest score on the AA-Intelligence Index.

AiBattle

@AiBattle_

Jun 1

MiniMax M2.7 scored 0% on DeepSWE. I’m really curious to see how well M3 will do. The model rankings on the DeepSWE benchmark seem to reflect model performance better than other coding benchmarks.

AiBattle

@AiBattle_

May 30

MiniMax is currently conducting internal CKPT testing for M3, a multimodal, long-context model. The team is also resolving pipeline issues and upgrading its infrastructure. In the next few days, they plan to provide CKPT/API access for developers in the open-source community to evaluate the model.

Jiayuan (JY) Zhang

@jiayuan_jy

May 29

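MiniMax M3 is about to be released, and we would like to invite some contributors from the Chinese open-source community to evaluate it. 阿岛 @SkylerMiao7 has set up a Feishu group where you can try it first! We would also like applicants to have some open-source contribution experience (having contributed to an open-source project or maintaining one of your own); just note it in the verification message.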

AiBattle

@AiBattle_

May 28

Claude Opus 4.8 has the highest score on the Artificial Analysis Intelligence Index with a score of 61.4

AiBattle

@AiBattle_

May 28

Big improvement on CritPt compared to Opus 4.7: 12% --> 21%

AiBattle

@AiBattle_

May 28

Opus 4.8 shows major gains over Opus 4.7 on GraphWalks at 1M context length

AiBattle

@AiBattle_

May 28
