⚙️ Tech lover | 🖥️ Software Engineer | 🕹️ Retro games to AI innovation

Joined November 2015
846 Photos and videos
Pinned Tweet
Claude Fable 5 just one-shotted this Super Pang game. I think we have reached the peak with this test and need to find more complex ones for quick game challenges.
2
1
35
4,413
I just came across an interesting benchmark from @andonlabs called Blueprint-Bench 2. According to the description on the page, it works as follows: "It tests spatial reasoning by asking AI agents to convert apartment photographs into accurate 2D floor plans. Each agent processes 50 apartments sequentially, examining around 20 interior photos per apartment and generating a floor plan that shows room layouts, connections, and relative sizes." What I find particularly interesting is that Fable actually won this benchmark, while other Claude models performed well below SOTA models such as GPT-5.5 and even GPT-5.4, not to mention the Gemini models.
1
1
140
Peekaboo! There was Fable, there is no Fable!
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
2
75
Dominik Filkus retweeted
We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top. DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task. The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others. More below.
107
185
1,903
538,335
Even @t3dotchat has UX problems. This one loads forever. I do not know what is happening nowadays, but it seems everything is broken. Vibe coding?
1
165
The @GoogleDeepMind Gemini Omni experience is so garbage that I can't even find words for it. Gemini app might be fine, but Flow is unusable.
1
1
267
Generations can get stuck forever, too. Oh, and Google Flow can't even work with mp3 files generated directly with Google Lyria 🤷‍♂️
121
Dominik Filkus retweeted
May 28
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
3,687
8,628
67,437
15,240,629
Dominik Filkus retweeted
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
511
742
6,053
1,951,636
Dominik Filkus retweeted
Welcome to Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost putting 3.5 Flash in a class of its own. We spent the last 6 months making sure Flash is great for real world use cases. It's available everywhere now!
469
736
7,364
666,935
Dominik Filkus retweeted
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
7,989
11,150
150,232
27,570,655
This one looks awesome imo. Mixing styles with Krea 2 is easy and fun at the same time.
2
128
Dominik Filkus retweeted
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…
464
1,958
15,785
7,749,182
I've been playing around a bit with @krea_ai Moodboards. Here is my pool in the sky.
2
65
Dominik Filkus retweeted
May 12
this is Krea 2. our first foundation model, built completely from scratch for aesthetic diversity and stylistic control. learn more and get early access 👇
206
209
2,226
2,328,238
I almost forgot about this. Oh gosh, what a ride it was. There was the DeepSeek panic, the Blackwell production FUD, the smuggling-to-China FUD, Google TPUs, and of course all the other hyperscalers producing their own chips so they would not need Nvidia at all. Nope, none of them were enough to change the truth, but I am always grateful for the discounts. $NVDA
Whenever Nvidia reaches ATH, this song comes to my mind. $NVDA @nvidia
2
129
This is getting worse...
‼️🚨 BREAKING: A new npm supply-chain attack uses a dead-man's switch. The payload plants a watcher on your machine that nukes your home directory the second you revoke the GitHub token it stole from you. The compromise happened today, across 42 official tanstack npm packages, 84 malicious versions in total. tanstack/react-router alone pulls more than 12 million weekly downloads. The attacker forked TanStack's repository and pushed a single hidden commit. From there, they tricked TanStack's own release system into signing the malicious packages as if they were the real thing. To npm, and to anyone checking the cryptographic proof of origin (SLSA provenance), the poisoned versions looked 100% legitimate. Maintainer Tanner Linsley confirmed the whole team had 2FA enabled. It didn't matter. This is the first documented npm worm in history that ships with a valid, signed certificate of authenticity, the same one defenders rely on to know a package wasn't tampered with.
107
Who could be behind this archaeopteryx codenamed model? Every time I try to generate an image on Arena AI, this comes to my screen and I think the quality is quite good.
1
139
Dominik Filkus retweeted
We taught two F.03 robots to clean a room and make a bed in under 2 minutes - fully autonomous.
669
1,115
8,346
1,388,029
Ok so our extended limits totally depend on Elon's actual mood. 🤷‍♂️
1
82