Artificial Intelligence, Automation & Optimism. Everything I say is 100% serious...

Joined July 2022
Pinned Tweet
Opus 4.8 is not off to a great start on Vending Bench. Anthropic said "honesty" was one of the big improvements with Opus 4.8, so more honest = sucks at business? Yikes.
Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6 and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort
Liquid AI added Ion Stoica to its Advisory Council as a strategic member. Stoica is a UC Berkeley computer science professor and co-founder of Databricks, Anyscale, and Arena, bringing deep experience in distributed systems, AI infrastructure, and scalable computing.
We are proud to announce that Ion Stoica (@istoica05) co-founder of @databricks, @anyscalecompute, and @arena, and UC Berkeley Professor of Computer Science, has joined Liquid AI as a strategic member of our Advisory Council. Ion will guide us on our growth journey as we build the efficient AI infrastructure and platform for a hardware-aware, physical AI future.
Replit launched Custom Instructions and Skills for Replit Agent, giving users a way to teach the agent their project conventions, preferences, and workflows. The update helps Replit Agent remember how users want projects structured, how brands should be represented, and what rules should apply across future builds.
AI agents are powerful, but they don’t remember your preferences. So you end up repeating instructions: how you structure projects, your brand guidelines. You can now teach Replit Agent your conventions with Custom Instructions and Skills. It'll take them into account for every project automatically.
Google’s Gemini Omni Flash is expected to become available through APIs for image-to-video, text-to-video, and video editing.
Gemini Omni Flash is SOTA at image to video, text to video, and video editing : ) Excited to get this to developers in the API soon!
xAI launched the Grok Build Plugin Marketplace in beta, bringing built-in plugins directly into Grok Build terminal workflows. The marketplace lets developers install tools from partners like MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools without leaving the terminal.
Jun 11
The Grok Build Plugin Marketplace is now in beta. Build with MongoDB, Vercel, Sentry, Cloudflare, and Chrome DevTools plugins from your terminal. Read more x.ai/news/grok-plugin-market…
Anthropic has an unusual leadership structure: CEO Dario Amodei reportedly has only one direct report, his chief of staff Avital Balwit. The rest of the executive team reports to Daniela Amodei, Anthropic’s president, who manages day-to-day operations and reports to the board. Dario focuses on strategy, research direction, culture, and long-term AI questions. Sam Altman reportedly has around half a dozen direct reports. Jensen Huang has said he has around 60. Dario says the setup lets him focus on the bigger picture. He reportedly spends a large amount of time talking to staff about Anthropic’s culture.
Frontier AI independence is expensive. Very expensive. Anthropic is reportedly pursuing its first data center leases and seeking financial backing from Google for the payments. Google is already deeply tied to Anthropic’s infrastructure strategy: it has invested in Anthropic, provides major cloud computing support, and is involved in Anthropic’s custom chip strategy. Anthropic is reportedly buying around $200B in computing power from Google.
Gemini Omni Flash is now #1 in Text-to-Video. It is also tied for #1 in Image-to-Video. In Text-to-Video, it improved by 158 points over Veo 3.1 (1080p).
Jun 11
Exciting news: Gemini Omni Flash is now #1 in the Video Arena (both Text-to-Video and Image-to-Video)! For Text-to-Video this is a massive 158 pt improvement over Veo 3.1 (1080p) and a large 61 pt lead over the next best model, Seedance 2.0. Congrats @GoogleDeepMind for this huge milestone!
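To give a rough sense of what those point gaps could mean in head-to-head votes, here is a minimal sketch. It assumes the arena scores sit on a standard Elo-style logistic scale (400 points per factor of 10 in odds); the post does not state the scoring method, so treat the scale and the function name `expected_win_rate` as assumptions made for illustration.

```python
def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Chance the higher-rated model wins one pairwise vote.

    Assumes an Elo-style logistic curve; the scale of 400 is the
    conventional Elo value, not something stated in the post.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Point gaps quoted in the post above.
gaps = [("vs Veo 3.1 (1080p)", 158), ("vs Seedance 2.0", 61)]

for label, gap in gaps:
    print(f"{label}: {expected_win_rate(gap):.1%}")
```

Under that assumption, a 158-point gap works out to roughly a 71% expected win rate and a 61-point gap to roughly 59%, so the lead over the next best model would be real but far from a sweep.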
A new benchmark called Agents’ Last Exam (ALE) is testing whether AI agents are truly ready for real digital labor-market work. The benchmark includes more than 1,500 expert-sourced tasks. The tasks span 55 occupations. Models tested include Fable 5, GPT-5.5, Composer 2.5, and other frontier agent systems. The benchmark was created by researchers who previously worked on major evals like MMLU, MATH, CyberGym, and ExploitGym. Current agents can solve some real professional tasks. But on ALE’s hardest tier, every tested frontier agent scored 0% success. That includes Fable 5.
Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 this week. But is that really the case? Over the past many months, my group and collaborators have been building Agents' Last Exam (ALE), a benchmark designed to test exactly that claim on real digital labor-market work. My group and collaborators previously have created many of the benchmarks the field runs on, including MMLU, MATH, CyberGym, and ExploitGym. Today, I'm excited to share Agents' Last Exam (ALE): a rolling benchmark that measures whether AI agents can actually perform economically valuable work across a broad range of real-world domains. With ALE, we evaluated Fable 5, GPT-5.5, Composer 2.5, and other frontier agent systems across more than 1,500 expert-sourced tasks spanning 55 occupations. The result is both impressive and sobering. Today's agents can solve a meaningful fraction of professional tasks. But when we look at the hardest tasks, the ones requiring sustained reasoning, deep domain expertise, and reliable execution over long horizons, they are still far from human-level performance. On ALE's hardest tier, every frontier agent we tested, including Fable 5, achieved a 0% success rate. The age of useful agents is here. The age of truly job-ready agents is not. We hope Agents' Last Exam (ALE) will serve as a new guidepost and north star for developing agents capable of reliably performing economically valuable work across a broad range of domains. 🧵
Jeff Bezos’ AI startup Prometheus raised $12B in new funding at a roughly $41B valuation. The company is building AI tools for the physical economy, focused on helping engineers design and manufacture physical products faster. Prometheus launched in November with $6.2B in funding. Bezos serves as co-CEO alongside Vik Bajaj. The company is focused on AI for engineering, manufacturing, and drug design. Bezos says Prometheus is not building robots. The goal is to create tools that speed up the “invention loop.” Bezos described the vision as an artificial general engineer.
Wes Roth retweeted
OpenAI is updating the ChatGPT model picker to make model selection easier and more similar to the Codex experience. Users will keep access to the same main models and reasoning levels, except for the removal of thinking-light, which was used by less than 1% of paid users. The updated options include Instant, Medium, High, Extra High, and Pro.
We're making a small update to the model picker in ChatGPT! We know it's critical to a lot of people's work, and that we have a lot of paying users who care deeply about this one, so wanted to take some time to detail out the tweak.
One important point upfront – you'll still have access to the same models and reasoning levels, besides the removal of thinking-light (used by less than 1% of our paid users).
You'll see an updated list of options (similar to how Codex works):
- Instant
- Medium (Thinking-Standard)
- High (Thinking-Extended)
- Extra High (Thinking-Heavy) [for pro users]
- Pro (with option to choose Pro-Standard or Pro-Extended) [for pro users]
The intent is to make it easier to choose the balance of speed and effort that works best for your task.
We also took into account community feedback to make sure:
a) Thinking-heavy is easily accessible
b) Pro standard and Pro extended are easily accessible
c) We clearly communicate these changes
Given that, here are some release notes detailing the updates - help.openai.com/chatgpt-rele…. Give it a try, it's rolling out today, and we're always open to feedback, we know it's important to get it right!
Google released DiffusionGemma, an experimental open text-generation model under the Apache 2.0 license. The model explores a faster way to generate text by producing whole blocks in parallel instead of generating one token at a time.
Key details:
🔹 The model can generate 256 tokens in parallel.
🔹 Google says it can deliver up to a 4x speedup on standard accelerators.
🔹 It can reach 1,000 tokens per second on a single NVIDIA H100.
🔹 It can reach 700 tokens per second on an NVIDIA GeForce RTX 5090.
🔹 It is a 26B Mixture of Experts model that activates only 3.8B parameters during inference.
🔹 When quantized, it can fit within 18GB VRAM, making it usable on high-end consumer GPUs.
Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
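The 18GB VRAM figure is easier to sanity-check with some back-of-envelope arithmetic. The sketch below estimates the memory needed for the weights alone at a few bit-widths. The post does not say which quantization level is used, so the bit-widths here are assumptions, and the helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Ignores activations, caches, and runtime overhead, so real usage
    will be higher than this number.
    """
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 26e9    # total parameters, from the post
ACTIVE_PARAMS = 3.8e9  # parameters active per inference step, from the post

# Bit-widths are assumed for illustration; the post does not specify one.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")

print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Note that a Mixture of Experts model still has to hold all 26B parameters in memory even though only 3.8B are active per step, which is why the total count drives the VRAM estimate. If the quantized build is around 4 bits per weight, the weights would take about 13 GB, which would leave some headroom under 18GB for everything else.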
Dario Amodei published a new essay titled “Policy on the AI Exponential,” arguing that AI is advancing much faster than governments and policy systems are built to handle. He argues that AI models have gone from weak coding ability to writing much of the code at major AI companies in only a few years. He says continued scaling could lead to “Powerful AI,” described as a “country of geniuses in a datacenter.”
Today I'm publishing a new essay, Policy on the AI Exponential. AI is progressing extremely fast—much faster than the policy process was built to handle. The essay lays out where I think the technology is now, and the action needed to close the gap: darioamodei.com/post/policy-…
NotebookLM will soon support textbooks as a source, expanding the types of materials users can bring into Google’s AI research and learning workspace.
GOOGLE 🔥: NotebookLM will soon support textbooks as a source! Google Play Books and Text Books, all there. h/t @thomas_gmry
OpenAI is reportedly in talks to lease a proposed 10-gigawatt data center campus on federal land in Ohio, with possible financial backing from Nvidia. The project could become one of the largest AI infrastructure deals ever, with an estimated build cost of at least $500 billion.
Runway and Lionsgate are expanding their existing partnership. The new program will focus on developing original IP. Lionsgate is reportedly taking an equity stake in Runway. The companies also plan to create AI-generated short-form episodic projects. The work may involve Lionsgate’s existing film and TV library.
Jun 11
Today, we’re deepening our partnership with Lionsgate with a slate of new initiatives, including a joint development program focused on creating original IP together. Learn more at the link below.
Anthropic launched Claude Corps, a national fellowship program that connects early-career people with U.S. nonprofits. The program will train 1,000 people to use Claude and pay them to apply AI toward nonprofit missions.
We’re launching Claude Corps, a national fellowship program matching people early in their careers with US nonprofits. We'll teach 1,000 people to use Claude, and pay them to use AI to advance their hosts’ missions. anthropic.com/claude-corps
Google is reportedly in talks with Samsung to manufacture part of its next-generation AI chip, codenamed Icefish. According to the report, Google plans to split production across multiple partners, with TSMC building the main compute die and Samsung potentially supplying a memory-related component using its advanced 2nm process.
OpenAI and Oracle are making it easier for Oracle Cloud customers to access OpenAI models and Codex through their existing Oracle cloud commitments. The update lets eligible customers use Oracle Universal Credits for OpenAI models and Codex without creating a separate purchasing path.
openai.com/index/openai-on-o… OpenAI 🤝 your Oracle cloud commitment
OpenAI is reportedly considering major price cuts for its AI products as competition with Anthropic intensifies. The company is weighing lower token prices, which would reduce the cost of using OpenAI models through APIs and other usage-based products.
OpenAI has hired Clint Gibler to help lead its cyber work alongside Michael Aiello, signaling a deeper push into AI-powered cybersecurity. Gibler says AI is changing both how software is written and how software is secured, as coding agents write more code and vulnerabilities are discovered and exploited faster.
Career update: I’ve joined @OpenAI to lead Cyber with @michaelaiello.
Why I joined, and what we’ll be building:
It’s clear that AI is fundamentally changing how software is being written and secured. Coding agents are writing the majority of code for many developers, software is getting shipped more quickly, and vulnerabilities that were latent for 20 years are being discovered at a rapid pace. The time to bug discovery, and exploitation once discovered, are trending down (H/T @EppSecurity and @gadievron).
I believe we have an unparalleled opportunity to fundamentally 𝘪𝘮𝘱𝘳𝘰𝘷𝘦 cybersecurity in ways that were previously impossible. (H/T @bubblewire’s BSidesSF keynote on reasons for optimism)
Over 6 years at @Semgrep, I had the privilege of working with an amazing team building what has become the most popular open source security code scanning tool in the world, that many companies have built their application security program around.
Now, at @OpenAI, I’m thrilled to be a part of a company helping shape how software is written, and how security work gets done. It is a massive opportunity, and responsibility, and I don’t take that lightly.
Here are my current thoughts about where things are headed:
𝐑𝐞𝐬𝐢𝐥𝐢𝐞𝐧𝐭 𝐛𝐲 𝐝𝐞𝐬𝐢𝐠𝐧. Defenders are not going to win playing bug whack-a-mole. We need to systematically eliminate classes of vulnerabilities, via generating secure code and streamlining the detect → validate → fix process.
𝐀𝐮𝐠𝐦𝐞𝐧𝐭 𝐚𝐧𝐝 𝐞𝐦𝐩𝐨𝐰𝐞𝐫 𝐩𝐞𝐨𝐩𝐥𝐞. We should build models and tools that give defenders “superpowers,” enabling them to be more ambitious in the scope they tackle, shift from being reactive to proactive, and allow them to automate the drudgery so they can focus on the highest leverage work.
𝐒𝐞𝐜𝐮𝐫𝐞 𝐭𝐡𝐞 𝐜𝐨𝐦𝐦𝐨𝐧𝐬. The world runs on open source software. OpenAI has already spent $Ms finding and patching vulnerabilities in the most popular and widely run software, including browsers, operating systems, and core libraries. More on this soon. We’re also working on helping secure critical infrastructure.
𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐭𝐲 𝐚𝐧𝐝 𝐩𝐚𝐫𝐭𝐧𝐞𝐫𝐬. Securing the world is a community effort. I’m looking forward to partnering with cybersecurity vendors, researchers, practitioners, governments, and more to do together what we can’t do alone.
𝐓𝐢𝐦𝐞 𝐭𝐨 𝐛𝐮𝐢𝐥𝐝. Tactically, here are some domains I’m excited about:
- Finding, validating, and reliably patching software vulnerabilities at scale.
- Eliminating classes of vulnerabilities and making software resilient by design.
- Giving broad access to the best cyber models to empower defenders, not just to a select few.
- Creating and sharing Skills and playbooks that help in many security domains.
- Building platforms that enable defenders to easily orchestrate security work.
- Making enterprise agents safe and reliable.
Time to build 😎
What would help you most? What should we build? Let me know.