Roey Ben Chaim retweeted

ESPN

@espn

Jun 11

OmG.

755

22,982

181,776

3,563,597

Roey Ben Chaim

@roeybc

Jun 11

I think I saw Larry David smile #knicks

1,067

Roey Ben Chaim

@roeybc

Jun 9

I've been following the team for a while now - and I love the walkthrough feature. Reading code was always harder than writing it, and now agents are writing it in copious amounts - Jimmy's solution for this is super creative.

Jimmy Koppel

@jimmykoppel

Jun 8

If AI’s coding 100x faster, why aren’t you shipping 100x faster? I’ve interviewed dozens of builders to find out. Here’s what’s slowing you down

2,570

Roey Ben Chaim retweeted

Boris Cherny

@bcherny

Jun 8

Seeing a number of benchmarks showing Opus is the best model for long-running work. Five tips for running Opus autonomously for hours/days:
1. Use auto mode for permissions, so Claude doesn’t ask for approval
2. Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
3. Use /goal or /loop, to nudge Claude to keep going until it’s done
4. Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
5. Make sure Claude has a way to self-verify its work end to end: Claude in Chrome browser extension for web, iOS/Android sim MCP for mobile, a way to start the full web server or service for backend work

Rishi Desai

@rishi_desai2

Jun 5

Can coding agents stay coherent over a 1 billion token budget? Can they build Slack from scratch? Rewrite a JAX codebase in PyTorch? Build a C compiler in Rust? Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.

313

280

3,477

641,269

Roey Ben Chaim

@roeybc

Jun 5

Incredible work here on long horizon tasks: from solving reward hacks to verifying full stack tasks.

Rishi Desai

@rishi_desai2

Jun 5

107

Roey Ben Chaim

@roeybc

May 22

skills are great at fetching the right context at the right time. but not all context is good for you 😈 come watch @mbrg0 @ blackhat this summer to see what we found

Michael Bargury

@mbrg0

May 22

excited to speak about our agent detonation chamber this summer at #BHUSA!
how do you 'scan' txt for 'security badness'? not w wishful analysis by an llm judge
what we really want is: what will this thing cause my agent to *DO*?
ft/ francesco montorsi @lana__salameh @roeybc

ALT BHUSA agenda

Roey Ben Chaim

@roeybc

May 20

If you wanna help shape AI for Science, there's no better place than this initiative 👇

Steven Dillmann

@StevenDillmann

May 20

📣 Announcing Terminal-Bench Science: benchmarking AI agents on real scientific workflows – now open for task contributions👇 tbench.ai/news/tb-science-an… @AnthropicAI, @OpenAI, and @GoogleDeepMind use Terminal-Bench to evaluate AI on coding tasks. We're now extending it to scientific workflows. 1/6🧵

Roey Ben Chaim retweeted

Amitay Gilboa

@GilboaAmitay

May 11

We got an 8-figure acquisition offer 2 days after launch. We said no, because the problem we're solving is worth way more than that. It’s 2026, but teams are only getting lonelier, and context is still the problem. The issue isn’t intelligence. Your team has plenty of that. It’s shared memory and context, the thing that makes 10 A-players feel like 1. That’s what we’ve solved with @playdotfast, while making work more fun. We're killing traditional SaaS, and believe you me, we're leaving no holds barred.

1:38

326

466

2,358

4,015,041

Roey Ben Chaim retweeted

Georgia Channing

@cgeorgiaw

Apr 29

🤗🤗🤗introducing Hugging Science -- the home of AI for science 🤗🤗🤗

open models and datasets are the powerhouse of science (see the PDB), but finding the models and data you actually need for your breakthrough is hard af

you shouldn't need to scrape arxiv, own your own wetlab, fight a custom HDF5 parser, build a fusion stellarator, and beg for compute before you've trained a single epoch

so we're changing that

we've put all the best science on @huggingface in one place:
- 78GB of genomics data
- 11TB of PDE simulations
- 100M cell profiles
- 9T DNA base pairs
- 13M molecular trajectories
- 400k medical QA pairs
and much more, all open, and all ready for training (you can also now filter and search by domain, task, and keyword)

we've put together all the biggest releases from our partners at NASA, Google, OpenAI, Meta FAIR, Arc Institute, Ginkgo, SandboxAQ, Proxima Fusion, NVIDIA, Ai2, OpenADMET, InstaDeep, Future House, Polymathic AI, LeMaterial, Earth Species Project, Merck, and Eve Bio

if you're not sure where you fit in -- work on open challenges for problems that matter: including fusion stellarator design, ADMET, antibody developability, multilingual medicine, catalysis and materials, and scientific reasoning.

we're already changing how science gets done:

a fusion startup needed a benchmark for stellarator plasma confinement that didn't exist. @proximafusion shipped ConStellaration on Hugging Science: a leaderboard, dataset, and eval metrics, all in one place.

a drug discovery team wanted to predict hPXR induction. OpenADMET put up a blind challenge: 11,000 compounds assayed at Octant, 513 held out, two tracks (pEC50 structure). Anyone in the world can train and submit.

an antibody team at @Ginkgo released GDPa1, a developability dataset for stability, manufacturability, and immunogenicity prediction, with a live leaderboard scoring every submission.

if you know a problem the ML community should be working on, let us know. make a challenge!

this is about putting all the tools for solving science in one place. so we can hillclimb! → huggingscience.co

0:21

350

1,808

198,367

Roey Ben Chaim retweeted

Christoffer Bjelke

@chribjel

Apr 29

Ai generated prs be like

212

4,143

113,040

Roey Ben Chaim retweeted

Brendan Dolan-Gavitt

@moyix

Apr 23

A principle of security is that you should never assume you're the only one who can figure something out – and that therefore, it's best to be open about tools, findings, and methods.

XBOW

@Xbow

Apr 23

Anthropic’s Mythos raised the bar for AI vuln detection but kept it invite-only. GPT-5.5 is OpenAI’s answer, and it’s open to all. We had early access. Ran the benchmarks. Blackbox GPT-5.5 already beats whitebox GPT-5. Best pentesting model we’ve tested. Read our analysis: bit.ly/48OX7v6

0:15

257

44,207

Roey Ben Chaim retweeted

Xiangyi Li

@xdotli

Apr 23

SkillsBench is now cited by the HY-3 model card. Congrats to @TencentHunyuan on the launch and kudos to the SkillsBench team / community! We've made a lot of improvements to the tasks, codebase, and tooling in the past month, based on feedback from users and lab partners. We will also share an updated leaderboard with more models and agent harnesses soon, stay tuned!

1,584

Roey Ben Chaim retweeted

Vaibhav Gupta

@vaibcode

Apr 22

Replying to @MKelner @michael_chomsky @dexhorthy @Vtrivedy10

just fix the problems your users have as fast as possible. if you need to, then build a harness; if it works out of the box, use the existing code. it's just stacked while loops.

3,840

Roey Ben Chaim

@roeybc

Apr 22

ok this keeps on happening so for the 1000th time: Microsoft Teams has a web client - YOU DO NOT NEED TO RUN A SCRIPT TO JOIN A MEETING.

Michael Feng

@fengtality

Apr 22

Almost got hacked this morning - here's a replay of what happened:
1. A VC whom I've met in person reached out for a catchup
2. She sent me a Microsoft Teams link a few min ahead of the meeting
3. When I joined, it asked me to download an update script
4. Got a funny feeling and ended the call immediately
5. Claude inspected the file and it was indeed malicious
Not sure if this person was just hacked or a bad actor, but I wanted to post this as a PSA. Stay safe.

145

Roey Ben Chaim

@roeybc

Apr 22

Someone needs to start curating these things…

Asuka Zheng🎀

@VoidAsuka

Apr 21

asked claude to fix an nccl comms error between gpus. it replaced nccl with http. the gpus are now emailing each other their gradients. problem solved, technically. i have never been more impressed.

Roey Ben Chaim

@roeybc

Apr 21

yeah no codex is really good now

Roey Ben Chaim

@roeybc

Apr 22

wait why is it using perl?

Roey Ben Chaim retweeted

Roy Zalta 🐈

@RoyZalta

Apr 21

@steipete myself (@RoyZalta) & @Michaelliav99 are hosting a @Microsoft × @openclaw 🦞× @NousResearch 🤖LIVE event in Tel Aviv 🇮🇱🔥 Would love your support, even a quick 5-minute drop-in call to congratulate the team 🙌 #openclaw @openclaw #HermesAgents #AgenticAI #AI #GenAI #Microsoft #TechEvents #TelAviv #Startups #AICommunity

363

Roey Ben Chaim retweeted

Kobe

@kobe0938

Apr 15

This is the spirit of Silicon Valley. Let me tell you a story.

On 12/21/2025, Xiangyi called me with a pitch: let's gather a team and build a new benchmark — SkillsBench — following the community-contribution model of Terminal-Bench (a project we're both contributors on). We'd reuse the "harbor" infra so we wouldn't have to reinvent the wheel. He said skills were just recognized by Anthropic and this was the perfect timing.

So I asked: what can you offer contributors in return? "Authorship on an ICML 2026 paper." I asked how many citations we could realistically expect. We looked at comparable work like MCPBench — only a handful of citations.

And honestly, at that point, Benchflow was nothing (bear with me, @xdotli). No successful project. No track record. This was the first paper Xiangyi had ever led — or ever written. No professor advising. No experience managing a large-scale open source community — and we all know how hard that is. People sign up and never contribute.

Deep down, I was ready to say no and spend my time on something with a safer payoff. But then Xiangyi said something that stuck with me: "If we somehow make it, I know how to make it go viral on X."

On paper, there was no reason to believe him. But it wasn't what he said — it was how he said it. There was something in his voice that night. No hesitation. No hedging. Just raw, almost irrational conviction that this was going to work. I'd talked to plenty of people with ideas before, but this was different — this was founder energy. The kind where someone has already decided the outcome and is just looking for people willing to run alongside them.

So I took a leap of faith. I decided to bet on the person, not the project. That's why I joined SkillsBench and Benchflow.

And it did go viral. @garrytan and many others reposted us. We hit a few million views. The paper already has 27 citations. He personally got 3k followers. And many more projects, like ClawsBench are on the way.

Fast forward to today — Xiangyi is turning down multiple 10M acquisition offers and 1M personal compensation to keep pushing Benchflow's vision. From a guy with no paper, no track record, and nothing but conviction on a December phone call — to building something unicorn companies want to buy. In 3 months.

This is the story of SkillsBench and Benchflow. If you're determined enough, the world will rearrange itself around you. Go for it.

Xiangyi Li

@xdotli

Apr 15

Just logged in on @benchflow_ai LinkedIn and wow we are popular
We are a data and environment lab
📐We turned down multiple 8 figure acquisition offers from unicorn companies and 7 figure compensation for me to push benchflow's vision.
If environment and benchmark is your thing, I want to chat with you! reply / dm and let's set up a time 🎉

1,909

Roey Ben Chaim retweeted

Liran Tal

@liran_tal

Apr 11

your daily reminder for npm security best practices

7,368