
Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

Jun 11

Introducing EpiBench, an agentic benchmark for practical epigenomics analysis. 106 evaluations span CUT&Tag/CUT&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. The best agent–harness pair passes 45.0% of evaluations. Evaluations reflect the assay outputs scientists use in practice. A task may depend on alignment files, peak calls, methylation tables, QC metrics, sample metadata, genomic annotations, or downstream summaries. Solving them requires a mix of coding, data analysis, and scientific judgment. Ground truth is hard to define even for short-horizon scientific tasks. Alternative task interpretations can produce multiple plausible answers. Candidate tasks are hardened through manual quality control. We remove prompts that over-specify the method, answers that can be solved with general literature knowledge, and ground truths that fail to reproduce under peer reproduction. Short-horizon tasks are the current frontier for scientific agents in epigenomics. Before models can own deeper biological reasoning, they need to become reliable at local assay-specific decisions.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 29

After several months of analyzing model trajectories on SpatialBench, we found issues in a subset of evals. Some tasks depended on analysis decisions not specified in the prompt. Others had grader thresholds that were too narrow, rejecting valid solution paths the original domain expert had not considered. We ran two rounds of independent expert attempts without access to solutions. This produced SpatialBench Verified: a 115-problem gold-standard subset of the original 159 evals where expected answers can be reproduced from only the prompt and associated data. Model ordering is largely preserved, but scores increase 11.6pp on average. Verifiability in biology is hard because correct answers often depend on tacit analysis choices. Our results suggest independent human verification should be a core part of benchmark construction.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 28

Replying to @void_bio

This is a large gain on this benchmark. Consider data from previous releases: blog.latch.bio/p/new-frontie….

New Frontier Models Are Faster, Not More Reliable, at Spatial Biology

Overall accuracy for GPT-5.5 and Opus 4.7 remains flat on SpatialBench. Scientist-reviewed trajectories reveal persistent gaps in assay-aware biological judgment.

blog.latch.bio

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 28

Biology is the next agentic frontier after coding. Anthropic is aggressively improving their models on routine data analysis with careful attention to nuances of different assay types. Opus 4.8 is noticeably better at single cell / spatial analysis. We have already rolled it out to customers across pharma and academia. Cool to see our benchmarks on the system card.

Claude

@claudeai

May 28

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Benchmark table showing how Claude Opus 4.8 compares to its predecessor and to other models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 27

Introducing SpatialBench-Long, a benchmark for long-horizon spatial biology. Agents must recover biological claims from raw data and realistic experimental context without prescribed methods. 24 evals span primary tumors, organoids, xenograft models, lineage-tracing systems, and aging/intervention biology. The best agents score 11.1%.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 27

Figuring out how to benchmark agents on realistic biology research has quickly become one of my favorite types of engineering work. You work with scientists to get to the core of some biological claim, precisely assembling raw data/prior literature/experimental context in a little 'world' for an agent to stumble around in - only for its behavior to challenge what you thought to be true and to force you to deeply introspect empirical human behavior. Doing this well gets considerably more difficult as we better approximate and climb long horizon scientific work: the type one might publish in the results section of a paper or build a drug program around. Been working on a project in this direction for the past few months and excited to drop tomorrow.

Birendra Kumar Saye retweeted

Naval

@naval

May 26

The new competition isn’t Humans vs AI. It’s Humans with AI vs everyone else.

Birendra Kumar Saye retweeted

Andrej Karpathy

@karpathy

May 19

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 15

If these machines become as ubiquitous as sequencers at the same time as the labs start post-training seriously in biology, we're in for a scientific revolution.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 12

Agents calling tools now generate more revenue on Latch than scientists clicking buttons. This comes from native deployments but also from harnesses like Claude Code, Codex, and Cursor, across pharma, biotech, and academic labs. The product roadmap is increasingly concerned with exposing compute-intensive operations and experimental context to agents while designing new ways for humans to steer. Tool calls can dispatch a TB of data over a dozen computers. Errors in, e.g., parameter selection, infra failures, or tool selection are more expensive in time/cost than in familiar agentic domains. Very early and exciting product space.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 12

Are frontier models improving at single cell biology? We revised scBench with additional rounds of scientist review and generated new benchmark data for Opus 4.7, GPT 5.5, and Gemini 3.1. On accuracy, Gemini 3.1 roughly doubles, Opus 4.7 shows modest improvement, and GPT 5.5 shows little to no improvement. On latency, GPT 5.5 improves substantially, Opus 4.7 shows little change, and Gemini 3.1 worsens by ~5x.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

May 6

Gave a talk to Machine Learning @ Berkeley on benchmarking frontier models on spatial biology: why understanding how assays work is important, what verifiability might look like with messy biology, and the infrastructure challenges of running agentic evals at scale.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

Apr 29

New frontier models are not meaningfully improving at spatial biology. Overall accuracy for GPT-5.5 and Opus 4.7 remains flat on SpatialBench. Scientist-reviewed trajectories reveal persistent gaps in assay-aware biological judgment.

Birendra Kumar Saye retweeted

Kenny Workman

@kenbwork

Apr 9

This is the year of agents in biology. What you're seeing in code is already unfolding in molecular data analysis, reorganizing workflows in basic research and drug development.

The path forward is focused benchmarking engineering scoped to specific types of assays. Just as coding agents had to reliably write JavaScript before they could build a browser, biology agents must first learn to accurately process and interpret concrete measurements (e.g., spatial assays) before they can reason about disease, drug mechanism, or patient response. Our roadmap reflects this progression: procedural skill in analysis -> emergent biological reasoning -> synthesis across data types, translational context, and realistic ambiguity. The goal is systems that can eventually support expensive, high-stakes decisions in drug programs or research projects.

Diffusion in biology is slower than in software and needs to be thought through carefully. We work directly with the teams building measurement tech (e.g., TakaraBio and Vizgen) and package assay-specific agents alongside their kits and instruments. Scientists complete sample preparation, then use these tech-specific agents to move from raw data to answers and figures. Our partners white-label our platform; we do not run a direct biotech sales motion.

Now hiring rapidly across major assay categories, prioritized by how much we believe each will contribute to the area under the molecular data curve over the next several years:
- Spatial
- Single Cell
- Epigenomics
- Genomics
- Perturbation/Screening
- Diagnostics

Looking for talented scientists and engineers with strong foundations in theory and deep experience in these areas to help us build scientifically accurate agents.

Birendra Kumar Saye retweeted

Chrysothemis Brown @themis_brown

Feb 3

Very happy to share our new study defining the ontogeny of Thetis cells and the developmental cues that shape their early-life wave of differentiation. nature.com/articles/s41586-0… Beautiful work led by exceptional @HHMI Gilliam fellow @YoselinAPI and @MSK postdoc Tyler Park 🧵 1/

Ontogeny and transcriptional regulation of Thetis cells

Nature - A study of antigen-presenting cells called Thetis cells sheds light on their development and transcriptional regulation in mice, which has implications for the tolerance of gut microbiota...

nature.com

Birendra Kumar Saye retweeted

Sanju Sinha @Sanjusinha7

10 Nov 2025

Most current drug discovery efforts are structure-based, e.g., creating small molecules or antibodies that best bind X. However, a drug may not derive its efficacy from its strongest binder. Taking a step away from the structure paradigm, we reason that if a CRISPR knockout of a gene mimics a drug's effects across cancer cell lines, that gene is likely the drug's target. This was done in the @EytanRuppin lab in collaboration with @anideshpandelab and @BenDavidLab. Using this principle, we integrated drug and CRISPR profiles from 1000s of drugs to find their context-specific targets (in different cancers, or when the known target is not expressed but the drug still kills cancer cells). We call this tool DeepTarget. We show that this approach outperforms current structure-based methods (AF3, RF, Chai) at finding a drug's target in a genome-wide search, when we had no information on what the target might be. We benchmarked on eight gold-standard drug-target pairs. It took us months to assemble these benchmarks (we hope they help the field). We present two experimentally validated cases; please see the paper for details (link at the end). An intriguing observation: in many cases, multiple small molecules target the same gene (e.g., EGFR), and we found that small molecules with higher predicted target specificity show greater clinical advancement. Very happy to hear your feedback. Here's the free access link: nature.com/articles/s41698-0…

Birendra Kumar Saye retweeted

Mark @Sanbomics

13 Aug 2023

RNAseq counting tools are not perfect. I simulated 240 GTEx samples to test multiple tools. Below I show the difference between actual and estimated counts for each simulated sample. But, what is causing this? And will it affect differential expression? (1/7) #Bioinformatics
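The comparison described above (known simulated counts versus a counting tool's estimates, per sample) can be sketched as follows. The error metric (summed absolute difference divided by total true counts), the function name `per_sample_count_error`, and the toy values are assumptions for illustration, not the author's actual analysis.

```python
# Hypothetical per-sample error between simulated ("true") and estimated counts.
def per_sample_count_error(true_counts, est_counts):
    """Return {sample: sum(|estimated - true|) / sum(true)} over genes in the truth set.

    Genes missing from a tool's estimate are treated as 0;
    genes present only in the estimate are ignored in this sketch.
    """
    errors = {}
    for sample, truth in true_counts.items():
        est = est_counts[sample]
        abs_diff = sum(abs(est.get(gene, 0) - n) for gene, n in truth.items())
        errors[sample] = abs_diff / sum(truth.values())
    return errors

# Toy, made-up counts for two simulated samples.
true_counts = {"s1": {"g1": 100, "g2": 50}, "s2": {"g1": 10, "g2": 90}}
est_counts = {"s1": {"g1": 90, "g2": 60}, "s2": {"g1": 10, "g2": 90}}
errs = per_sample_count_error(true_counts, est_counts)
print(errs)  # s1: (10 + 10) / 150, s2: 0.0
```

A relative metric like this keeps samples with different sequencing depths comparable; whether such errors matter for differential expression is the separate question the thread raises.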

Birendra Kumar Saye retweeted

Amjad Masad

@amasad

6 Dec 2023

This fundamentally changes how humans work with computers

Birendra Kumar Saye retweeted

George Mack

@george__mack

29 Nov 2023

My favorite Charlie Munger story: In 1953, Munger was 29 years old. Recently divorced. Lost the house. Huge social stigma of divorce back then. His 8-year-old son, Teddy, was diagnosed with cancer. The leukemia was incurable. No medical insurance - Munger paid for all his medical care. Charlie would visit Teddy in the hospital every day -- and then walk the streets crying. Teddy died at the age of 9. Charlie was broke, divorced and just lost his child. 99.9% of people would've turned to alcohol, drugs, or suicide. (And you'd understand why) Munger never did. Fast forward to 52 years old, a failed surgery left him blind in one eye with the potential of going fully blind one day. Charlie was an obsessive learner who read every book he could get his hands on. When confronted with the possibility of going blind and no longer being able to read he said: "It's time for me to learn braille!" The only thing that might be more impressive than his intellect was his actions. RIP. --------- Munger on Self-Pity: "Generally speaking, envy, resentment, revenge, and self-pity are disastrous modes of thought. Self-pity gets pretty close to paranoia… Every time you find you're drifting into self-pity, I don’t care what the cause, your child could be dying from cancer, self-pity is not going to improve the situation. It’s a ridiculous way to behave. Life will have terrible blows, horrible blows, unfair blows, it doesn’t matter. Some people recover and others don’t. There I think the attitude of Epictetus is the best. He thought that every mischance in life was an opportunity to behave well. Every mischance in life was an opportunity to learn something and that your duty was not to be immersed in self-pity, but to utilize the terrible blow in a constructive fashion. That is a very good idea."

Birendra Kumar Saye retweeted

Rahul Kolle @rahulkolle

2 Dec 2022

Issue with visual media (IG, TikTok) - the name of the game is distribution. Everyone tries to maximize distribution/reach, and not as many are trying to maximize quality. Result - timelines clogged with low value signals and creators complaining about the "algorithm".