MSBINF - @GeorgiaTech | Graduate Research Assistant @EmoryUniversity - @mbhasin_'s lab

Joined December 2020
Birendra Kumar Saye retweeted
Introducing EpiBench, an agentic benchmark for practical epigenomics analysis. 106 evaluations span CUT&Tag/CUT&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. The best agent–harness pair passes 45.0% of evaluations. Evaluations reflect the assay outputs scientists use in practice. A task may depend on alignment files, peak calls, methylation tables, QC metrics, sample metadata, genomic annotations, or downstream summaries. Solving them requires a mix of coding, data analysis, and scientific judgment. Ground truth is hard to define even for short-horizon scientific tasks. Alternative task interpretations can produce multiple plausible answers. Candidate tasks are hardened through manual quality control. We remove prompts that over-specify the method, answers that can be solved with general literature knowledge, and ground truths that fail to reproduce under peer reproduction. Short-horizon tasks are the current frontier for scientific agents in epigenomics. Before models can own deeper biological reasoning, they need to become reliable at local assay-specific decisions.
4
22
105
7,699
Birendra Kumar Saye retweeted
After several months of analyzing model trajectories on SpatialBench, we found issues in a subset of evals. Some tasks depended on analysis decisions not specified in the prompt. Others had grader thresholds that were too narrow, rejecting valid solution paths the original domain expert had not considered. We ran two rounds of independent expert attempts without access to solutions. This produced SpatialBench Verified: a 115-problem gold-standard subset of the original 159 evals where expected answers can be reproduced from only the prompt and associated data. Model ordering is largely preserved, but scores increase 11.6pp on average. Verifiability in biology is hard because correct answers often depend on tacit analysis choices. Our results suggest independent human verification should be a core part of benchmark construction.
1
6
24
2,002
Birendra Kumar Saye retweeted
Biology is the next agentic frontier after coding. Anthropic is aggressively improving their models on routine data analysis with careful attention to nuances of different assay types. Opus 4.8 is noticeably better at single cell / spatial analysis. We have already rolled it out to customers across pharma and academia. Cool to see our benchmarks on the system card.
May 28
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
8
41
285
21,871
Birendra Kumar Saye retweeted
Introducing SpatialBench-Long, a benchmark for long-horizon spatial biology. Agents must recover biological claims from raw data and realistic experimental context without prescribed methods. 24 evals span primary tumors, organoids, xenograft models, lineage-tracing systems, and aging/intervention biology. The best agents score 11.1%.
1
17
67
8,480
Birendra Kumar Saye retweeted
Figuring out how to benchmark agents on realistic biology research has quickly become one of my favorite types of engineering work. You work with scientists to get to the core of some biological claim, precisely assembling raw data/prior literature/experimental context in a little 'world' for an agent to stumble around in - only for its behavior to challenge what you thought to be true and to force you to deeply introspect empirical human behavior. Doing this well gets considerably more difficult as we better approximate and climb long horizon scientific work: the type one might publish in the results section of a paper or build a drug program around. Been working on a project in this direction for the past few months and excited to drop tomorrow.
2
12
89
4,461
Birendra Kumar Saye retweeted
May 26
The new competition isn’t Humans vs AI. It’s Humans with AI vs everyone else.
986
1,856
14,634
518,646
Birendra Kumar Saye retweeted
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
7,989
11,150
150,232
27,570,661
Birendra Kumar Saye retweeted
If these machines become as ubiquitous as sequencers at the same time the labs start post training seriously in biology we're in for a scientific revolution
12
86
7,122
Birendra Kumar Saye retweeted
Agents calling tools now generate more revenue on Latch than scientists clicking buttons. From native deployments but also harnesses like Claude Code, Codex, Cursor across Pharma, biotech, academic labs. Product roadmap increasingly concerned with exposing compute intensive operations and experimental context to agents while designing new ways for humans to steer. Tool calls can dispatch a TB of data over a dozen computers. Errors in eg. parameter selection, infra failures, tool selection more expensive in time/cost than familiar agentic domains. Very early and exciting product space.
3
5
45
4,091
Birendra Kumar Saye retweeted
Are frontier models improving at single cell biology? We revised scBench with additional rounds of scientist review and generated new benchmark data for Opus 4.7, GPT 5.5, and Gemini 3.1. On accuracy, Gemini 3.1 roughly doubles, Opus 4.7 shows modest improvement, and GPT 5.5 shows little to no improvement. On latency, GPT 5.5 improves substantially, Opus 4.7 shows little change, and Gemini 3.1 worsens by ~5x.
5
11
64
6,439
Birendra Kumar Saye retweeted
Gave a talk to Machine Learning @ Berkeley on benchmarking frontier models on spatial biology. Why understanding how assays work is important, what verifiability might look like with messy biology, infrastructure challenges running agentic evals at scale.
6
11
179
14,445
Birendra Kumar Saye retweeted
New frontier models are not meaningfully improving at spatial biology. Overall accuracy for GPT-5.5 and Opus 4.7 remains flat on SpatialBench. Scientist-reviewed trajectories reveal persistent gaps in assay-aware biological judgment.
3
15
61
6,296
Birendra Kumar Saye retweeted
This is the year of agents in biology. What you're seeing in code is already unfolding in molecular data analysis, reorganizing workflows in basic research and drug development.
Path forward is focused benchmarking engineering scoped to specific types of assays. Just as coding agents had to reliably write JavaScript before they could build a browser, biology agents must first learn to accurately process and interpret concrete measurements (eg. spatial assays) before they can reason about disease, drug mechanism, or patient response.
Our roadmap reflects this progression: procedural skill in analysis -> emergent biological reasoning -> synthesis across data types, translational context, and realistic ambiguity. Towards systems that can eventually support expensive, high-stakes decisions in drug programs or research projects.
Diffusion in biology is slower than software and needs to be thought through carefully. We work directly with the teams building measurement tech (eg. TakaraBio and Vizgen) and package assay-specific agents alongside their kits and instruments. Scientists complete sample preparation, then use these tech-specific agents to move from raw data to answers and figures. Our partners white-label our platform; we do not run a direct biotech sales motion.
Now hiring rapidly across major assay categories, prioritized by which we believe will contribute most to the area under the molecular data curve over the next several years:
- Spatial
- Single Cell
- Epigenomics
- Genomics
- Perturbation/Screening
- Diagnostics
Looking for talented scientists and engineers with strong foundations in theory and deep experience in these areas to help us build scientifically accurate agents.
4
14
153
77,230
Birendra Kumar Saye retweeted
Very happy to share our new study defining the ontogeny of Thetis cells and the developmental cues that shape their early-life wave of differentiation. nature.com/articles/s41586-0… Beautiful work led by exceptional @HHMI Gilliam fellow @YoselinAPI and @MSK postdoc Tyler Park 🧵 1/
10
21
102
12,907
Birendra Kumar Saye retweeted
Most current drug discovery efforts are structure-based, eg. creating small molecules or antibodies that best bind X. However, a drug may not derive its efficacy from its strongest binder. Taking a step away from the structure paradigm, we reason that if a CRISPR knockout of a gene mimics a drug's effects across cancer cell lines, that gene is likely the drug's target. This was done in @EytanRuppin's lab in collaboration with @anideshpandelab and @BenDavidLab. Using this principle, we integrated drug and CRISPR profiles from 1000s of drugs to find their context-specific targets (in different cancers, or when the known target is not expressed but the drug still kills cancer cells). We call this tool DeepTarget. We show that this approach outperforms current structure-based methods (AF3, RF, Chai) at finding a drug's target in a genome-wide search, when we had no information on what the target might be. We benchmarked on eight gold-standard drug-target pairs. It took us months to get this benchmark (we hope it helps the field). We present two experimentally validated cases; pls see the paper for these (link at the end). An intriguing observation: in the many cases where several small molecules target the same gene (eg. EGFR), we found that small molecules with higher predicted target specificity show greater clinical advancement. Very happy to hear your feedback. Here's the free access link: nature.com/articles/s41698-0…
9
44
195
46,941
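The principle in the DeepTarget post above (a gene whose knockout mimics a drug's effect across cell lines is a candidate target) can be illustrated with a toy sketch. This is not the DeepTarget implementation: the function names, the gene labels, the numbers, and the choice of a plain Pearson correlation as the similarity score are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_candidate_targets(drug_response, knockout_effects):
    """Rank genes by how closely their knockout profile tracks the drug profile.

    drug_response: viability effect of the drug, one value per cell line.
    knockout_effects: hypothetical mapping of gene -> knockout viability effect,
    with cell lines in the same order as drug_response.
    """
    scores = {
        gene: pearson(drug_response, profile)
        for gene, profile in knockout_effects.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data for five cell lines (more negative = stronger loss of viability).
drug_response = [-1.2, -0.1, -0.9, 0.0, -1.5]
knockout_effects = {
    "GENE_A": [-1.0, 0.0, -0.8, 0.1, -1.4],      # tracks the drug closely
    "GENE_B": [0.1, -0.9, 0.0, -1.1, 0.2],       # opposite pattern
    "GENE_C": [-0.5, -0.5, -0.5, -0.5, -0.5],    # same effect everywhere
}

ranking = rank_candidate_targets(drug_response, knockout_effects)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

In this toy data GENE_A ranks first because its knockout profile rises and falls with the drug's profile. A gene that is equally essential in every cell line (GENE_C) carries no signal under this score, which is one reason a correlation-style measure is a natural first choice here.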
Birendra Kumar Saye retweeted
13 Aug 2023
RNAseq counting tools are not perfect. I simulated 240 GTEx samples to test multiple tools. Below I show the difference between actual and estimated counts for each simulated sample. But, what is causing this? And will it affect differential expression? (1/7) #Bioinformatics
11
222
851
243,096
Birendra Kumar Saye retweeted
6 Dec 2023
This fundamentally changes how humans work with computers

271
1,567
9,870
2,599,816
Birendra Kumar Saye retweeted
My favorite Charlie Munger story: In 1953, Munger was 29 years old. Recently divorced. Lost the house. Huge social stigma of divorce back then. His 8-year-old son, Teddy, was diagnosed with cancer. The leukemia was incurable. No medical insurance - Munger paid for all his medical care. Charlie would visit Teddy in the hospital every day -- and then walk the streets crying. Teddy died at the age of 9. Charlie was broke, divorced and just lost his child. 99.9% of people would've turned to alcohol, drugs, or suicide. (And you'd understand why) Munger never did. Fast forward to 52 years old, a failed surgery left him blind in one eye with the potential of going fully blind one day. Charlie was an obsessive learner who read every book he could get his hands on. When confronted with the possibility of going blind and no longer being able to read he said: "It's time for me to learn braille!" The only thing that might be more impressive than his intellect was his actions. RIP. --------- Munger on Self-Pity: "Generally speaking, envy, resentment, revenge, and self-pity are disastrous modes of thought. Self-pity gets pretty close to paranoia… Every time you find you're drifting into self-pity, I don’t care what the cause, your child could be dying from cancer, self-pity is not going to improve the situation. It’s a ridiculous way to behave. Life will have terrible blows, horrible blows, unfair blows, it doesn’t matter. Some people recover and others don’t. There I think the attitude of Epictetus is the best. He thought that every mischance in life was an opportunity to behave well. Every mischance in life was an opportunity to learn something and that your duty was not to be immersed in self-pity, but to utilize the terrible blow in a constructive fashion. That is a very good idea."
464
5,603
28,560
4,862,648
Birendra Kumar Saye retweeted
Issue with visual media (IG, TikTok) - the name of the game is distribution. Everyone tries to maximize distribution/reach, and not as many are trying to maximize quality. Result - timelines clogged with low value signals and creators complaining about the "algorithm".
1
2