Biomedical Data Science PhD student @Stanford

Joined September 2020
Jake Silberg retweeted
🚀 Today, we’re excited to introduce SimpleTES for scaling the scientific discovery loop. 🧵 I always ask myself: what are we actually scaling in scientific discovery? Most LLM discovery methods focus on test-time scaling of generation: more tokens, more agents, more turns. But science advances through evaluation-driven loops: propose → evaluate → refine → repeat. SimpleTES captures this idea, discovering SOTA solutions across 21 scientific problems! Key discoveries: 🏎️ A LASSO solver 2.17x faster than glmnet, the gold-standard LASSO solver, engineered for decades. ⚛️ 24.5% less quantum routing overhead on IBM Q20, beating the previous standard library, LightSABRE. 📐 0.380868 on Erdős Minimum Overlap, outperforming previous solutions from mixed-frontier ensembles or humans. 🧬 0.74 on Tabula Muris (scRNA-seq denoising), a new SOTA that generalizes to unseen tissue types without retraining. #LLM #AI4Science #ScalingLaws #SimpleTES #MachineLearning
Jake Silberg retweeted
In AI for scientific discovery, the bottleneck isn't always generation — it's quite often evaluation. How do you design evaluators close to gold? Prevent reward hacking? And critically, how do you scale the evaluation-driven loop to reach genuinely novel discoveries?
Jake Silberg retweeted
Extremely excited about the results of @adaptyvbio RBX1 binder design competition! 𝑩𝒊𝒏𝒅𝑪𝒓𝒂𝒇𝒕2 performed very well, with 3 out of 7 designs binding to the disordered tail. Overall, only 9 binders worked out of 322 tested, 2.8% hit rate! Proud of the BC2 team ♥️
Jake Silberg retweeted
SyntheMol-RL has now been published! SyntheMol-RL is a reinforcement learning model for synthesizable small molecule drug design. We used it to design antibiotic candidates against the bacterium S. aureus, with hits validated in vitro and in vivo in mice. 1/6 link.springer.com/article/10…
Jake Silberg retweeted
Finally getting to share one of my favorite projects. ICLR Oral! 🏆 It’s so strange how rigid video tokenization is. Think about it: why should a still landscape cost the same number of tokens as a busy street? We built InfoTok. We went back to basics with Shannon’s information theory to make tokens "adaptive" in a principled way. Its 2.3x better compression and 11x faster inference demonstrate the magic of the old-school theory ✨ Check it out: research.nvidia.com/labs/dir…
Jake Silberg retweeted
Excited to share that our paper has been published in Nature Machine Intelligence! We conducted a randomized controlled trial at ICLR 2025 with 20,000 reviews to test whether LLM feedback improves peer review quality. Link: nature.com/articles/s42256-0…

Jake Silberg retweeted
To make a long story short, we uncover dozens of regions of our genome that control whether the virus persists or is cleared quickly. Further, we show that persistent EBV may serve as a biomarker of complex diseases, from respiratory disease to autoimmunity.
Jake Silberg retweeted
🤔Want a principled way to RL your diffusion model? Check Data-regularized Reinforcement Learning (DDRL)! Post-train @nvidia #Cosmos World Foundation models with a million GPU hours! 🤯 Novel formulation ➡️ Theoretically integrates SFT into RL ➡️ Robust to Reward Hacking 🛑 Details: research.nvidia.com/labs/dir… #DDRL #Diffusion #RL #NVIDIA #Cosmos
Super impressed that, when @ElanaPearl wasn't happy with the loss curve, she realized she needed a PyTorch PR to fix it. A great read.
New blog post: The bug that taught me more about PyTorch than years of using it started with a simple training loss plateau... ended up digging through optimizer states, memory layouts, kernel dispatch, and finally understanding how PyTorch works!
Congrats to @ElanaPearl for her awesome InterPLM paper, now in Nature Methods. A great way to explore the inner workings of protein language models, with a very well organized and easy-to-use codebase!
Published! 🎉 Paper now has more feature analysis and higher quality figures - thanks to great reviewer feedback! Code also got a major upgrade - v1.0.0 is way more modular so you can easily swap in different protein embeddings or SAE architectures: github.com/ElanaPearl/InterP…
Jake Silberg retweeted
📢 Excited that #unitox is selected as a #NeurIPS2024 spotlight!💡 We created an #LLM agent to analyze >100K pages of FDA docs from all approved drugs ➡️ new database annotating 8 toxicity types for 2400 drugs. Validated by clinicians. openreview.net/pdf?id=Vb1vVr… Data: zou-group.github.io/UniTox-w… Great job led by @JakeSilberg @KyleWSwanson @ElanaPearl! Thanks to Angela Zhang and @xaniarg for clinical expertise, and to our wonderful @genmab collaborators 👏
Jake Silberg retweeted
13 Oct 2024
Congratulations to our best submission award winners!! 🏆 “Can Large Language Models Explain Their Internal Mechanisms?” by @nadamused_, @ghandeharioun, @RyanMullins, @emilyrreif, Jimbo Wilson, @Nithum, and @iislucas 🏆 “The Illustrated AlphaFold” by @ElanaPearl and @JakeSilberg
Jake Silberg retweeted
13 Oct 2024
First up, watch @ElanaPearl and @JakeSilberg present “The Illustrated AlphaFold” 🧬 elanapearl.github.io/blog/20…
The Illustrated AlphaFold bit.ly/the-illustrated-af3 Do you want to know how AlphaFold3 works? It has one of the most intimidating transformer-based architectures, so to make it approachable, we made a visual walkthrough inspired by @JayAlammar's Illustrated Transformer! 🧵 (1/7)
Jake Silberg retweeted
Share your best resources to learn about AlphaFold in the comments! This is one of the best blog posts to learn about AlphaFold that I've seen (by @ElanaPearl & @JakeSilberg): elanapearl.github.io/blog/20…
Jake Silberg retweeted
Solidarity with Ukraine ✊ (Russian Embassy, London)
Jake Silberg retweeted
Next time you dip your asparagus in salsa, remember the hands that harvested those tomatoes. ❤️ #WeFeedYou

Jake Silberg retweeted
This is a day I’ve dreamed of my whole life. This is the reason @DeepMind was founded: to build AI and use it to accomplish extraordinary scientific breakthroughs like #AlphaFold 2, to advance science and benefit humanity. I could not be more proud of the incredible team!
Today with @emblebi, we're launching the #AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome, doubling humanity’s accumulated knowledge of high-accuracy human protein structures - for free: dpmd.ai/alphafolddb 1/