Joined November 2020
655 Photos and videos
1/12 🧵 Do you want to learn how to design proteins using AI but don’t know anything about bio? I created a free 10-lesson course on YouTube. It’s now available in Spanish (original) and English (autodubbing w/Kokoro 82M). Here’s an overview of the topics covered in each lecture :)
2
10
37
2,524
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
1/2 🧵 | 2 MUST-read papers if you want to use generative AI with proteins. tl;dr: diffusion models create plausible but less diverse proteins, PLMs do the opposite biorxiv.org/content/10.1101/… Among diffusion models, RFDiffusion & Chroma exhibit the most balanced performance arxiv.org/abs/2504.16479
1
12
90
8,136
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
"You are sheltering a Mythos-level model in your server room, are you not?"
69
993
15,957
458,411
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Scientists are actively researching ways to make phage therapies more effective. This includes efforts to engineer phages to expand their host range and aid in bacterial defense system evasion. More at Molecule of the Month: pdb101.rcsb.org/motm/318
3
7
94
7,852
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
last one
157
2,635
46,271
940,939
I agree. Just because something can be tokenized does not make it a language. However, proteins and languages do share certain underlying information-processing mechanisms. Check out my lecture on AlphaFold for more details about protein language models :) youtu.be/4K8SDxk85a0?si=omv6…
I want to SCREAM every time I read an LM paper that says how human language is similar to protein or DNA
3
23
2,262
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Replying to @karpathy
This is not a day for celebrating, Andrej. It's a very dark and very sad day, and the damage may be impossible to undo.
107
242
4,363
377,151
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Does your designed active site already exist in nature? Is an uncharacterized protein hiding a catalytic site or a pocket? Folddisco can answer both, searching millions of structures for a 3D motif in seconds. @NatureBiotech🧬 📄 nature.com/articles/s41587-0… 🧵1/7👇
1
11
58
9,428
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
I'm happy to share the first preprint of our lab! 🎉 Introducing ArchaeaHQ biorxiv.org/content/10.64898… We curated 21,644 genomes across all 4 archaeal kingdoms to bridge the gap in public datasets for computational biology What is inside ArchaeaHQ... (1/2)
2
9
24
2,270
TL;DR: because of evolution. Trp came late to the party and yet managed to become integrated into the code. Maintaining aromatic aas is a complex challenge. For more details on aa/protein evolution, check out this lecture :) youtu.be/rkmWSR8BUms?si=5V0S…
Why does tryptophan have only one codon, UGG?
5
49
421
31,705
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
MD simulations of proteins have been around for almost 50 years (1977). What are you talking about?
For the past 70 years, modern biology has been built on a static objects worldview: DNA = information, proteins = structures, cells = nanomachines, disease = broken parts, drugs = part repair/update. I'm honestly excited for a “new worldview” and “Dynamic Biology” to emerge.
14
23
246
40,640
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Try this new online training game from Leandro F. Estrozi designed for newcomers in structural biology, with a focus on cryo-EM and cryo-ET map interpretation: rico.ibs.fr/helixplorer/reso…
5
81
375
51,065
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Screen 1M random protein sequences to discover that biology-like folds are accessible from random sequences with surprising frequency @KlaraH_lab
3
57
319
29,252
lesssgoooooooo 🤪
In January, I reviewed a Review for a crappy MDPI journal. It was 100% AI, missing citations, etc. I sent a massive report, and the authors withdrew it. Today, I see it published without a single change in another even crappier MDPI journal.
3
222
At my university, the largest in Mexico, there is a very frustrating records office. To graduate from my master's, they asked me for my middle school docs! They told me they didn't have them, but in the end it was a mistake on their part that cost me 2 months of bureaucracy 🤬
A striking graph of MIT's admin bloat in a @PNASNews paper by @VickyCYang (@MITSloan).
3
252
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
By the way, we're hiring at Biohub. Come hang out with us if you want to work on frontier AI or biology. We have thousands of GPUs, petabytes of data (biology is increasingly an engineering problem!) and billions of cells to image!
11
34
374
21,863
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Binder design has come of age thanks to generative models, but how can we access the wider array of dynamic, multistate protein functions, so elegantly employed by nature? @mihirbafna14 and I are excited to share SwitchCraft, a framework for designing such functions. (1/7)
17
143
619
84,766
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
.@proteinrosh out here recruiting research partners
We've done a million of these deep dives into interpreting and understanding ESMC features, it's just that we don't quite know how to write about them other than to say "here are a bunch of cool observations".
1
24
1,461
I think this is a good summary of the ESMFold2 paper. My bet is that the next version will use the 100B protein sequences from LOGAN DB. However, doing so would require a highly generalizable and scalable architecture.
Can someone give me the tldr on ESMFold vs ESMFold2? They also did an atlas release with ESMFold years ago. And isn't ESM2 like 5 years old? What happened to ESM3, which was open weights and 2 years old?
1
1
33
2,476
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
I'm so excited to show the world what we've been working on for the past months!! I'm going to highlight some of the fun results from this paper that I find particularly exciting.
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology. The new model delivers state-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics. We have designed and validated miniprotein binders and single-chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity. We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures. ESMFold2 is built on a state-of-the-art language model that has been trained on billions of protein sequences. A world model of protein biology emerges through language modeling. We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins. The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science. This understanding emerges without prior knowledge, just from language modeling of protein sequences. Language models are becoming a powerful substrate to understand and program biology. The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient-based search with the model was able to discover high-affinity protein binders. I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
15
31
216
72,237
GAMA Miguel Angel 🐦‍⬛🔑 retweeted
Sandpiper 2 is up. 913,000 metagenomic community profiles w GTDB R232 via SingleM, 200k more than 1.0. sandpiper.qut.edu.au GlobDB coming. Thanks to @aroney_samuel @IAmTotesBrett Josh Mitchell and especially the new kid @StefanHerh
26
52
3,710