math/neuroscience - AI

Joined December 2014
133 Photos and videos
Pinned Tweet
Incredibly excited to share what I’ve been researching since joining @AnthropicAI. We found emotion concepts in Claude and studied their function!
New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
90
44
831
60,370
Nicholas Sofroniew retweeted
I'm so excited to show the world what we've been working on for the past months!! I'm going to highlight some of the fun results from this paper that I find particularly exciting.
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology. The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics. We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity. We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures. ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences. A world model of protein biology emerges through language modeling. We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins. The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science. This understanding emerges without prior knowledge, just from language modeling of protein sequences. Language models are becoming a powerful substrate to understand and program biology. The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient based search with the model was able to discover high-affinity protein binders. I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
15
31
216
72,234
Very excited to have played a part in this while I was still at ES. I recommend reading the paper; there is some really cool stuff in it, including interpretability on ESMC. I think the next wave of biological discovery will come through understanding the internals of language models!
1
30
1,908
Two things. First, an update: I've been working at Anthropic on Interpretability. For me it's a wonderful combination of maths, neuroscience, and AI, and I love it. Second, I want to express my support for the company and its leadership for acting with integrity and principles.
A statement on the comments from Secretary of War Pete Hegseth. anthropic.com/news/statement…
3
9
323
9,253
Nicholas Sofroniew retweeted
Researchers have developed a deep learning protein language model, ESM3, that enables programmable protein design. Learn more in this week's issue of Science: scim.ag/4b5IlQu
32
158
543
239,582
So excited to be part of this work and programming biology! 🤖🧬🧫
16 Jan 2025
We're thrilled to present ESM3 in @ScienceMagazine. ESM3 is a generative language model that reasons over the three fundamental properties of proteins: sequence, structure, and function. Today we're making ESM3 available free to researchers worldwide via the public beta of an API for biological intelligence. Trained with over a trillion teraflops of compute, this is the first time a model of this scale has been trained for biology, pushing the frontier of AI for biological discovery and engineering. ESM3 learns to represent the immense complexity of protein biology, learning from billions of natural proteins. From this training it developed the capability to design proteins, responding to complex prompts combining atomic level details and high level instructions to generate new proteins. ESM3 can explore protein space far beyond natural evolution. We prompted ESM3 to generate a fluorescent protein at a far distance from any known fluorescent proteins, searching an unknown region of protein space, to discover a new fluorescent protein. We estimate this is equivalent to simulating five hundred million years of evolution.
1
32
2,313
Nicholas Sofroniew retweeted
16 Jan 2025
We're thrilled to present ESM3 in @ScienceMagazine. ESM3 is a generative language model that reasons over the three fundamental properties of proteins: sequence, structure, and function. Today we're making ESM3 available free to researchers worldwide via the public beta of an API for biological intelligence. Trained with over a trillion teraflops of compute, this is the first time a model of this scale has been trained for biology, pushing the frontier of AI for biological discovery and engineering. ESM3 learns to represent the immense complexity of protein biology, learning from billions of natural proteins. From this training it developed the capability to design proteins, responding to complex prompts combining atomic level details and high level instructions to generate new proteins. ESM3 can explore protein space far beyond natural evolution. We prompted ESM3 to generate a fluorescent protein at a far distance from any known fluorescent proteins, searching an unknown region of protein space, to discover a new fluorescent protein. We estimate this is equivalent to simulating five hundred million years of evolution.
24
241
852
227,158
Nicholas Sofroniew retweeted
i love this plot
4 Dec 2024
Replying to @alexrives
ESM C models establish a frontier of performance as a function of parameter scale. We see large improvements across all parameter scales over previous state of the art models. Read more: evolutionaryscale.ai/blog/es…
1
1
31
2,647
Super excited about ESMC and the quality of the protein representations! Can't wait to see what people build on top of it
4 Dec 2024
Introducing ESM Cambrian. Unsupervised learning can invert biology at scale to reveal the hidden structure of the natural world. We’ve scaled up compute and data to train a new generation of protein language models. ESM C defines a new state of the art for protein representation learning.
1
19
1,472
Nicholas Sofroniew retweeted
New paper with Bowen :) "Generative Modeling of Molecular Dynamics Trajectories" arxiv.org/abs/2409.17808v1 A "video diffusion" model but for MD trajectories. Different conditioning solves different tasks. E.g. condition on first and last frame => transition path sampling 1/4
11
52
270
27,788
Nicholas Sofroniew retweeted
What does gLM2 learn in non-protein-coding sequences?🧬 Using the Categorical Jacobian and the latest gLM2, we detect incredible regulatory signals in the non-protein coding regions -- all without any supervision!🪄 a quick 🧵
3
22
190
22,145
Nicholas Sofroniew retweeted
Exciting new work from Qian Cong's group on predicting the human protein interactome. Leveraging new eukaryotic genomes, a new RoseTTAFold2 trained on +/- pairs of PPI and a large distilled dataset of domain-domain interactions! 🤩 biorxiv.org/content/10.1101/…
8
76
361
27,086
Nicholas Sofroniew retweeted
Have you ever wanted to design protein binders with ease? Today we present 𝑩𝒊𝒏𝒅𝑪𝒓𝒂𝒇𝒕, a user-friendly and open-source pipeline that allows anyone to create protein binders de novo with high experimental success rates. @befcorreia @sokrypton biorxiv.org/content/10.1101/…
27
423
1,769
300,717
Nicholas Sofroniew retweeted
Great AI meets Bio event in Boston with @mkoeris @geochurch and @alexrives 🧬🧬🧬🔥🔥🔥
20 Sep 2024
Great fireside chat with @geochurch and @alexrives moderated by @johncumbers here at AI🧬BTO East at @MIT! 1) 3D printing with trillions of printers and billions of inks - what is that? That’s 1 mm^3 of cells assembling products!!! That’s the power of biology - thank you @geochurch for the metaphor
1
3
27
3,037
Nicholas Sofroniew retweeted
12 Sep 2024
OpenAI o1 solves a complex logic puzzle.
32
149
1,591
254,293
Nicholas Sofroniew retweeted
We had a lot to announce, but want to highlight we're building a PaperQA2-version of Wikipedia covering the human proteome. The 240 articles that were graded by experts as better than existing Wikipedia are already viewable - we're generating the rest over the next few weeks!
5
29
205
17,077
Nicholas Sofroniew retweeted
Introducing PaperQA2, the first AI agent that conducts entire scientific literature reviews on its own. PaperQA2 is also the first agent to beat PhD and Postdoc-level biology researchers on multiple literature research tasks, as measured both by accuracy on objective benchmarks and assessments by human experts. We are publishing a paper and open-sourcing the code. This is the first example of AI agents exceeding human performance on a major portion of scientific research, and will be a game-changer for the way humans interact with the scientific literature. Paper and code are below, and congratulations in particular to @m_skarlinski, @SamCox822, @jonmlaurent, James Braza, @MichaelaThinks, @mjhammerling, @493Raghava, @andrewwhite01, and others who pulled this off. 1/
78
737
3,311
555,527
Nicholas Sofroniew retweeted
We used PaperQA2 to extract claims from papers and then see if they're contradicted anywhere in literature. This task is time consuming for humans, but we were able to use this for hundreds of papers to look for trends in disagreement in fields, decades, and journals.
10
55
444
47,130
Nicholas Sofroniew retweeted
Published today in @Nature, we describe an approach for single-molecule protein reading on @nanopore arrays. By utilizing ClpX unfoldase to ratchet proteins through a CsgG nanopore, we achieved single-amino-acid sensitivity. nature.com/articles/s41586-0…
19
283
989
87,207
Nicholas Sofroniew retweeted
Just putting some papers on the timeline:) Looking forward to seeing if we are now in the fourth phase (with deep-learning adding even more designs per year). Also nice to put some numbers onto things people have long suspected (like overrepresentation of alpha helices).
We are pleased to announce the release of The Protein Design Archive. Our comprehensive database of designed proteins highlights the triumphs and challenges of the field: pragmaticproteindesign.bio.e… Preprint: biorxiv.org/content/10.1101/… @mjstam @MartaChronowska @WoolfsonLab #proteindesign
8
43
5,772