PepMLM is out in
@NatureBiotech! Want a short binder for any target? Just input its sequence and desired binder length, and PepMLM unmasks it. Simple? No way? The results speak for themselves.
Paper: nature.com/articles/s41587-0…
Code: github.com/programmablebio/p…
Model: huggingface.co/ChatterjeeLab…
A few years ago, I told my undergrad
@SophieVincoff (now my PhD student) to try the obvious: given the target sequence, use autoregressive next-token prediction to generate the cognate peptide binder. After tons of work, we still couldn't get it to work very well. Then, my Master's student (now Harvard PhD student)
@LeoTZ03 said: what if we just attach the peptide sequence to the end of its target sequence and fully fine-tune an encoder (in this case, ESM-2-650M) to unmask it? Basically, a BERT-like take on conditional (short) sequence generation.
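To make the idea concrete, here's a minimal sketch of the input construction and a one-pass greedy unmasking step. It assumes ESM-2's `<mask>` token; `build_masked_input`, `greedy_unmask`, and `toy_predict` are hypothetical names for illustration, and `toy_predict` is a stand-in scorer, not the fine-tuned model. The real model's decoding strategy may differ.

```python
from typing import Callable, Dict, List

MASK = "<mask>"  # ESM-2's mask token

def build_masked_input(target: str, binder_len: int) -> List[str]:
    """Return the target residues followed by binder_len mask tokens."""
    if binder_len <= 0:
        raise ValueError("binder_len must be positive")
    return list(target) + [MASK] * binder_len

def greedy_unmask(
    tokens: List[str],
    predict: Callable[[List[str], int], Dict[str, float]],
) -> str:
    """Fill every masked position with its top-scoring residue.

    `predict` is a hypothetical stand-in for the encoder: given the
    masked token list and a position, it returns residue -> score.
    All positions are scored from the same masked input (one pass).
    """
    binder = []
    for i, tok in enumerate(tokens):
        if tok == MASK:
            scores = predict(tokens, i)
            binder.append(max(scores, key=scores.get))
    return "".join(binder)

def toy_predict(tokens: List[str], pos: int) -> Dict[str, float]:
    """Toy scorer (NOT a real model): alternates between K and A."""
    if pos % 2 == 0:
        return {"A": 0.1, "K": 0.9}
    return {"A": 0.9, "K": 0.1}

masked = build_masked_input("MKT", 4)
peptide = greedy_unmask(masked, toy_predict)
```

In practice you would swap `toy_predict` for the model's per-position logits over the amino acid vocabulary.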
I said to
@LeoTZ03: sounds kind of silly.
Well, he did what any good student would do: he ignored me, trained the model on ~10k putative peptide-protein sequence pairs, and the in silico results were pretty amazing! PepMLM consistently outperformed RFDiffusion on held-out, structured targets, with a higher hit rate (38% vs. 29%) and low perplexities that closely matched those of real binders, and the generated sequences showed target specificity even in stringent permutation tests!
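For anyone unfamiliar with the metric: perplexity is the exponentiated average negative log-likelihood over the peptide positions, so lower means the model finds the sequence more plausible. A minimal sketch, assuming you already have per-residue log-probabilities from the model (the `perplexity` helper is hypothetical, and the paper's exact scoring protocol may differ):

```python
import math
from typing import Sequence

def perplexity(log_probs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability per residue."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))

# A model that is uniform over the 20 amino acids gives perplexity ~20.
uniform = perplexity([math.log(1 / 20)] * 8)
```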
But the proof was in the experiments, and we did a LOT of them! In our lab, Zach showed that PepMLM peptides achieved nM binding affinity on disease-related receptor targets, like NCAM1 and AMHR2 (RFDiffusion peptides failed to bind).
With the Truant Lab
@McMasterU, we showed that PepMLM peptides, when fused to E3 ubiquitin ligases (our uAb architecture), not only degraded MSH3 but completely eliminated huntingtin protein in HD cells! We showed similar results by identifying a PepMLM uAb that degraded MESH1, a protein controlling ferroptosis, in collaboration with
@TsanJen's group
@DukeU. Again, no hits for RFDiffusion there.
And with
@AguilarVirology (led by
@MadeleineDumas2), in collaboration with
@DeLisaGroup @Cornell, PepMLM-derived peptides (in uAbs) bound and reduced levels of viral phosphoproteins from Nipah, Hendra, and HMPV, and in live HMPV infection models, they almost completely cleared the P protein!
I guess sometimes the "silly" ideas win.
Even though my lab now leans theoretical (discrete diffusion/flow matching/Schrödinger bridges) to design more specialized/optimized/specific molecules
@Penn, PepMLM is still a go-to model for my experimentalists. Remember
@lauren_hong11's duAbs in
@NatureComms that stabilized THE tumor suppressor p53? Yep, PepMLM.
In fact, PepMLM has averaged ~600 downloads a month on
@huggingface over the last year! It's so easy to use (takes seconds to run on Colab, with just a few lines of transformers code): you don't need to input (or go through) structure, you can enter almost any target sequence (disordered/stable, short/long), and you get a binder. And again, don't just take my word for it: look at the diversity of proteins we've gone after! If you're an experimentalist, it's super worth it to try PepMLM. The code and model are now fully open-sourced for both academia and industry!
Btw, this was an incredibly collaborative undertaking: 5 labs, across multiple amazing universities, led by co-first authors
@LeoTZ03 (who led all model design, training, and validation), and Zach (
@DukeU), Madeleine (
@Cornell), Christina (
@McMasterU), who led each of the experimental efforts!! I'm so proud of, and grateful for, everybody's belief that yes, sequence can be all you need for binder design!