Predicting protein-protein interactions in the human proteome
Predicting which human proteins interact, and how they bind, is a longstanding bottleneck. Proteins rarely act alone; they assemble into complexes that drive immunity, metabolism, signaling, and disease. But testing hundreds of millions of possible pairs experimentally is slow, expensive, and blind to many weak or transient interactions.
Jing Zhang, Qian Cong, David Baker and coauthors tackle this with a two-part pipeline. First, they amplify evolutionary "clues" by assembling omicMSAs, deep multiple sequence alignments mined from petabytes of raw eukaryotic genomic data, so that coevolution between interacting proteins stands out across species. Second, they train a fast interaction model, RoseTTAFold2-PPI, not just on scarce complex structures but also on domain–domain contacts distilled from ~200M AlphaFold monomers: a huge synthetic training set that teaches the network what real interfaces look like.
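To make the coevolution idea concrete, here is a minimal sketch that scores column pairs across two species-paired alignments with mutual information. This is a classical stand-in chosen for illustration, not what RoseTTAFold2-PPI does (that is a learned model working on far deeper alignments), and the toy sequences and helper names are invented for the example.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns.

    Row i of col_a and row i of col_b must come from the same species.
    """
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def column(msa, j):
    """Extract column j from a list of equal-length aligned sequences."""
    return [seq[j] for seq in msa]


# Toy paired alignment (hypothetical data): row i of each block is one species.
protein_a = ["AKL", "AKL", "DKL", "DKV", "AKL", "DKV"]
protein_b = ["RG", "RG", "EG", "EG", "RG", "EG"]

# Score every (column in A, column in B) pair.
scores = {
    (i, j): mutual_information(column(protein_a, i), column(protein_b, j))
    for i in range(len(protein_a[0]))
    for j in range(len(protein_b[0]))
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this toy case the first column of each protein changes in lockstep (A with R, D with E), so that pair gets the highest score, while fully conserved columns score zero. With only a handful of sequences such signals are noisy, which is why deeper alignments matter.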
The payoff is big: a proteome-scale screen over ~200M human pairs yields ~18,000 PPIs at ~90% precision (and ~29,000 at ~80%), including ~3,600 not previously reported. The method excels on transmembrane interactions, a class that's notoriously hard in the lab, and produces 3D complex models, so you don't just get a yes/no, you see the interface. Mapping human variants onto these models flags ~4,950 PPIs with disease mutations at the contact surface, offering concrete hypotheses for mechanism.
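The trade-off between the 90% and 80% precision sets can be sketched as picking a score cutoff on a labeled benchmark. The snippet below is an illustrative assumption about how such a cutoff might be chosen, not the paper's calibration procedure; the function name and the benchmark scores are made up, and scores are assumed to be distinct.

```python
def cutoff_for_precision(scored, target):
    """Find the most permissive score cutoff that still meets a target precision.

    scored: list of (score, is_true_interaction) for benchmark pairs.
    Returns (cutoff_score, number_of_pairs_called), or None if the target
    precision is never reached. Assumes scores are distinct.
    """
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    best = None
    true_pos = 0
    for rank, (score, is_true) in enumerate(ranked, start=1):
        true_pos += is_true
        # Precision of the top `rank` predictions.
        if true_pos / rank >= target:
            best = (score, rank)
    return best


# Hypothetical benchmark: (model score, known interaction?)
benchmark = [
    (0.99, True), (0.97, True), (0.95, True), (0.90, False), (0.88, True),
    (0.80, True), (0.75, False), (0.60, True), (0.55, False), (0.40, False),
]

strict = cutoff_for_precision(benchmark, 0.9)
loose = cutoff_for_precision(benchmark, 0.8)
print(strict, loose)
```

On this toy benchmark the stricter target keeps 3 pairs and the looser one keeps 6: the same pattern as the post's numbers, where relaxing precision from ~90% to ~80% grows the call set.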
Beyond pairs, the team reconstructs higher-order assemblies and nominates new components for well-studied complexes (e.g., telomere maintenance, GPI-GnT, cilia/flagella machinery), and highlights GPCR partners and mitochondrial modules that have been hiding in plain sight.
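One simple way to picture going from pairs to assemblies is to treat predicted PPIs as edges in a graph and group proteins that are linked together. The sketch below uses connected components via union-find; this is a deliberately crude assumption for illustration (the post does not say how the team builds assemblies), and the protein labels are placeholders.

```python
from collections import defaultdict


def group_into_modules(pairs):
    """Group proteins into connected components of the pairwise PPI graph."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    # Largest modules first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Placeholder predicted pairs (not real interactions).
predicted_pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P3", "P6")]
modules = group_into_modules(predicted_pairs)
print(modules)
```

Connected components will happily merge distinct complexes that share a single bridging protein, so a real analysis would need something stricter, such as requiring dense mutual connectivity or compatible 3D interfaces.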
Stepping back: this is a credible path toward a computed 3D human interactome—faster, cheaper, and increasingly comprehensive as more genomes and structures arrive. It doesn’t replace experiments; it prioritizes them, focusing bench time where the biology is richest.
Paper:
science.org/doi/full/10.1126…