A new resource for protein design: the Protein Design Archive
Protein designers have long aimed to explore the vast possibilities beyond nature’s existing structures, seeking to learn how amino acid sequences fold and function in ways that evolution never tested. Chronowska et al. unveil a new database and website known as the Protein Design Archive (PDA) that aggregates and curates experimentally validated protein designs from the past four decades. This resource highlights the astonishing progression from small, manually tweaked constructs to complex, computationally generated folds, providing a window into how de novo design can help address scientific and societal challenges.
The authors built the PDA by systematically scanning the Protein Data Bank (PDB) for synthetic sequences (taxonomy identifier 32630) and curating the results to remove entries that were merely natural proteins with minor mutations. The database holds over 1,500 structures, each annotated with sequence-based and structure-based similarity metrics that show how the designs differ both from one another and from known natural proteins. A user-friendly interface enables filtering by release date, search terms, and novelty scores, while monthly updates keep the data current. Throughout, the authors have balanced inclusivity (capturing a broad range of designs, including historic but unpublished examples) with rigorous hand-checking to maintain quality.
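To make the first step concrete, here is a minimal sketch of how a search for entries with taxonomy identifier 32630 (the NCBI ID for "synthetic construct") might be expressed as a query payload. This is not the authors' pipeline: the endpoint URL, the attribute name `rcsb_entity_source_organism.taxonomy_lineage.id`, and the helper `build_synthetic_query` are assumptions for illustration and should be checked against the current RCSB PDB Search API documentation. The sketch only builds the payload and sends nothing over the network.

```python
import json

SYNTHETIC_TAXONOMY_ID = 32630  # NCBI taxonomy ID for "synthetic construct"

# Assumed endpoint and attribute name for the RCSB PDB Search API;
# verify both against the current RCSB documentation before relying on them.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"
TAXONOMY_ATTRIBUTE = "rcsb_entity_source_organism.taxonomy_lineage.id"


def build_synthetic_query(taxonomy_id: int = SYNTHETIC_TAXONOMY_ID,
                          rows: int = 100) -> dict:
    """Build a search payload for PDB entries whose source organism
    matches the given taxonomy ID (hypothetical helper)."""
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": TAXONOMY_ATTRIBUTE,
                "operator": "exact_match",
                "value": str(taxonomy_id),
            },
        },
        "return_type": "entry",
        # Ask for the first page of results only.
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


if __name__ == "__main__":
    # Print the payload that would be POSTed to SEARCH_URL as JSON.
    print(json.dumps(build_synthetic_query(), indent=2))
```

A query like this would only produce a candidate list. The curation step described above, removing natural proteins that carry minor mutations, relies on hand-checking and is not captured by the sketch.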
The authors find that protein design efforts have accelerated dramatically, with a discernible shift around 2009–2010, when accessible computational tools such as Rosetta expanded de novo design’s reach. More recently, the advent of deep learning approaches has nearly tripled the annual output of new structures. The PDA not only showcases how designed proteins increasingly mirror the mass, complexity, and packing of their natural counterparts but also helps reveal persistent biases in secondary structure usage. By hosting a comprehensive, regularly updated collection, the archive empowers protein engineers to compare methods, pinpoint gaps, and pursue ambitious new directions that deepen our mastery of protein structure and function.
Paper:
nature.com/articles/s41587-0…
Website:
pragmaticproteindesign.bio.e…