AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks
1. AbBFN2 is a generative foundation model for antibodies built on the Bayesian Flow Network (BFN) paradigm, allowing conditional generation across 45 sequence, genetic, and biophysical data modes without task-specific training.
2. Unlike typical models trained for one specific task, AbBFN2 consolidates multiple design tasks—such as sequence inpainting, humanisation, biophysical property optimisation, and de novo library generation—into a unified, flexible framework.
3. The model is trained on over 2M paired human, mouse, and rat antibody sequences from the OAS database, annotated with species, germline gene identity, CDR lengths, and TAP-derived developability metrics.
4. AbBFN2 learns the joint distribution of these diverse data modes, enabling concurrent generation of antibody sequences and associated labels that faithfully reflect natural antibodies across sequence, structural, and genetic properties.
5. Generated sequences from AbBFN2 match held-out data in CDR loop lengths, amino acid frequencies, and structural conformations, as validated by dynamic time warping and t-SNE embedding of loop structures.
6. Germline gene usage patterns learned by the model match natural V(D)J recombination biases, indicating its understanding of antibody genetics even though it operates on amino acid sequences.
7. AbBFN2 achieves state-of-the-art performance in sequence annotation, outperforming tools like ANARCI and IgBLASTp in predicting germline gene identities and biophysical properties from sequences.
8. In sequence inpainting tasks, AbBFN2 accurately recovers framework and CDR residues, with Rosetta-predicted VH-VL interface stabilities indistinguishable from real antibodies, suggesting functional plausibility.
9. For sequence humanisation, AbBFN2 performs iterative masked sampling guided by species logits, achieving >95% human confidence across 25 precursor antibodies while preserving structural similarity (mean CDR RMSD ~0.7 Å).
10. The model's species classification logits correlate with clinical anti-drug antibody (ADA) response rates (R = –0.52), matching results from p-IgGen and supporting its use in immunogenicity risk assessment.
11. In a multi-objective optimisation task, AbBFN2 successfully humanised 91 non-human antibodies while removing TAP liabilities, generating diverse candidates with developable biophysical profiles in a single-step workflow.
12. AbBFN2 can generate rare antibody types conditionally—e.g., 1715 VRC01-like antibodies with rare germline, CDR loop length, and developability constraints—demonstrating its ability to explore highly specific regions of sequence space.
13. By merging multiple antibody design objectives into a single generative pass, AbBFN2 minimises the inefficiencies of sequential pipelines and provides a practical, tunable platform for next-generation therapeutic antibody design.
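To make point 9 concrete, here is a rough toy sketch of what "iterative masked sampling guided by a species score" could look like. It is not AbBFN2's code or API. `toy_human_score`, `toy_sample_masked`, and `humanise` are hypothetical stand-ins, the "human-like" reference sequence is made up for illustration, and the greedy accept-if-not-worse rule is an assumption rather than the paper's procedure.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_human_score(seq, reference):
    # Hypothetical stand-in for a species classifier:
    # fraction of positions that match a "human-like" reference.
    return sum(a == b for a, b in zip(seq, reference)) / len(reference)


def toy_sample_masked(seq, positions, reference, rng):
    # Hypothetical stand-in for conditional generation: resample only the
    # masked positions, biased toward the reference residue.
    out = list(seq)
    for i in positions:
        out[i] = reference[i] if rng.random() < 0.7 else rng.choice(AMINO_ACIDS)
    return "".join(out)


def humanise(seq, reference, frozen, threshold=0.95, n_mask=3,
             max_rounds=200, seed=0):
    # Repeatedly mask a few editable positions, resample them, and keep the
    # candidate only if the score does not drop. Frozen positions (standing
    # in for CDR residues) are never masked, so they are preserved.
    rng = random.Random(seed)
    editable = [i for i in range(len(seq)) if i not in frozen]
    current = seq
    for _ in range(max_rounds):
        if toy_human_score(current, reference) >= threshold:
            break
        positions = rng.sample(editable, min(n_mask, len(editable)))
        candidate = toy_sample_masked(current, positions, reference, rng)
        if toy_human_score(candidate, reference) >= toy_human_score(current, reference):
            current = candidate
    return current


if __name__ == "__main__":
    mouse = "QVKLQESGAELVRPGASVKL"   # made-up precursor fragment
    human = "QVQLVQSGAEVKKPGASVKV"   # made-up human-like reference
    result = humanise(mouse, human, frozen={8, 9, 10})
    print(result, toy_human_score(result, human))
```

In the real model, the score would come from the species logits and the resampling from conditional generation. The toy version only shows the loop structure: mask, resample, re-score, stop once the confidence threshold is met.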
💻Code:
github.com/instadeepai/AbBFN…
📜Paper:
biorxiv.org/content/10.1101/…
#AntibodyDesign #ProteinEngineering #MachineLearning #FoundationModels #Bioinformatics #Therapeutics #AntibodyEngineering #BayesianFlowNetwork #GenerativeAI #Biotech