Researchers @UHN, @arcinstitute & @VectorInst develop #BioReasonPro — a multimodal reasoning LLM that predicts protein function by integrating sequence, structure, domain, and interaction data, closely mirroring how expert biologists reason. Quick Read: cbirt.net/bioreason-pro-allo…
BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning @arcinstitute

1. BioReason-Pro introduces the first multimodal reasoning large language model specifically designed for protein function prediction, combining protein embeddings with biological context to generate interpretable reasoning traces rather than just classification labels.

2. The system integrates ESM3 protein embeddings, a GO graph encoder, and biological context including organism, domains, protein-protein interactions, and GO-GPT predictions to perform step-by-step biological reasoning from sequence to function.

3. GO-GPT, a key component, is the first autoregressive transformer for Gene Ontology prediction that captures hierarchical and cross-aspect dependencies between GO terms, achieving state-of-the-art Fwmax of 0.65-0.70 across inference strategies.

4. The model was trained on over 130,000 synthetic reasoning traces generated by GPT-5 and further optimized through reinforcement learning with Group Sequence Policy Optimization, achieving 73.6% Fmax on GO term prediction.

5. Human protein experts preferred BioReason-Pro annotations over ground truth UniProt annotations in 79% of evaluated cases, with an LLM judge score of 8/10 for functional summaries, substantially outperforming previous methods.

6. Remarkably, BioReason-Pro de novo predicted experimentally confirmed binding partners with per-residue attention localizing to exact contact residues resolved in cryo-EM structures, demonstrating genuine structural reasoning capabilities.

7. The model successfully performed structural reasoning that overrode misleading superfamily-level domain annotations, such as correctly identifying CFAP61 as a non-enzymatic scaffold despite its Rossmann-like fold that typically indicates catalytic activity.

8. For eEFSec, BioReason-Pro identified SECIS-binding protein 2 as the obligate functional partner from sequence alone, with attention concentrated on the RIFT domain surface that matches the experimentally resolved SECIS RNA binding interface in PDB 7ZJW.

9. The system maintains strong performance even for proteins with very low sequence similarity to training data, with performance degrading much more slowly than BLAST as sequence identity decreases, indicating learned generalizable reasoning rather than simple homology transfer.

10. All model weights, code, and curated datasets are released publicly, alongside precomputed predictions for over 240,000 proteins including the Human Protein Atlas, enabling broad adoption for functional annotation of uncharacterized proteins.

💻Code: bioreason.net/code
📜Paper: biorxiv.org/content/10.64898…

#BioReasonPro #ProteinFunction #ComputationalBiology #Bioinformatics #MachineLearning #LLM #GeneOntology #ProteinStructure #FunctionalAnnotation #AIforScience
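
For readers unfamiliar with the Fmax figure quoted in point 4, here is a minimal sketch of the protein-centric Fmax commonly used in GO prediction benchmarks: sweep a score threshold, average precision over proteins with at least one prediction and recall over all proteins, and keep the best F1. The function name `fmax`, the input layout, and the threshold grid are assumptions for illustration; the post does not describe the paper's exact evaluation protocol, and this sketch ignores the term weighting that a weighted variant would use.

```python
from typing import Dict, Set


def fmax(
    truth: Dict[str, Set[str]],
    scores: Dict[str, Dict[str, float]],
    steps: int = 100,
) -> float:
    """Protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.0.

    truth:  protein id -> set of true GO terms (assumed already propagated
            up the ontology; hypothetical input layout).
    scores: protein id -> {GO term: predicted confidence in [0, 1]}.
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []      # only proteins with >= 1 prediction at threshold t
        recall_sum = 0.0     # accumulated over all proteins in `truth`
        for protein, true_terms in truth.items():
            predicted = {
                term for term, s in scores.get(protein, {}).items() if s >= t
            }
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        p = sum(precisions) / len(precisions)
        r = recall_sum / len(truth)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy example with made-up term ids and scores.
toy_truth = {"P1": {"A", "B"}}
toy_scores = {"P1": {"A": 0.9, "C": 0.8}}
print(round(fmax(toy_truth, toy_scores), 3))
```

In the toy example the best threshold keeps only term `A`, giving precision 1.0 and recall 0.5, so the reported value is 2/3.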