A Fully-Open Structure-Guided RNA Foundation Model for Robust Structural and Functional Inference
1. structRFM is a new RNA foundation model pre-trained on millions of RNA sequences and their secondary structures. It builds base-pairing interactions into masked language modeling through a novel pair-matching operation, and it reports new state-of-the-art results in RNA structure prediction and functional inference.
2. Among fifteen biological language models, structRFM achieves the best performance in zero-shot homology classification and secondary structure prediction. On the RNA Puzzles dataset, it also shows a 19% performance gain over AlphaFold3 in tertiary structure prediction.
3. The model's ability to capture structural and functional patterns is validated on tasks such as internal ribosome entry site (IRES) identification, where it achieves a 49% gain in F1 score. This result supports the value of integrating structural information into pre-training.
4. structRFM is pre-trained with a structure-guided masked language modeling (SgMLM) strategy that dynamically balances nucleotide-level masking and structure-guided masking. This lets the model learn sequence and structure jointly without task-specific biases.
5. The authors have released both the dataset of 21 million sequence-structure pairs and the pre-trained structRFM model as fully open source, which should make it easier to build multimodal foundation models in computational biology and broadens access to RNA modeling.
6. structRFM generalizes well across diverse downstream tasks, including splice site prediction, IRES identification, and non-coding RNA (ncRNA) classification. This versatility and robustness make it a promising tool for RNA-centric research.
7. The study highlights the potential for further development in multimodal RNA language models, suggesting future directions such as contrastive learning and unified frameworks for biomolecular modeling.
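To make points 1 and 4 concrete, here is a minimal sketch of what mixing nucleotide-level and structure-guided masking could look like. It is not the authors' implementation: the function names, the dot-bracket input format, the `<mask>` token, and the fixed `structure_prob` coin flip (standing in for the paper's dynamic balancing) are all assumptions made for illustration.

```python
import random


def parse_pairs(dot_bracket):
    """Map each paired position to its partner in a dot-bracket string.

    Assumes plain '(' / ')' / '.' notation with no pseudoknots.
    """
    stack, partner = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return partner


def structure_guided_mask(seq, dot_bracket, mask_rate=0.15,
                          structure_prob=0.5, rng=None, mask_token="<mask>"):
    """Mask a sequence either per nucleotide or per base pair.

    structure_prob is a hypothetical fixed probability of choosing the
    structure-guided mode; the paper describes a dynamic balance instead.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    rng = rng or random.Random()
    partner = parse_pairs(dot_bracket)
    n_target = max(1, round(mask_rate * len(seq)))

    use_structure = bool(partner) and rng.random() < structure_prob
    masked = set()
    if use_structure:
        # Mask both ends of randomly chosen base pairs, so the model must
        # recover a nucleotide together with its pairing partner.
        # This may overshoot n_target by one position.
        pairs = [(i, j) for i, j in partner.items() if i < j]
        rng.shuffle(pairs)
        for i, j in pairs:
            if len(masked) >= n_target:
                break
            masked.update((i, j))
    else:
        # Ordinary masked language modeling: independent random positions.
        masked.update(rng.sample(range(len(seq)), n_target))

    tokens = [mask_token if k in masked else nt for k, nt in enumerate(seq)]
    return tokens, sorted(masked), use_structure
```

Calling `structure_guided_mask("GGGAAACCC", "(((...)))", structure_prob=1.0)` always masks whole base pairs, while `structure_prob=0.0` falls back to plain nucleotide-level masking.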
📜Paper:
biorxiv.org/content/10.1101/…
💻Code:
github.com/heqin-zhu/structR…
#RNAFoundationModel #StructRFM #RNAPrediction #ComputationalBiology #OpenSource #Bioinformatics