An Interactive AI Agent for Adaptive Modeling of RNA Region-Ligand Interactions via LLM-Generated Machine Learning Workflows
1. RLAgent is a new framework that models RNA-ligand interactions at the region level rather than focusing solely on binding sites. This approach provides higher-resolution, more flexible modeling, which is crucial for understanding complex RNA segments that may contain multiple or diffuse binding motifs.
2. RLAgent reframes the RNA-ligand prediction workflow as a dialogue-driven process, allowing users to interactively configure modeling preferences through a natural language interface without writing code. This lowers technical barriers and enhances reproducibility, making RNA-ligand prediction more accessible for both computational and experimental biologists.
3. The framework integrates large language model (LLM) capabilities with optional retrieval-augmented generation (RAG), automated debugging mechanisms, and optional RNA foundation model-based feature encoding. It forms a coherent, end-to-end pipeline that supports various model types, from classical machine learning models like Random Forest and XGBoost to deep learning architectures such as LSTM and Transformer.
4. RLAgent’s robustness was tested across different LLM backends. Larger-scale LLMs such as GPT-4o and DeepSeek-R1:70B proved significantly more reliable for domain-specific code generation, which offers practical guidance for selecting LLM backends in future agent-based systems.
5. An ablation study highlighted the importance of query validation and debugging in RLAgent’s workflow. While single-pass code generation achieved high conformity, it came at the cost of significantly longer runtimes. The standard interactive workflow balanced accuracy and efficiency effectively.
6. In terms of performance, deep architectures like the Transformer and Mamba achieved the highest AUC and F1-scores. However, classical learners such as XGBoost and Random Forest also performed competitively, demonstrating RLAgent’s architecture-agnostic capabilities.
7. A case study demonstrated RLAgent’s interactive modeling capabilities using the Mamba model. The entire process, from feature encoding to model evaluation, was driven by natural language commands, showcasing the framework’s user-friendly experience and potential for real-world biomedical research.
8. Despite its contributions, RLAgent has limitations, including a relatively small training dataset and potential overfitting risks. Future versions may support multi-resolution prediction and could serve as a general-purpose interface for human-AI collaborative hypothesis generation in RNA-targeted therapeutics.
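The automated debugging mentioned in points 3 and 5 can be pictured as a generate-run-repair loop. The Python sketch below is only an illustration under stated assumptions, not RLAgent's actual implementation: `run_with_debugging` and `stub_llm` are hypothetical names, and the stub stands in for a real LLM backend so the example runs offline.

```python
import traceback
from typing import Callable, Optional


def run_with_debugging(
    generate: Callable[[Optional[str]], str], max_attempts: int = 3
) -> dict:
    """Run generated code; on failure, pass the traceback back to the generator."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)  # ask for code, with the last error if any
        namespace: dict = {}
        try:
            exec(code, namespace)  # execute the candidate workflow
        except Exception:
            feedback = traceback.format_exc()  # keep the error for the next try
            continue
        return {"attempts": attempt, "namespace": namespace}
    raise RuntimeError(f"no runnable code after {max_attempts} attempts:\n{feedback}")


def stub_llm(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: first answer is buggy, second is fixed."""
    if feedback is None:
        return "result = undefined_name + 1"  # raises NameError when executed
    return "result = 41 + 1"


outcome = run_with_debugging(stub_llm)
print(outcome["attempts"], outcome["namespace"]["result"])
```

In this sketch the first attempt fails, the traceback is fed back, and the second attempt succeeds. A real agent would replace `stub_llm` with a call to its chosen backend and would normally execute the code in a sandbox rather than with a bare `exec`.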
📜Paper:
biorxiv.org/content/10.1101/…
#RNA #LigandInteractions #AI #MachineLearning #ComputationalBiology #InteractiveModeling #LLM #DeepLearning