Biology AI Daily

Mar 10

Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment

1. Researchers introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to optimize for multiple developability properties simultaneously without sacrificing structural fidelity.
2. The core innovation lies in a semi-online Direct Preference Optimization (DPO) strategy with an adaptive preference margin that automatically resolves conflicts between competing objectives like solubility and thermostability.
3. Unlike existing approaches that rely on post-hoc mutation, inference-time biasing, or retraining on curated subsets, ProtAlign enables target-independent optimization that requires minimal domain expertise and hyperparameter tuning.
4. Applied to ProteinMPNN, the resulting model MoMPNN achieves superior performance on solubility and thermostability benchmarks compared to specialized models like SolubleMPNN and HyperMPNN, while maintaining or improving designability metrics.
5. The framework constructs preference pairs using in silico property predictors and employs a flexible margin mechanism that reduces the required preference gap when winning sequences perform worse on auxiliary properties, preventing over-optimization of single objectives.
6. MoMPNN demonstrates robust generalization across diverse evaluation scenarios including CATH 4.3 crystal structures, de novo backbones generated by RFDiffusion, and real-world binder design tasks, outperforming baselines consistently.
7. The semi-online training paradigm decouples rollout and evaluation from training, enabling efficient batch computation and avoiding the computational overhead of running property predictors during gradient updates.

💻Code: github.com/biogeometry/ProtA…
📜Paper: arxiv.org/abs/2603.06748
#ProteinDesign #InverseFolding #MultiObjectiveOptimization #DirectPreferenceOptimization #ComputationalBiology #ProteinMPNN #MachineLearning #StructuralBiology
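The margin mechanism in point 5 can be illustrated with a minimal sketch. This is not ProtAlign's implementation: the post does not give the margin formula, so `adaptive_margin` below is a hypothetical rule (divide a base margin by one plus the winner's total shortfall on auxiliary properties), and the loss is the generic DPO loss with a margin subtracted inside the sigmoid. All function and parameter names are illustrative assumptions.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def adaptive_margin(base_margin: float,
                    aux_win: Sequence[float],
                    aux_lose: Sequence[float]) -> float:
    """HYPOTHETICAL rule, not taken from the paper: shrink the required
    preference gap when the winning sequence scores lower than the losing
    one on auxiliary properties (higher score = better)."""
    deficit = sum(max(0.0, l - w) for w, l in zip(aux_win, aux_lose))
    return base_margin / (1.0 + deficit)


def dpo_margin_loss(logp_w: float, logp_l: float,
                    ref_logp_w: float, ref_logp_l: float,
                    beta: float, margin: float) -> float:
    """Generic DPO loss for one preference pair with a margin term.

    logp_*     : log-probability of winner/loser under the trained policy
    ref_logp_* : the same under the frozen reference policy
    """
    implicit_reward_gap = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(implicit_reward_gap - margin)


# Usage: the winner is better on the main property but worse on one
# auxiliary property, so the margin it must clear is reduced.
margin = adaptive_margin(1.0, aux_win=[0.2], aux_lose=[0.7])
loss = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1, margin=margin)
```

Under this assumed rule, a pair whose winner is no worse on any auxiliary property keeps the full base margin, while a pair with an auxiliary trade-off is pushed apart less aggressively.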

2,672

Biology AI Daily @BiologyAIDaily

10 Jun 2025

AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization

1. This study introduces AnnoDPO, a novel multimodal framework that improves protein functional annotation by integrating Direct Preference Optimization (DPO), a reinforcement learning variant, into protein language model training.
2. AnnoDPO addresses two major challenges in protein function prediction: the scarcity of annotated data and the highly imbalanced distribution of functional categories, using preference-aligned training objectives inspired by reinforcement learning from human feedback (RLHF).
3. The framework consists of three training stages: pretraining a protein sequence encoder (ESM-C), supervised finetuning combining annotation prediction and sequence-annotation contrastive learning, and finally DPO to optimize preferences directly without explicit reward modeling.
4. DPO enhances model attention patterns, enabling better capture of hierarchical relationships within Gene Ontology (GO) terms, which improves discrimination among biological processes, molecular functions, and cellular components.
5. Experimentally, AnnoDPO consistently outperforms baseline models in multiple Gene Ontology categories, showing significant gains in F1-Max scores across biological process, cellular component, and molecular function annotations.
6. The model demonstrates improved robustness across label frequency groups, particularly excelling at predicting rare (low-frequency) protein function annotations through its preference optimization approach.
7. Visualization of latent embeddings reveals that AnnoDPO achieves clearer functional category separability and preserves fine-grained ontological relationships, supporting biologically meaningful annotation predictions.
8. Ablation studies confirm that both contrastive learning and DPO contribute critically to performance gains, with DPO-powered models achieving state-of-the-art results without relying on complex reward modeling.
9. The authors release the code for AnnoDPO, promoting reproducibility and further development in protein functional annotation research.

💻Code: github.com/AzusaXuan/AnnoDPO
📜Paper: arxiv.org/abs/2506.07035v1
#ProteinFunction #Bioinformatics #MachineLearning #ProteinLanguageModels #ReinforcementLearning #DirectPreferenceOptimization #GeneOntology #ComputationalBiology

887

HackerNoon | Learn Any Technology

@hackernoon

26 Aug 2024

Examine sample responses and GPT-4 judgments to gain insights into the quality of generated text. - hackernoon.com/performance-o… #aifinetuning #directpreferenceoptimization

Performance of Best of N Baseline for Various N and Sample Responses and GPT-4 Judgments | Hacker...

hackernoon.com
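For readers unfamiliar with the Best of N baseline named in the linked article, here is a minimal sketch of the general technique: draw N candidate responses and keep the one a scoring function rates highest. The `sample` and `score` callables are hypothetical stand-ins for a language model sampler and a reward model; the article's actual setup is not described in the post.

```python
from typing import Callable


def best_of_n(prompt: str,
              sample: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int) -> str:
    """Return the highest-scoring of n sampled responses.

    sample(prompt)          -> one candidate response (hypothetical sampler)
    score(prompt, response) -> scalar quality estimate (hypothetical reward)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: score(prompt, response))


# Usage with toy stand-ins: a fixed list of "responses" and a scorer that
# simply prefers longer text.
toy_responses = iter(["ok", "a longer answer", "mid answer"])
best = best_of_n("question?",
                 sample=lambda p: next(toy_responses),
                 score=lambda p, r: float(len(r)),
                 n=3)
```

Larger N costs N forward passes per prompt at inference time, which is the trade-off a comparison across values of N examines.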

636

HackerNoon | Learn Any Technology

@hackernoon

25 Aug 2024

Learn how the Plackett-Luce model is used to derive the DPO objective. - hackernoon.com/deriving-the-… #aifinetuning #directpreferenceoptimization

Deriving the DPO Objective Under the Plackett-Luce Model | HackerNoon

hackernoon.com
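For orientation, a sketch of where that derivation ends up, written in the notation of the DPO paper (an assumption, since the post gives no formulas): $\tau$ is a ranking of $K$ candidate responses $y_1,\dots,y_K$ to a prompt $x$, $r^*$ is the latent reward, $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ is the frozen reference policy, and $\beta$ is the strength of the KL penalty.

```latex
% Plackett-Luce model: probability of a full ranking \tau
p^*(\tau \mid y_1,\dots,y_K, x)
  = \prod_{k=1}^{K}
    \frac{\exp\big(r^*(x, y_{\tau(k)})\big)}
         {\sum_{j=k}^{K} \exp\big(r^*(x, y_{\tau(j)})\big)}

% Writing the reward as \beta \log\frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}
% plus a term that depends only on x (and therefore cancels in every ratio)
% gives the ranking version of the DPO loss:
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}})
  = -\,\mathbb{E}_{\tau,\, y_1,\dots,y_K,\, x \sim \mathcal{D}}
    \left[ \log \prod_{k=1}^{K}
      \frac{\exp\!\Big(\beta \log \frac{\pi_\theta(y_{\tau(k)} \mid x)}{\pi_{\text{ref}}(y_{\tau(k)} \mid x)}\Big)}
           {\sum_{j=k}^{K} \exp\!\Big(\beta \log \frac{\pi_\theta(y_{\tau(j)} \mid x)}{\pi_{\text{ref}}(y_{\tau(j)} \mid x)}\Big)}
    \right]
```

With $K = 2$ the product has a single non-trivial factor and the loss reduces to the pairwise form.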

874

HackerNoon | Learn Any Technology

@hackernoon

25 Aug 2024

Learn how to derive the DPO objective under the Bradley-Terry model. - hackernoon.com/deriving-the-… #aifinetuning #directpreferenceoptimization

Deriving the DPO Objective Under the Bradley-Terry Model | HackerNoon

hackernoon.com
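In outline, and using the DPO paper's notation as an assumption (the post itself shows no formulas), the pairwise case runs as follows: $y_w$ and $y_l$ are the preferred and dispreferred responses to prompt $x$, $\sigma$ is the logistic function, $\pi_\theta$ the trained policy, $\pi_{\text{ref}}$ the reference policy, and $\beta$ the KL penalty strength.

```latex
% Bradley-Terry model: preference probability depends only on the reward gap
p^*(y_w \succ y_l \mid x)
  = \frac{\exp\big(r^*(x, y_w)\big)}
         {\exp\big(r^*(x, y_w)\big) + \exp\big(r^*(x, y_l)\big)}
  = \sigma\big(r^*(x, y_w) - r^*(x, y_l)\big)

% Expressing each reward through the policy it induces and taking the
% negative log-likelihood over the preference dataset \mathcal{D}:
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
    \right) \right]
```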

801

HackerNoon | Learn Any Technology

@hackernoon

25 Aug 2024

This appendix provides a detailed mathematical derivation of Equation 4, which is central to the KL-constrained reward maximization objective in RLHF. - hackernoon.com/deriving-the-… #aifinetuning #directpreferenceoptimization

Deriving the Optimum of the KL-Constrained Reward Maximization Objective | HackerNoon

hackernoon.com
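Assuming "Equation 4" refers to the numbering in the DPO paper, the objective and its closed-form optimum can be stated as below. The symbols follow that paper and are not given in the post: $r$ is a reward function, $\pi_{\text{ref}}$ the reference policy, $\beta > 0$ the KL penalty strength, and $Z(x)$ the partition function.

```latex
% KL-constrained reward maximization objective
\max_{\pi}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(y \mid x)}\big[ r(x, y) \big]
  - \beta\, \mathbb{D}_{\mathrm{KL}}\big[ \pi(y \mid x) \,\|\, \pi_{\text{ref}}(y \mid x) \big]

% Its optimum: the reference policy reweighted by exponentiated reward
\pi_r(y \mid x)
  = \frac{1}{Z(x)}\, \pi_{\text{ref}}(y \mid x)\,
    \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big),
\qquad
Z(x) = \sum_{y} \pi_{\text{ref}}(y \mid x)\,
       \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big)

% Taking logs and rearranging expresses the reward through the policy
r(x, y)
  = \beta \log \frac{\pi_r(y \mid x)}{\pi_{\text{ref}}(y \mid x)}
    + \beta \log Z(x)
```

The last line is what makes the partition function drop out once only reward differences between responses to the same prompt matter.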

771

HackerNoon | Learn Any Technology

@hackernoon

25 Aug 2024

Learn about the key contributions of each author to the development of DPO. - hackernoon.com/behind-the-sc… #aifinetuning #directpreferenceoptimization

Behind the Scenes: The Team Behind DPO | HackerNoon

hackernoon.com

735

Umar Jamil

@hkproj

14 Apr 2024

A complete explanation of Direct Preference Optimization (DPO) and the math derivations needed to understand it. Code explained. Link to the video: youtu.be/hvGa5Mba4c8 #dpo #directpreferenceoptimization #rlhf #rl #llm #alignment #finetuning #ai #deeplearning

Direct Preference Optimization (DPO) explained: Bradley-Terry model,...

In this video I will explain Direct Preference Optimization (DPO), ...

youtube.com

7,496

Dotlas AI @DotlasAi

28 Jun 2023

📚 Exciting breakthrough in language models! No RL needed! Train LLMs with a new loss function that makes better completions more likely and worse ones less likely. Check out @YZeldes's post for details! #AI #LanguageModels #DirectPreferenceOptimization bit.ly/3PsDaBA

To get LLMs as good as OpenAI's GPT-4, is RL really needed? I'm not 100% convinced. Don't get me wrong, the HF part of RLHF (Reinforcement Learning from Human Feedback) is important. But do we really...

linkedin.com