From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models
1.This study introduces Likelihood-Fitness Bridging (LFB), a method that improves variant effect prediction in protein and genome language models (pLMs and gLMs) by averaging over phylogenetically related sequences to reduce noise and bias in likelihood-based fitness estimates.
2.LFB requires no retraining or architecture changes: it operates post hoc on any pretrained generative model, including the ESM-2, ProGen2, and Evo 2 families, which makes it practical and scalable.
3.The central insight is that sequence likelihood reflects not only fitness, but also phylogenetic structure and dataset biases. These confounding factors become more pronounced in larger models, which may explain performance plateaus in variant prediction despite improved sequence modeling.
4.Using an Ornstein–Uhlenbeck model of evolution, the authors formalize how averaging log-likelihood changes across homologous sequences suppresses noise introduced by genetic drift, yielding a lower-variance estimator of true fitness.
5.When applied to variant classification using clinical labels, LFB consistently improves performance across model families. For ESM-2, LFB raises the AUC from 0.653 to 0.889 for the 8M model and from 0.895 to 0.938 for the 15B model, reversing the performance saturation observed at larger model sizes.
6.On the ProteinGym deep mutational scanning (DMS) benchmark, LFB boosts average Spearman correlations across all mutation types. Notably, the 8M ESM-2 model with LFB outperforms the 35M model without LFB, and the 15B model becomes the most accurate overall when LFB is applied.
7.LFB is especially beneficial for larger models, which are more susceptible to capturing phylogenetic signals. This indicates that model scale alone doesn't ensure predictive power unless confounding signals are addressed.
8.The method shows broad improvements across mutation types, assays (binding, expression, fitness), and sequence identity thresholds. Surprisingly, strong performance can be retained using as few as 10 homologs per variant.
9.LFB also sharpens the separation between pathogenic and benign clinical variants. It reduces the misclassification of benign variants as pathogenic, improving the reliability of predictions in a clinical genomics context.
10.While LFB was validated mainly on coding regions and substitution variants, its general framework opens the door to more advanced inference strategies and points toward principled ways to disentangle fitness from dataset artifacts.
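To make the averaging idea from points 1 and 4 concrete, here is a minimal Python sketch. It is not the authors' implementation (the linked repository is the reference for that). It assumes the homologs are already aligned to the target with no gaps, so the same position index applies to every sequence. The names `lfb_score`, `apply_substitution`, and `toy_log_likelihood` are hypothetical, and the toy scorer is only a stand-in for a real pLM or gLM log-likelihood.

```python
import math
from statistics import mean
from typing import Callable, Sequence


def apply_substitution(seq: str, pos: int, alt: str) -> str:
    """Return seq with the residue at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def lfb_score(
    homologs: Sequence[str],
    pos: int,
    alt: str,
    log_likelihood: Callable[[str], float],
) -> float:
    """Average the log-likelihood change of one substitution over homologs.

    Assumption: all homologs share the target's coordinates (same length,
    no gaps). Real alignments would need a position mapping first.
    """
    if not homologs:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in homologs}) != 1:
        raise ValueError("homologs must have equal length in this sketch")
    deltas = [
        log_likelihood(apply_substitution(seq, pos, alt)) - log_likelihood(seq)
        for seq in homologs
    ]
    return mean(deltas)


# Hypothetical stand-in for a language model: an additive per-residue score.
# A real model would return the sequence log-likelihood instead.
TOY_RESIDUE_SCORES = {"S": -0.5, "W": -2.0}


def toy_log_likelihood(seq: str) -> float:
    return sum(TOY_RESIDUE_SCORES.get(res, 0.0) for res in seq)


target = "MKTA"
homologs = [target, "MKSA", "MRTA"]  # target plus two made-up homologs

# Score the substitution T3W (0-based position 2) in the target alone
# and averaged over the homolog set.
single = lfb_score([target], pos=2, alt="W", log_likelihood=toy_log_likelihood)
bridged = lfb_score(homologs, pos=2, alt="W", log_likelihood=toy_log_likelihood)
print(f"single-sequence score: {single:.3f}")
print(f"homolog-averaged score: {bridged:.3f}")
```

If the per-homolog errors were independent with equal variance, the variance of the mean would shrink in proportion to 1/N. Homologs share ancestry, so their errors are correlated and the real gain is smaller, which is one reason a modest number of homologs can already capture much of the benefit.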
💻Code:
github.com/DiasFrazerGroup/l…
📜Paper:
biorxiv.org/content/10.1101/…
#VariantEffectPrediction #ProteinLanguageModels #GenomeLanguageModels #LFB #Bioinformatics #DeepLearning #ESM2 #ProGen2 #Evo2 #Genomics #FitnessEstimation #PrecisionMedicine