GOLF: A Generative AI Framework for Pathogenicity Prediction of Myocilin OLF Variants
1. GOLF is a generative AI framework for predicting and interpreting the pathogenicity of missense mutations in the olfactomedin (OLF) domain of myocilin, a protein whose gene (MYOC) is linked to open-angle glaucoma (OAG), a major cause of irreversible blindness.
2. GOLF combines evolutionary modeling with mechanistic interpretability and reaches 96.9% accuracy on known variants, outperforming AlphaMissense and fine-tuned ESM-1b in classifying OLF mutations.
3. The method draws on a curated dataset of more than 4,000 OLF homologs from 73 taxonomic groups, including non-visual organisms such as nematodes, which highlights the deep evolutionary conservation of this domain.
4. GOLF uses two generative models: a variational autoencoder (EVE) and a fine-tuned ESM-1b transformer. EVE performed best and correctly classified every pathogenic mutation in the labeled set.
5. To interpret model decisions, GOLF adds a sparse autoencoder (SAE) that extracts interpretable biochemical features. These features indicate that hydrophobic residues are often associated with benign predictions, while polar or aromatic residues signal pathogenicity.
6. EVE provides per-residue uncertainty estimates alongside its pathogenicity scores, highlighting regions of structural fragility and mutational sensitivity across the OLF domain.
7. A structural map of mutational effects across all 4,959 single-residue substitutions reveals hot spots that are highly sensitive to variation, especially residues 266–290, 324–334, and 363–394.
8. The framework shows that generative models can learn underlying biochemical rules, such as polarity and hydrophobic packing, without explicit supervision, which suggests they are useful for mechanistic variant interpretation.
9. An ensemble of EVE models further improved predictive robustness, reducing bias from model initialization and making classifications more consistent across the variant landscape.
10. Limitations include the relatively small number of labeled clinical variants and the current inability to distinguish gain-of-function from loss-of-function effects, which the authors flag as an area for future improvement.
11. The authors propose that SAE-derived features can guide future experiments by identifying structurally or biochemically relevant regions, bridging predictive modeling and mechanistic biology.
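For readers who want to see the bookkeeping behind points 7 and 9, here is a minimal Python sketch. It is not the authors' code. It assumes the 4,959 figure corresponds to 261 positions with 19 alternative amino acids each (261 × 19 = 4,959; the position count is inferred from the arithmetic, not stated in the thread). It also uses a random placeholder sequence and hypothetical toy scorers in place of the real OLF sequence and trained EVE models. The names `enumerate_substitutions`, `ensemble_score`, and `make_toy_scorer` are illustrative only.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def enumerate_substitutions(sequence, first_position=1):
    """Yield (position, wild_type, mutant) for every single-residue substitution."""
    for offset, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield (first_position + offset, wild_type, mutant)


def make_toy_scorer(seed):
    """Hypothetical stand-in for one trained model: a seeded pseudo-random score in [0, 1)."""
    def score(variant):
        return random.Random(f"{seed}:{variant}").random()
    return score


def ensemble_score(variant, scorers):
    """Return (mean, population standard deviation) of the scores from several models."""
    scores = [score(variant) for score in scorers]
    return statistics.mean(scores), statistics.pstdev(scores)


# Placeholder sequence of 261 residues (an assumption, NOT the real OLF sequence).
toy_rng = random.Random(0)
toy_sequence = "".join(toy_rng.choice(AMINO_ACIDS) for _ in range(261))

variants = list(enumerate_substitutions(toy_sequence))
print(len(variants))  # 4959

# Five toy "models" that differ only in their seed, mimicking different initializations.
scorers = [make_toy_scorer(seed) for seed in range(5)]
mean_score, spread = ensemble_score(variants[0], scorers)
print(variants[0], round(mean_score, 3), round(spread, 3))
```

Averaging over models smooths out differences caused by initialization, and the spread across models gives a rough sense of how much they disagree on a given variant. That spread is only an illustration and is not the uncertainty estimate that EVE itself produces.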
💻Code:
github.com/amirgroup-codes/G…
📜Paper:
biorxiv.org/content/10.1101/…
#Genomics #ProteinAI #VariantInterpretation #Myocilin #Glaucoma #PathogenicityPrediction #MachineLearning #SparseAutoencoder #EvolutionaryBiology #StructuralBioinformatics