MotifAE Reveals Functional Motifs from Protein Language Model: Unsupervised Discovery and Interpretability Analysis
1. MotifAE is an unsupervised framework designed to discover functional motifs from protein language models, specifically leveraging the ESM2 model. This approach captures evolutionary-scale sequence regularities, enabling the identification of motifs that mediate critical biological processes like folding, binding, and catalysis.
2. The core of MotifAE is a sparse autoencoder (SAE) architecture that projects ESM2 embeddings into a sparse latent space. By introducing a local similarity loss, MotifAE encourages coherent latent feature activations, reflecting the sequential nature of protein motifs and improving motif discovery compared to standard SAEs.
3. When benchmarked against known ELM motifs, MotifAE achieves a median AUROC of 0.88, significantly outperforming standard SAEs (median AUROC of 0.80).
4. MotifAE not only identifies motifs but also aligns with experimental data through gated feature selection, identifying features associated with specific properties such as folding stability. This alignment enhances performance in fitness prediction and enables the design of proteins with enhanced stability.
5. The study further demonstrates that MotifAE captures known functional motifs from the ELM database, with some features showing high specificity for certain motifs while others represent more general patterns. This versatility makes MotifAE a powerful tool for large-scale motif discovery.
6. MotifAE’s ability to capture homodimerization interfaces and align with three-dimensional functional sites highlights its potential for uncovering structural motifs. This capability is crucial for understanding protein-protein interactions and complex formation.
7. The authors developed MotifAE-G, a framework that integrates MotifAE with experimental data to identify features associated with specific functions. This approach significantly improves prediction performance on protein stability and provides a method for rational protein design.
📜Paper: biorxiv.org/content/10.1101/…
💻Code: github.com/CHAOHOU-97/MotifA…
#MotifAE #ProteinMotifs #SparseAutoencoder #ProteinLanguageModel #ESM2 #UnsupervisedLearning #ProteinEngineering #Bioinformatics
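The MotifAE post describes a sparse autoencoder over per-residue embeddings with an added local similarity loss, but gives no formula. The sketch below is one plausible reading, not the authors' implementation: it assumes an L1 sparsity penalty and uses a squared difference between the latent vectors of adjacent residues as a stand-in for the local similarity term. The function names, the weights `l1_weight` and `local_weight`, the toy sizes, and the random "embeddings" (not real ESM2 outputs) are all assumptions for illustration.

```python
import random

def encode(h, W_enc, b_enc):
    # ReLU(W_enc @ h + b_enc): non-negative latent activations for one residue.
    return [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(W_enc, b_enc)]

def decode(z, W_dec, b_dec):
    # Linear map from the latent space back to the embedding space.
    return [sum(zj * W_dec[j][i] for j, zj in enumerate(z)) + b_dec[i]
            for i in range(len(b_dec))]

def sae_losses(H, W_enc, b_enc, W_dec, b_dec, l1_weight=0.01, local_weight=0.1):
    """Loss terms for one protein; H holds one embedding vector per residue."""
    Z = [encode(h, W_enc, b_enc) for h in H]
    X = [decode(z, W_dec, b_dec) for z in Z]
    n = len(H)
    # Reconstruction error, averaged over residues.
    recon = sum((x - h) ** 2 for xr, hr in zip(X, H) for x, h in zip(xr, hr)) / n
    # L1 sparsity on the latent activations (assumed form).
    sparsity = sum(abs(v) for z in Z for v in z) / n
    # Assumed local similarity term: neighbouring residues get similar latents.
    pairs = list(zip(Z[:-1], Z[1:]))
    local = (sum((a - b) ** 2 for za, zb in pairs for a, b in zip(za, zb)) / len(pairs)
             if pairs else 0.0)
    total = recon + l1_weight * sparsity + local_weight * local
    return {"recon": recon, "sparsity": sparsity, "local": local, "total": total}

rng = random.Random(0)
d, k, n = 8, 16, 12  # toy embedding size, latent size, protein length
W_enc = [[rng.gauss(0, 0.3) for _ in range(d)] for _ in range(k)]
W_dec = [[rng.gauss(0, 0.3) for _ in range(d)] for _ in range(k)]
b_enc = [0.0] * k
b_dec = [0.0] * d
H = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]  # random stand-in embeddings
losses = sae_losses(H, W_enc, b_enc, W_dec, b_dec)
print(losses)
```

With this form, the local term is zero when every residue has the same latent vector and grows as activations flicker between neighbours, which is the kind of coherence along the sequence the post describes.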
GOLF: A Generative AI Framework for Pathogenicity Prediction of Myocilin OLF Variants
1. GOLF is a generative AI framework designed to predict and interpret the pathogenicity of missense mutations in the olfactomedin (OLF) domain of myocilin, a key gene linked to open-angle glaucoma (OAG), a major cause of irreversible blindness.
2. GOLF combines evolutionary modeling and mechanistic interpretability, achieving 96.9% accuracy on known variants, outperforming AlphaMissense and fine-tuned ESM-1b in classifying OLF mutations.
3. The method leverages a curated dataset of over 4,000 OLF homologs from 73 taxonomic groups, including non-visual organisms like nematodes, highlighting the deep evolutionary conservation of this domain.
4. Two generative models are used: a variational autoencoder (EVE) and a fine-tuned ESM-1b transformer. EVE showed the best performance, especially in classifying all pathogenic mutations correctly.
5. To interpret model decisions, GOLF incorporates a sparse autoencoder (SAE) that extracts interpretable biochemical features. It reveals that hydrophobic residues often associate with benign predictions, while polar/aromatic residues signal pathogenicity.
6. EVE provides not only a pathogenicity score but also uncertainty estimates per residue, highlighting regions of structural fragility and mutational sensitivity across the OLF domain.
7. A structural map of mutational effects across all 4,959 single-residue substitutions reveals hot spots that are highly sensitive to variation, especially residues 266–290, 324–334, and 363–394.
8. The framework reveals that generative models can learn underlying biochemical rules (such as polarity and hydrophobic packing) without explicit supervision, suggesting utility in mechanistic variant interpretation.
9. An ensemble of EVE models further improved predictive robustness, reducing initialization bias and enhancing classification consistency across the variant landscape.
10. Limitations include the relatively small number of labeled clinical variants and the current inability to distinguish gain-of-function from loss-of-function effects, an area for future improvement.
11. The authors propose that SAE-derived features can guide future experiments by identifying structurally or biochemically relevant regions, bridging predictive modeling and mechanistic biology.
💻Code: github.com/amirgroup-codes/G…
📜Paper: biorxiv.org/content/10.1101/…
#Genomics #ProteinAI #VariantInterpretation #Myocilin #Glaucoma #PathogenicityPrediction #MachineLearning #SparseAutoencoder #EvolutionaryBiology #StructuralBioinformatics
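As a quick check on the "4,959 single-residue substitutions" figure in the GOLF post: 4,959 = 261 × 19, which would fit a 261-residue window with 19 alternative amino acids at each position. The post does not state the window length, so 261 is an inference from the arithmetic. The sketch below enumerates such a variant set; the helper `single_substitutions`, the 1-based numbering, and the all-alanine placeholder sequence are hypothetical and do not reflect the real OLF sequence.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def single_substitutions(sequence, start=1):
    """Yield (position, wild_type, mutant) for every single-residue substitution."""
    for offset, wt in enumerate(sequence):
        for mut in AMINO_ACIDS:
            if mut != wt:
                yield (start + offset, wt, mut)

# Placeholder: 261 alanines, NOT the real OLF domain sequence.
toy_domain = "A" * 261
variants = list(single_substitutions(toy_domain))
print(len(variants))  # 261 * 19 = 4959
```

A scoring model such as the ones named in the post would then assign one pathogenicity score to each of these tuples; that scoring step is not sketched here.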
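We have published the video of the lightning talk (LT) "Feature Extraction Visualized with SparseAutoEncoder" by @cehl_teapot (Ocha), given at the ML meetup held on January 25, 2023. If you want to watch it again, or if you missed it, you can view it here: youtu.be/QWuzrSXQPzA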
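The machine learning meetup has wrapped up. SparseAutoEncoder apparently is a technique that strips out information that isn't important. In the casual chat afterwards, the organizer Geson-san was, as usual, entertaining the MAD idea of compressing their VRChat memories into a neural network for storage.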
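Today's LT on SparseAutoEncoder by @cehl_teapot was super easy to follow, and it was really interesting to see features obtained by decomposing image characteristics, based on actual experimental data! #VRC_ML集会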
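[Starting tonight at 22:00!] Today @cehl_teapot will give a lightning talk titled "Visualizing Feature Extraction with SparseAutoEncoder"! Everyone is welcome to drop by! The LT starts at 22:30. The person to join on today is げそん<GesonAnko>. #VRC_ML集会
We hold the ML meetup every Wednesday from 22:00. What is the ML meetup? It is a gathering where we chat about ML (machine learning) topics and share information by looking at blogs and YouTube together. We look forward to seeing you there. VRC group vrchat.com/home/group/grp_05… Discord server discord.gg/6rQ2PZTDqa #VRC_ML集会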
We will meet again this week, on the 25th, from 22:00. This time it is an LT session: ocha-krg will give a talk titled "Visualizing Feature Extraction with SparseAutoEncoder"!
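First learn sparse features of the training images with a SparseAutoencoder, then copy the pretrained lower layers and the coding layer into a VAE and train it once more... would that work...?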
I tried implementing a SparseAutoEncoder in chainer; maybe I should write it up.
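I was testing a SparseAutoencoder built with Theano and spent several hours fighting a cost function that kept returning NaN for no obvious reason, and I finally found the cause. When computing the sparsity term you take the log of the mean of the hidden layer's output values, and that mean can (sometimes) be negative.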
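The NaN described in the Theano post is easy to reproduce outside Theano. The sketch below uses the standard KL-divergence sparsity penalty, in which `rho` is the target mean activation and `rho_hat` holds the observed mean activation of each hidden unit. The post does not say which fix its author chose; the two remedies shown here (clipping the mean into (0, 1), or using a sigmoid hidden layer so outputs stay positive) are common options and are assumptions. Note that plain Python raises `ValueError` on the log of a negative number, where Theano or NumPy would return NaN.

```python
import math

def mean_activations(batch):
    # Average each hidden unit's output over the batch (rows = samples).
    n = len(batch)
    return [sum(col) / n for col in zip(*batch)]

def kl_sparsity(rho, rho_hat, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j), with rho_hat_j clipped into (0, 1)."""
    total = 0.0
    for r in rho_hat:
        r = min(max(r, eps), 1.0 - eps)  # keeps both log arguments positive
        total += rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
    return total

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

pre_activations = [[-2.0, 0.5, 1.5], [-1.0, -0.5, 2.0]]  # toy batch: 2 samples, 3 hidden units

# tanh outputs lie in (-1, 1), so a unit's mean activation can be negative.
tanh_means = mean_activations([[math.tanh(v) for v in row] for row in pre_activations])
print(tanh_means[0] < 0)  # True: log(rho / rho_hat) is undefined without clipping

# Remedy 1: clip the means before taking logs.
print(kl_sparsity(0.05, tanh_means))

# Remedy 2: use a sigmoid hidden layer, whose outputs already lie in (0, 1).
sig_means = mean_activations([[sigmoid(v) for v in row] for row in pre_activations])
print(kl_sparsity(0.05, sig_means))
```

Clipping only hides the symptom when the hidden layer can emit negative values; if the KL penalty is meant to treat activations as probabilities, a sigmoid hidden layer is the more consistent choice.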