浅間香織 Kaori Asama

浅間香織 Kaori Asama

@KaoriAsama

18h

x.com/i/article/206602343982…

1,070

Biology AI Daily

Biology AI Daily @BiologyAIDaily

17 Nov 2025

MotifAE Reveals Functional Motifs from Protein Language Model: Unsupervised Discovery and Interpretability Analysis
1. MotifAE is an innovative unsupervised framework designed to discover functional motifs from protein language models, specifically leveraging the ESM2 model. This approach captures evolutionary-scale sequence regularities, enabling the identification of motifs that mediate critical biological processes like folding, binding, and catalysis.
2. The core of MotifAE is a sparse autoencoder (SAE) architecture that projects ESM2 embeddings into a sparse latent space. By introducing a local similarity loss, MotifAE encourages coherent latent feature activations, reflecting the sequential nature of protein motifs and improving motif discovery compared to standard SAEs.
3. When benchmarked against known ELM motifs, MotifAE achieves a median AUROC of 0.88, significantly outperforming standard SAEs (median AUROC of 0.80). This demonstrates its superior ability to capture functional motifs across diverse benchmarks.
4. MotifAE not only identifies motifs but also aligns with experimental data through gated feature selection, identifying features associated with specific properties such as folding stability. This alignment enhances performance in fitness prediction and enables the design of proteins with enhanced stability.
5. The study further demonstrates that MotifAE captures known functional motifs from the ELM database, with some features showing high specificity for certain motifs while others represent more general patterns. This versatility makes MotifAE a powerful tool for large-scale motif discovery.
6. MotifAE’s ability to capture homodimerization interfaces and align with three-dimensional functional sites highlights its potential for uncovering structural motifs. This capability is crucial for understanding protein-protein interactions and complex formation.
7. The authors developed MotifAE-G, a framework that integrates MotifAE with experimental data to identify features associated with specific functions. This approach significantly improves prediction performance on protein stability and provides a method for rational protein design.
📜Paper: biorxiv.org/content/10.1101/…
💻Code: github.com/CHAOHOU-97/MotifA…
#MotifAE #ProteinMotifs #SparseAutoencoder #ProteinLanguageModel #ESM2 #UnsupervisedLearning #ProteinEngineering #Bioinformatics
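
The idea in point 2 of the post above (a sparse autoencoder over per-residue embeddings, with an extra term that rewards neighbouring residues for having similar latent activations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the post does not give the form of MotifAE's local similarity loss, so the adjacent-position squared difference used here, the function names, and the weights `l1_weight` and `local_weight` are all hypothetical and not taken from the paper or its code.

```python
def relu(value):
    """Rectified linear unit, used to keep latent activations non-negative."""
    return value if value > 0.0 else 0.0


def encode(x, w_enc, b_enc):
    """Map one residue embedding (length d) to K latent activations."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_enc, b_enc)]


def decode(z, w_dec, b_dec):
    """Map K latent activations back to a reconstructed embedding (length d)."""
    return [sum(w * zi for w, zi in zip(row, z)) + b
            for row, b in zip(w_dec, b_dec)]


def sae_loss(embeddings, w_enc, b_enc, w_dec, b_dec,
             l1_weight=1e-3, local_weight=1e-2):
    """Loss for one protein: reconstruction + L1 sparsity + local similarity.

    embeddings: list of per-residue vectors in sequence order.
    The local similarity term is an ASSUMED form: the mean squared
    difference between latent vectors of adjacent residues.
    """
    latents = [encode(x, w_enc, b_enc) for x in embeddings]
    recons = [decode(z, w_dec, b_dec) for z in latents]
    n = len(embeddings)

    # Squared reconstruction error, averaged over residues.
    reconstruction = sum((r - x) ** 2
                         for rec, emb in zip(recons, embeddings)
                         for r, x in zip(rec, emb)) / n

    # L1 penalty pushes most latent activations toward zero.
    sparsity = sum(abs(v) for z in latents for v in z) / n

    # Penalise changes in latent activations between neighbouring residues.
    pairs = list(zip(latents, latents[1:]))
    local_similarity = sum((a - b) ** 2
                           for za, zb in pairs
                           for a, b in zip(za, zb)) / max(len(pairs), 1)

    total = reconstruction + l1_weight * sparsity + local_weight * local_similarity
    return {
        "total": total,
        "reconstruction": reconstruction,
        "sparsity": sparsity,
        "local_similarity": local_similarity,
    }
```

In practice the embeddings would come from a protein language model such as ESM2 and the weights would be trained by gradient descent in a tensor library; the pure-Python version only makes the three loss terms explicit.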

2,436

Biology AI Daily

Biology AI Daily @BiologyAIDaily

25 Jun 2025

GOLF: A Generative AI Framework for Pathogenicity Prediction of Myocilin OLF Variants
1. GOLF is a generative AI framework designed to predict and interpret the pathogenicity of missense mutations in the olfactomedin (OLF) domain of myocilin, a key gene linked to open-angle glaucoma (OAG), a major cause of irreversible blindness.
2. GOLF combines evolutionary modeling and mechanistic interpretability, achieving 96.9% accuracy on known variants, outperforming AlphaMissense and fine-tuned ESM-1b in classifying OLF mutations.
3. The method leverages a curated dataset of over 4,000 OLF homologs from 73 taxonomic groups, including non-visual organisms like nematodes, highlighting the deep evolutionary conservation of this domain.
4. Two generative models are used: a variational autoencoder (EVE) and a fine-tuned ESM-1b transformer. EVE showed the best performance, especially in classifying all pathogenic mutations correctly.
5. To interpret model decisions, GOLF incorporates a sparse autoencoder (SAE) that extracts interpretable biochemical features. It reveals that hydrophobic residues often associate with benign predictions, while polar/aromatic residues signal pathogenicity.
6. EVE provides not only a pathogenicity score but also uncertainty estimates per residue, highlighting regions of structural fragility and mutational sensitivity across the OLF domain.
7. A structural map of mutational effects across all 4,959 single-residue substitutions reveals hot spots, especially residues 266–290, 324–334, and 363–394, as regions highly sensitive to variation.
8. The framework reveals that generative models can learn underlying biochemical rules, like polarity and hydrophobic packing, without explicit supervision, suggesting utility in mechanistic variant interpretation.
9. An ensemble of EVE models further improved predictive robustness, reducing initialization bias and enhancing classification consistency across the variant landscape.
10. Limitations include the relatively small number of labeled clinical variants and the current inability to distinguish gain-of-function from loss-of-function effects, an area for future improvement.
11. The authors propose that SAE-derived features can guide future experiments by identifying structurally or biochemically relevant regions, bridging predictive modeling and mechanistic biology.
💻Code: github.com/amirgroup-codes/G…
📜Paper: biorxiv.org/content/10.1101/…
#Genomics #ProteinAI #VariantInterpretation #Myocilin #Glaucoma #PathogenicityPrediction #MachineLearning #SparseAutoencoder #EvolutionaryBiology #StructuralBioinformatics
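
As a rough check on the "4,959 single-residue substitutions" figure in the post above: each position can be changed to 19 other standard amino acids, and 19 × 261 = 4,959. The 261-residue domain length is inferred here from that arithmetic and is not stated in the post; the function names below are hypothetical and the snippet is only a sketch of how such a substitution scan might be enumerated, not the GOLF code.

```python
# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(sequence, start=1):
    """Yield (position, wild_type, mutant) for every single-residue substitution.

    `start` is the residue number of the first character of `sequence`,
    so a domain excised from a longer protein keeps its original numbering.
    Assumes `sequence` contains only the 20 standard amino acids.
    """
    for offset, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield (start + offset, wild_type, mutant)


def count_single_substitutions(length):
    """Number of single-residue substitutions for a sequence of `length` residues."""
    return (len(AMINO_ACIDS) - 1) * length
```

Each tuple from `single_substitutions` could then be scored by a pathogenicity model to build a per-residue sensitivity map like the one described in point 7.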

524

BusinessIntelligence

BusinessIntelligence

@bimedotcom

15 Jul 2024

Researchers are figuring out how large language models work economist.com/science-and-te… via @TheEconomist #AI #LLMs #DeepLearning #SparseAutoencoder #Hallucination #Confabulation #SemanticEntropy cc @mvollmer1 @sonu_monika @JagersbergKnut @EstelaMandela @sim010101 @enilev @Shi4Tech @BetaMoroney @PerBBerggreen @sallyeaves @ahier @Corix_JC @HolgerGelhausen @CEO_Aisoma @maponi @FernandaKellner @dinisguarda @NeiraOsci @sulefati7 @theomitsa @SusanHayes_ @tlloydjones @drsharwood @TarakRindani @Nicochan33 @Annerobinsons @rvp @Analytics_699 @TheAIObserverX @AndrewinContact @YvesMulkers @ChuckDBrooks @FractaloidConvo @Khulood_Almani @sminaev2015 @KanezaDiane @CurieuxExplorer @treasadovander @mikeflache @FrRonconi @jeancayeux @LavaletteAstrid @baski_la @trudydarwin @chidambara09 @pchamard @RLDI_Lamy @1OFFGINGER

Researchers are figuring out how large language models work

Such insights could help make them safer, more truthful and easier to use

economist.com

1,969

マシンラーニング集会通称 ML集会【VRChat PC/Quest対応】

マシンラーニング集会通称 ML集会【VRChat PC/Quest対応】@VRC_ML_hangout

28 Jan 2023

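We have published the video of the lightning talk (LT) "Feature Extraction Visualized with SparseAutoEncoder," given by @cehl_teapot (Ocha) at the ML集会 meetup on 25 January 2023. If you want to watch it again or missed it, you can view it here: youtu.be/QWuzrSXQPzA

2023/01/25 LT at the ML集会 meetup: "Feature Extraction Visualized with SparseAutoEncoder" by ocha_krg (Ocha)

LT from the ML集会 meetup on 25 January 2023, by ocha_krg (Ocha): "Visualizing with SparseAutoEncoder: Feature...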

youtube.com

502

shitake893

shitake893 @shitake8931

25 Jan 2023

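The Machine Learning meetup has wrapped up. SparseAutoEncoder apparently is a technique that strips out unimportant information. In the casual chat afterward, the organizer Geson was, as usual, entertaining the MAD idea of compressing memories from VRChat into a neural network for safekeeping.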

GesonAnko@ML集会

GesonAnko@ML集会 @GesonAnkoVR

25 Jan 2023

Today's SparseAutoEncoder LT by @cehl_teapot was really easy to follow, and it was fascinating to see features obtained by decomposing image characteristics, based on real experimental data! #VRC_ML集会

353

マシンラーニング集会通称 ML集会【VRChat PC/Quest対応】

マシンラーニング集会通称 ML集会【VRChat PC/Quest対応】@VRC_ML_hangout

25 Jan 2023

[Starting today at 22:00!] Today @cehl_teapot will give an LT titled "Visualizing Feature Extraction with SparseAutoEncoder"! Everyone is welcome to drop by. The LT starts at 22:30. To get in today, join on Geson <GesonAnko>. #VRC_ML集会

マシンラーニング集会通称 ML集会【VRChat PC/Quest対応】@VRC_ML_hangout

15 Dec 2022

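We hold the ML集会 meetup every Wednesday at 22:00. What is the ML集会? It is a gathering where we chat about ML (machine learning) topics and share information by looking at blogs and YouTube videos together. We look forward to seeing you there. VRC group: vrchat.com/home/group/grp_05… Discord server: discord.gg/6rQ2PZTDqa #VRC_ML集会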

1,039