Biology AI Daily

May 10

Unified Genomic and Chemical Representations Enable Bidirectional Biosynthetic Gene Cluster and Natural Product Retrieval

1. Liu, Li, Ong et al. present BCCoE, a multimodal retrieval framework that places biosynthetic gene clusters (BGCs) and natural products in a shared embedding space, enabling search in both directions: BGC→compound and compound→BGC.

2. The key idea is to reuse strong pretrained “foundation” embeddings from each modality and learn only a lightweight alignment: BiGCARP embeddings for BGC Pfam-domain sequences (256D) and MoLFormer embeddings for compound SMILES (768D), both projected into a 64D co-embedding space for cosine-similarity nearest-neighbor retrieval.

3. Architecture: two modality-specific encoders (same structure, separate weights) that apply (i) a linear projection, (ii) a 2-layer transformer encoder, (iii) pooling and concatenation with the mean of the original embedding sequence, then (iv) batch norm and a 2-layer MLP to output the final co-embedding vectors.

4. Training is metric learning with N-pair loss over batches of paired (BGC, compound) examples from MIBiG; foundation-model embeddings are frozen to reduce overfitting and to preserve general representations. Negatives are taken implicitly from the other pairs in the same batch (efficient “in-batch” negatives).

5. Why alignment matters: baselines that retrieve without cross-modal alignment (KNN, and a two-hop KNN-2hop that chains BGC similarity and compound similarity) cannot consistently capture genotype–chemotype links, especially when candidate pools include novel items not seen during training.

6. Main quantitative results on MIBiG 4.0 (10-fold CV): for BGC→compound retrieval at top-10, Recall improves from 12.9% (KNN) and 21.9% (KNN-2hop) to 32.9% (BCCoE); for compound→BGC at top-10, BCCoE reaches 65.3% Recall (vs 60.6% for KNN-2hop), with a very large lift over random guessing at low K.

7. Generalization to unseen product classes (one entire BGC product class held out during training): performance drops for all methods, but BCCoE remains substantially better, achieving Lift@10 of 17.0 (BGC→compound) and 20.2 (compound→BGC) and outperforming KNN-2hop by ~75–89% in lift at top-10.

8. Temporal generalization (train on MIBiG 3.1, evaluate on new links added in MIBiG 4.0): BCCoE improves identification of newly added BGC–compound pairs. For example, when retrieving compounds from the full MIBiG 4.0 candidate set, top-10 hits rise from 126 (KNN-2hop) to 180 (BCCoE) among 473 new pairs.

9. Robustness across alternative foundation models: when ESM-C is swapped in for BGCs or Uni-Mol2 for compounds, BCCoE remains relatively stable, while KNN-2hop can degrade sharply due to “similarity saturation” (cosine similarities clustered near 1 in the initial embedding spaces), which breaks two-hop score ranking; BCCoE’s aligned space yields a better-behaved similarity distribution.

10. Practical validation beyond MIBiG: on three experimentally validated external BGC–compound pairs previously used in BGC-MAP, BCCoE ranks the true matches much higher in both directions (BGC→compound and compound→BGC), supporting its use for prioritizing candidates in real discovery workflows.

💻Code: zenodo.org/records/18849052
📜Paper: doi.org/10.1038/s41598-026-4…

#Bioinformatics #ComputationalBiology #NaturalProducts #GenomeMining #BiosyntheticGeneClusters #MultimodalAI #MetricLearning #RepresentationLearning #Cheminformatics
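To make the training and retrieval steps in the thread concrete, here is a minimal NumPy sketch of an N-pair loss with in-batch negatives, followed by cosine top-K retrieval and Recall@K. It is not taken from the BCCoE code release: the function names, the toy data, and the use of raw inner products as logits are assumptions for illustration, and the paper's exact loss variant may differ.

```python
import numpy as np


def n_pair_loss(bgc, compound):
    """N-pair loss with in-batch negatives (illustrative, not the BCCoE code).

    bgc, compound: (B, D) arrays; row i of each is a true (BGC, compound) pair.
    Every other compound row in the batch acts as a negative for BGC row i.
    """
    logits = bgc @ compound.T                             # (B, B) inner products
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Softmax cross-entropy with the diagonal (true pairs) as targets, which
    # equals log(1 + sum_{j != i} exp(s_ij - s_ii)) averaged over the batch.
    return float(-np.mean(np.diag(log_prob)))


def top_k_cosine(queries, candidates, k):
    """Indices of the k candidates most cosine-similar to each query."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sims = q @ c.T
    return np.argsort(-sims, axis=1)[:, :k]


def recall_at_k(ranked, true_idx):
    """Fraction of queries whose true match appears in their top-k list."""
    hits = [t in row for row, t in zip(ranked, true_idx)]
    return float(np.mean(hits))


# Toy data (hypothetical): 8 pairs in a 64D co-embedding space, where each
# compound embedding is a noisy copy of its paired BGC embedding.
rng = np.random.default_rng(0)
bgc_emb = rng.normal(size=(8, 64))
compound_emb = bgc_emb + 0.1 * rng.normal(size=(8, 64))

print("N-pair loss:", n_pair_loss(bgc_emb, compound_emb))
ranked = top_k_cosine(bgc_emb, compound_emb, k=3)
print("Recall@3:", recall_at_k(ranked, range(8)))
```

Swapping the two arguments of `top_k_cosine` gives the compound→BGC direction over the same co-embedding space.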

2,091

David Vivancos - e/acc @VivancosDavid

22 Apr 2025

✈️🇸🇬 Off to #ICLR 2025 🔥🔥🔥 in the iconic city of #Singapore, participating in The Thirteenth International Conference on Learning Representations, one of the 4 main #machinelearning #ai conferences worldwide, with Dr @josesanchezhb. This year brings promising invited talks by @dawnsongtweets, Song-Chun Zhu, @danqi_chen, @zicokolter, @YiMaTweets and @_rockt, plus 44 workshops, 3827 papers, orals, posters, socials and much more, featuring @SchmidhuberAI @SLapuschkin @lifu_huang @Yoshua_Bengio @sea_snell @wellingmax @svlevine @pabbeel, to name just a few. Thanks to the ICLR organizers @yisongyue @cvondrick @yuqirose @animesh_garg @orussakovsk @pcastr @francescazfl @savvyRL @fredahshi @SchwinnLeo, Jonas Köhler and many others, and to the dozens of sponsors, including @Microsoft @AIatMeta @Google @amazon @Oracle @Huawei @Apple @UnitreeRobotics and many others, for making it possible one more year. See you all! PS: We will be hosting two @_Qubic_ AGI dinners on the 24th & 25th. Seats are very limited, but DM if you are interested. #Artificialntelligence #AI #AGI #RepresentationLearning #FeatureLearning #UnsupervisedLearning #SemiSupervisedLearning #SupervisedLearning #MetricLearning #KernelLearning #SparseCoding #DimensionalityExpansion #HierarchicalModels #OptimalTransport #DeepLearningTheory #Planning #ReinforcementLearning #ComputerVision #NLP #AudioProcessing #SpeechRecognition #Robotics #Neuroscience #Biology #ClimateScience #Sustainability #Fairness #AIethics #Safety #Privacy #Interpretability #ExplainableAI #Visualization #Optimization #TheEndOfKnowledge #Artificiology

4,221

Ram Bhaskara @RamBhaskara4

13 Mar 2025

Just dropped on bioRxiv! Our team has assembled a data-driven map of the human E3 ligome. Check out our preprint! #E3_ligome #ubiquitination #dataIntegration #metriclearning biorxiv.org/content/10.1101/…

Multi-scale classification decodes the complexity of the human E3 ligome

E3 ubiquitin ligases are key regulators of protein homeostasis, targeting specific proteins for degradation via the ubiquitin-proteasome system (UPS). They provide crucial substrate specificity,...

biorxiv.org

436

Qiita人気記事/執筆者紹介 @q_hayari

4 Dec 2023

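[Qiita] Today's standout author: @fam_taro, who wrote an article on [ Kaggle PyTorch MetricLearning Pytorch-lightning ]: qiita.com/fam_taro/items/735…

Getting Started with Metric Learning These Days (with Competitions in Mind) - Qiita

This is the Day 3 article of the Kaggle Advent Calendar. This time I have put together the code I implement first when trying Metric Learning in competitions such as Kaggle. Things like visualizing embeddings with UMAP and searching with faiss are in this...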

qiita.com

410

ふぁむたろう @fam_taro

27 Nov 2023

I got a nudge of encouragement, so I am going to write about how to get started with MetricLearning these days (Day 3) qiita.com/advent-calendar/20…

Kaggle - Qiita Advent Calendar 2023 - Qiita

Calendar page for Qiita Advent Calendar 2023 regarding Kaggle.

qiita.com

10,418

DeepAI @DeepAI

29 Apr 2023

🤯Lowkey Goated When #MetricLearning Is The Vibe! Check out this new paper from @Kosyamada et al. for Semantic Frame Induction: deepai.org/publication/seman…

Semantic Frame Induction with Deep Metric Learning

04/27/23 - Recent studies have demonstrated the usefulness of contextualized word embeddings in unsupervised semantic frame induction. Howeve...

deepai.org

1,681

AI in the Life Sciences @AILSCI

7 Apr 2023

"Deep Metric Learning for the Classification of MALDI-TOF Spectral Signatures from Multiple Species of Neotropical Disease Vectors" by Merchan et al., available on @AILSCI! 🔗Find out more here: bit.ly/3m753D5 #AI #MetricLearning #NTDs #moleculartaxonomy #compbio

1,069

Qiita人気記事/執筆者紹介 @q_hayari

7 Feb 2023

[Qiita] Standout author from 3 years ago: @tancoro3, who wrote an article on [ Python MachineLearning DeepLearning PaperReading MetricLearning ]: qiita.com/tancoro/items/7ed5…

383

JCIM & JCTC Journals @JCIM_JCTC

2 Oct 2022

Learning Protein Embedding to Improve Protein Fold Recognition Using Deep #MetricLearning #DeepLearning #ProteinFold pubs.acs.org/doi/10.1021/acs… #current_issue #JCIM #compchem

TechRxiv Preprint Server @TechRxiv_org

16 Sep 2022

Latent Similarity Identifies Important Functional Connections for Phenotype Prediction #TechRxiv #functionalconnectivity #fMRI #metriclearning #ML #predictionalgorithm #NetworkAnalysis techrxiv.org/articles/prepri…

Javier E. Sanchez-Galan @j_sgalan

28 Jul 2022

Congrats! The paper "Comparing #CNN and #Deep #MetricLearning Methods for #classification of Export Watermelon🍉Varieties" (FID18-060 Project) is out! #Industry40 #ML #deeplearning #embedding @ieee_ies #ISIE2022 @CemcitAip @utpfisc @lsi_utp @UTPInvestiga ieeexplore.ieee.org/document…

Kacper Łukawski @LukawskiKacper

23 Jun 2022

Replying to @LukawskiKacper @DSS_conference

Great showcases of using #MetricLearning for extreme classification of LEGO bricks by Piotr Rybak and #edtech challenges by @BrainlyTeam 👏

SpringerCompSci @SpringerCompSci

14 Jun 2022

🆕📚 Information Fusion 👉bit.ly/3Q0H16S ✅Reviews state-of-the-art techniques for #informationfusion ✅Presents typical applications; from low-level to high-level tasks #InformationScience #DataFusion #DeepLearning #MetricLearning

Jinxing Li, Bob Zhang, David Zhang: Information Fusion

Reviews state-of-the-art techniques for information fusion

Presents typical applications of information fusion, ranging from low-level to high-level tasks

Demonstrates the benefits of applying advanced techniques in information fusion

Javier E. Sanchez-Galan @j_sgalan

23 May 2022

We are excited to announce that our work "Comparing #ConvNets and #Deep #MetricLearning Methods for Classification of Export Watermelon (Citrullus lanatus🍉) Varieties" (FID18-060 Project) will be presented on 3 Jun at @ieee_ies #ISIE2022 #Industry40 #ML #deeplearning #embedding

Qdrant @qdrant_engine

9 May 2022

Our AI Research Engineer Yusuf Sarıgöz discusses metric learning with @DmitryKan on a Vector Podcast. Check this out youtube.com/watch?v=AU0O_6-E… #Qdrant #MachineLearning #metriclearning #vectorsearch

Qdrant @qdrant_engine

4 May 2022

Anomaly detection is one of the exciting problems where #metriclearning can demonstrate an advantage over classical approaches. Our #casestudy illustrates how to do this with a practical example of quality control for coffee beans. qdrant.tech/articles/detecti…

Vinayak Nayak @ElisonSherton

3 May 2022

#MetricLearning is a widely used field for image retrieval problems. Here, I explore the paper #ClassificationIsAStrongBaselineForDeepMetricLearning and implement it on #InshopDataset. 1/n jarvislabs.ai/blogs/vin-metr…

Claire YU @claireyuw

25 Jan 2022

Unveil the mystery of embedding vectors to power semantic similarity search in computer vision by Marie Stephen Leo zilliz.com/learn/embedding-g… #SimilaritySearch #NeuralNetworks #VectorDatabase #MetricLearning #ArtificialIntelligence #MachineLearning #DeepLearning

Powering Semantic Search in Computer Vision with Embeddings - Zilliz Learn

Discover how to extract useful information from unstructured data sources in a scalable manner using embeddings.

zilliz.com

DeepField @DeepfieldProj

7 Jan 2022

All set to start the first session of the Thematic Workshop on Deep Learning organized by our partners from Heriot-Watt University #deeplearning #metriclearning #marinerobotics

ODSC (Open Data Science Conference) AI @_odsc

7 Dec 2021

Unlocking new dimensions in image-generation research with Manifold Matching via Metric Learning. #DataScience #ML #MachineLearning #MetricLearning hubs.li/Q0107kb60