Unified Genomic and Chemical Representations Enable Bidirectional Biosynthetic Gene Cluster and Natural Product Retrieval
1. Liu, Li, Ong et al. present BCCoE, a multimodal retrieval framework that places biosynthetic gene clusters (BGCs) and natural products in a shared embedding space, enabling search in both directions: BGC→compound and compound→BGC.
2. The key idea is to reuse strong pretrained “foundation” embeddings from each modality and learn only a lightweight alignment: BiGCARP embeddings for BGC Pfam-domain sequences (256D) and MoLFormer embeddings for compound SMILES (768D), both projected into a 64D co-embedding space for cosine-similarity nearest-neighbor retrieval.
3. Architecture: two modality-specific encoders (same structure, separate weights) that apply (i) a linear projection, (ii) a 2-layer transformer encoder, (iii) pooling and concatenation with the mean of the original embedding sequence, then (iv) batch norm and a 2-layer MLP that outputs the final co-embedding vectors.
4. Training is metric learning with N-pair loss over batches of paired (BGC, compound) examples from MIBiG; the foundation-model embeddings are frozen to reduce overfitting and preserve general representations. Negatives are taken implicitly from the other pairs in the same batch (efficient “in-batch” negatives).
5. Why alignment matters: baselines that retrieve without cross-modal alignment (KNN, and a two-hop KNN-2hop that chains BGC similarity and compound similarity) cannot consistently capture genotype–chemotype links, especially when candidate pools include novel items not seen during training.
6. Main quantitative results on MIBiG 4.0 (10-fold CV): for BGC→compound retrieval at top-10, Recall improves from 12.9% (KNN) and 21.9% (KNN-2hop) to 32.9% (BCCoE); for compound→BGC at top-10, BCCoE reaches 65.3% Recall (vs 60.6% for KNN-2hop), with a very large lift over random guessing at low K.
7. Generalization to unseen product classes (one entire BGC product class held out during training): performance drops for all methods, but BCCoE remains substantially better, achieving Lift@10 of 17.0 (BGC→compound) and 20.2 (compound→BGC) and outperforming KNN-2hop by ~75–89% in lift at top-10.
8. Temporal generalization (train on MIBiG 3.1, evaluate on new links added in MIBiG 4.0): BCCoE improves identification of newly added BGC–compound pairs. For example, when retrieving compounds from the full MIBiG 4.0 candidate set, top-10 hits rise from 126 (KNN-2hop) to 180 (BCCoE) among 473 new pairs.
9. Robustness across alternative foundation models: when ESM-C is swapped in for BGCs or Uni-Mol2 for compounds, BCCoE remains relatively stable, while KNN-2hop can degrade sharply due to “similarity saturation” (cosine similarities clustered near 1 in the initial embedding spaces), which breaks two-hop score ranking; BCCoE’s aligned space yields a better-behaved similarity distribution.
10. Practical validation beyond MIBiG: on three experimentally validated external BGC–compound pairs previously used in BGC-MAP, BCCoE ranks the true matches much higher in both directions (BGC→compound and compound→BGC), supporting its use for prioritizing candidates in real discovery workflows.
💻Code: zenodo.org/records/18849052
📜Paper: doi.org/10.1038/s41598-026-4…
#Bioinformatics #ComputationalBiology #NaturalProducts #GenomeMining #BiosyntheticGeneClusters #MultimodalAI #MetricLearning #RepresentationLearning #Cheminformatics
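To make the training objective and the retrieval metrics in the thread above concrete, here is a minimal pure-Python sketch of an N-pair loss with in-batch negatives, cosine-similarity retrieval, and Recall@K / Lift@K. It is not the authors' code: the function names and the toy 2D vectors are hypothetical, each query is assumed to have exactly one true match at the same index, and Lift@K is assumed here to be Recall@K divided by the K/N hit rate of random guessing, which may differ from the paper's exact definition.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def n_pair_loss(anchors, positives):
    """N-pair loss with in-batch negatives.

    For anchor i, the positive is positives[i]; every other positives[j]
    in the batch serves as a negative:
        loss_i = log(1 + sum_{j != i} exp(a_i . p_j - a_i . p_i))
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        pos = dot(anchors[i], positives[i])
        neg_sum = sum(
            math.exp(dot(anchors[i], positives[j]) - pos)
            for j in range(n)
            if j != i
        )
        total += math.log1p(neg_sum)
    return total / n


def top_k(query, candidates, k):
    """Indices of the k candidates most cosine-similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def recall_at_k(queries, candidates, k):
    """Fraction of queries whose true match (same index) is in the top k."""
    hits = sum(1 for i, q in enumerate(queries) if i in top_k(q, candidates, k))
    return hits / len(queries)


def lift_at_k(queries, candidates, k):
    """Recall@K relative to random guessing (assumed baseline: K / N)."""
    return recall_at_k(queries, candidates, k) / (k / len(candidates))


# Toy co-embeddings (hypothetical): row i of each list is a true pair.
bgc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
cpd = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]

print("N-pair loss, aligned pairs: ", n_pair_loss(bgc, cpd))
print("N-pair loss, shuffled pairs:", n_pair_loss(bgc, cpd[1:] + cpd[:1]))
print("Recall@1:", recall_at_k(bgc, cpd, 1))
print("Lift@1:  ", lift_at_k(bgc, cpd, 1))
```

Swapping the `queries` and `candidates` arguments gives the other search direction (compound→BGC) in the same co-embedding space.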
✈️🇸🇬 to #ICLR 2025 🔥🔥🔥 in the iconic city of #Singapore, participating in The Thirteenth International Conference on Learning Representations, one of the 4 main #machinelearning #ai conferences worldwide, with Dr @josesanchezhb. This year brings promising invited talks by @dawnsongtweets Song-Chun Zhu @danqi_chen @zicokolter @YiMaTweets @_rockt, plus 44 workshops, 3827 papers, orals, posters, socials and much more, featuring @SchmidhuberAI @SLapuschkin @lifu_huang @Yoshua_Bengio @sea_snell @wellingmax @svlevine @pabbeel, to name a very limited few. Thanks to the ICLR organizers: @yisongyue @cvondrick @yuqirose @animesh_garg @orussakovsk @pcastr @francescazfl @savvyRL @fredahshi @SchwinnLeo Jonas Köhler and many others, including the dozens of sponsors like @Microsoft @AIatMeta @Google @amazon @Oracle @Huawei @Apple @UnitreeRobotics and many others, for making it possible one more year. See you all! PS: We will be hosting two @_Qubic_ AGI dinners on the 24th & 25th. Seats are very limited, but DM if you are interested. #Artificialntelligence #AI #AGI #RepresentationLearning #FeatureLearning #UnsupervisedLearning #SemiSupervisedLearning #SupervisedLearning #MetricLearning #KernelLearning #SparseCoding #DimensionalityExpansion #HierarchicalModels #OptimalTransport #DeepLearningTheory #Planning #ReinforcementLearning #ComputerVision #NLP #AudioProcessing #SpeechRecognition #Robotics #Neuroscience #Biology #ClimateScience #Sustainability #Fairness #AIethics #Safety #Privacy #Interpretability #ExplainableAI #Visualization #Optimization #TheEndOfKnowledge #Artificiology
I got the nudge I needed, so I'm going to write about how to get started with MetricLearning these days (day 3) qiita.com/advent-calendar/20…
"Deep Metric Learning for the Classification of MALDI-TOF Spectral Signatures from Multiple Species of Neotropical Disease Vectors" By Merchan et al, Available on @AILSCI! 🔗Find out more here bit.ly/3m753D5 #AI #MetricLearning #NTDs #moleculartaxonomy #compbio
[Qiita] An impressive author from three years ago: @tancoro3, who wrote an article on [ Python, machine learning, DeepLearning, paper reading, MetricLearning ]: qiita.com/tancoro/items/7ed5…

Learning Protein Embedding to Improve Protein Fold Recognition Using Deep #MetricLearning #DeepLearning #ProteinFold pubs.acs.org/doi/10.1021/acs… #current_issue #JCIM #compchem
Latent Similarity Identifies Important Functional Connections for Phenotype Prediction #TechRxiv #functionalconnectivity #fMRI #metriclearning #ML #predictionalgorithm #NetworkAnalysis techrxiv.org/articles/prepri…

Congrats! The paper "Comparing #CNN and #Deep #MetricLearning Methods for #classification of Export Watermelon🍉Varieties" (FID18-060 Project) is out! #Industry40 #ML #deeplearning #embedding @ieee_ies #ISIE2022 @CemcitAip @utpfisc @lsi_utp @UTPInvestiga ieeexplore.ieee.org/document…

Great showcases of using #MetricLearning for extreme classification of LEGO bricks by Piotr Rybak and #edtech challenges by @BrainlyTeam 👏
🆕📚 Information Fusion 👉bit.ly/3Q0H16S ✅Reviews state-of-the-art techniques for #informationfusion ✅Presents typical applications; from low-level to high-level tasks #InformationScience #DataFusion #DeepLearning #MetricLearning
We are excited to announce that our work "Comparing #ConvNets and #Deep #MetricLearning Methods for Classification of Export Watermelon (Citrullus lanatus🍉) Varieties" (FID18-060 Project) will be presented 3/Jun at @ieee_ies #ISIE2022 #Industry40 #ML #deeplearning #embedding
9 May 2022
Our AI Research Engineer Yusuf Sarıgöz discusses metric learning with @DmitryKan on a Vector Podcast. Check this out youtube.com/watch?v=AU0O_6-E… #Qdrant #MachineLearning #metriclearning #vectorsearch

4 May 2022
Anomaly detection is one of the exciting problems where #metriclearning can demonstrate an advantage over classical approaches. Our #casestudy illustrates how to do this with a practical example of quality control for coffee beans. qdrant.tech/articles/detecti…
#MetricLearning is widely used for image retrieval problems. Here, I explore the paper #ClassificationIsAStrongBaselineForDeepMetricLearning and implement it on #InshopDataset. 1/n jarvislabs.ai/blogs/vin-metr…

All set to start the first session of the Thematic Workshop on Deep Learning organized by our partners from Heriot-Watt University #deeplearning #metriclearning #marinerobotics
Unlocking new dimensions in image-generation research with Manifold Matching via Metric Learning. #DataScience #ML #MachineLearning #MetricLearning hubs.li/Q0107kb60
