Instrument-agnostic Machine Learning Framework for Accurate Prediction of Functional Groups from Tandem Mass Spectrometry
1. Researchers have developed an innovative machine learning framework that can predict functional groups directly from tandem mass spectrometry (MS/MS) spectra without relying on conventional database searches. This instrument-agnostic approach significantly expands the practical applications of machine learning in mass spectrometry.
2. The framework utilizes data from diverse sources, including the MassBank of North America (MoNA) and a custom MS/MS database generated using a high-throughput desorption electrospray ionization (DESI) platform. This showcases the transferability of model predictions across different instruments and experimental conditions.
3. A novel spectral representation standardizes the spectra to the precursor ion m/z, enhancing the model's ability to predict low-frequency functional groups. This method improves the overall performance of the machine learning models, especially for datasets with unique compounds.
4. The study demonstrates that multi-layer perceptron (MLP) neural networks outperform traditional machine learning models such as decision trees, random forests, and support vector machines in predicting functional groups from MS/MS spectra.
5. The robustness and transferability of the framework are validated using blind test sets from different laboratories and instruments (Orbitrap and TOF). The models trained on one dataset successfully predict functional groups in spectra acquired using different instruments, highlighting the framework's versatility.
6. The average molecular F1 score and accuracy for models trained on the MoNA dataset are 87% and 94%, respectively, while for the DESI dataset, they are 76% and 87%. This indicates similar model performance across different databases.
7. The choice of functional groups for prediction is crucial. Certain groups like amines and carbonyls consistently perform well, while others like thiophene perform poorly. This insight can guide future improvements in functional group prediction.
📜Paper: doi.org/10.26434/chemrxiv-20… #MachineLearning #MassSpectrometry #FunctionalGroups #InstrumentAgnostic #PredictiveModeling #ComputationalBiology
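To make point 3 of the thread above concrete, here is a minimal sketch of one way a spectrum could be rescaled to its precursor ion m/z before being fed to a model. The post does not describe the binning scheme, so the function name `precursor_relative_vector`, the `n_bins` parameter, the max-per-bin rule, and the base-peak normalization are all assumptions made for illustration, not the authors' implementation.

```python
def precursor_relative_vector(peaks, precursor_mz, n_bins=100):
    """Bin (m/z, intensity) peaks on a 0..1 axis scaled by the precursor m/z.

    Assumed scheme (not taken from the paper): each fragment m/z is divided
    by the precursor m/z, placed into one of n_bins equal-width bins, the
    strongest peak per bin is kept, and the vector is scaled to a base peak of 1.
    """
    if precursor_mz <= 0:
        raise ValueError("precursor_mz must be positive")
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        rel = mz / precursor_mz
        if rel < 0 or rel > 1:
            continue  # ignore peaks outside the 0..precursor range
        idx = min(int(rel * n_bins), n_bins - 1)  # rel == 1.0 goes in the last bin
        vector[idx] = max(vector[idx], intensity)
    top = max(vector)
    return [v / top for v in vector] if top > 0 else vector


# Hypothetical three-peak spectrum with a precursor at m/z 200.
example = precursor_relative_vector(
    [(50.0, 10.0), (100.0, 40.0), (200.0, 20.0)], precursor_mz=200.0, n_bins=4
)
print(example)  # [0.0, 0.25, 1.0, 0.5]
```

Because every spectrum ends up on the same 0 to 1 axis regardless of molecular mass, a fixed-length input of this kind can be shared across instruments, which is consistent with the transferability claim in the thread.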
Functional Groups are All you Need for Chemically Interpretable Molecular Property Prediction
1. A study proposes a novel Functional Group Representation (FGR) framework for molecular property prediction, achieving state-of-the-art performance while ensuring chemical interpretability. This work significantly advances the field by bridging the gap between deep learning models and traditional chemical knowledge.
2. The FGR framework integrates curated functional groups from established chemical literature and mined functional groups from a large molecular corpus. This dual approach provides a comprehensive and interpretable representation of molecular structures, outperforming existing methods on a wide range of benchmark datasets.
3. The study demonstrates that the FGR framework not only matches but often surpasses the performance of current state-of-the-art models in predicting molecular properties across diverse fields such as biophysics, quantum mechanics, and pharmacokinetics. This highlights its potential for accelerating drug discovery and materials science.
4. A key innovation is the use of autoencoders to encode molecules into a lower-dimensional latent space, leveraging pretraining on a large dataset of unlabeled molecules. This allows the model to capture intricate chemical relationships while maintaining simplicity and efficiency.
5. The interpretability of the FGR framework is validated through alignment and uniformity analyses, showing that the model effectively groups molecules with similar functional groups and ensures adequate coverage of chemical space. This is crucial for reliable and generalizable predictions.
6. The study also includes detailed interpretability analyses, demonstrating that the model consistently assigns high attribution scores to chemically meaningful substructures. This provides valuable insights into the structure-property relationships and enhances the trustworthiness of the model for practical applications.
7. The FGR framework is evaluated on several peptide cleavage and bacterial datasets, outperforming graph-based methods and showcasing its scalability and robustness. This suggests its potential for large-scale molecular property prediction tasks.
📜Paper: arxiv.org/abs/2509.09619v1 #MolecularPropertyPrediction #FunctionalGroups #DeepLearning #Interpretability #DrugDiscovery #MaterialsScience
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
1. A new benchmark dataset, FGBench, has been introduced to evaluate the ability of large language models (LLMs) to reason about molecular properties at the functional group level. This is a significant step forward as most existing datasets focus only on molecule-level predictions, overlooking the importance of functional groups.
2. FGBench contains 625K molecular property reasoning problems with detailed functional group annotations and precise positions. It covers 245 different functional groups and includes both regression and classification tasks, making it a comprehensive resource for training and evaluating LLMs in chemistry.
3. The dataset is constructed using a novel validation-by-reconstruction strategy to ensure high-quality molecular comparisons. This method involves removing and replacing functional groups to verify the structural integrity of the modified molecules, a crucial step for accurate reasoning.
4. Benchmarking results on 7K curated data from FGBench show that current LLMs struggle with functional group-level reasoning, highlighting the need for enhanced reasoning capabilities in chemistry-related tasks. This finding underscores the importance of incorporating fine-grained molecular information into LLM training.
5. The study introduces a new data processing pipeline that can be generalized to other molecular property datasets. This pipeline not only improves the quality of the dataset but also provides a framework for generating new question–answer pairs to advance molecular design and drug discovery.
6. The authors anticipate that FGBench will serve as a foundational framework for developing more interpretable and structure-aware LLMs. The dataset's alignment with molecular graphs and structural information also makes it well-suited for multi-modal learning, encouraging the development of multimodal LLMs.
📜Paper: arxiv.org/abs/2508.01055v2 #FGBench #MolecularPropertyReasoning #FunctionalGroups #LargeLanguageModels #Chemistry #Benchmarking #Dataset
Functional Group-Aware Representations for Small Molecules (FARM): A Novel Foundation Model Bridging SMILES, Natural Language, and Molecular Graphs
1. FARM introduces a novel approach to molecular representation by incorporating functional group information directly into SMILES strings, significantly enriching the chemical context and bridging the gap between SMILES and natural language. This innovation allows for more accurate predictions of molecular properties.
2. The model leverages a unique tokenization strategy, using specific tokens like "O_ketone" and "O_hydroxyl" to differentiate oxygen atoms based on their functional groups. This method expands the chemical lexicon, enhancing the model's ability to understand molecular structures at a finer granularity.
3. FARM combines masked language modeling with graph neural networks to capture both atom-level features and the overall molecular topology. By aligning these two perspectives through contrastive learning, FARM creates a unified molecular embedding that integrates detailed chemical context with structural information.
4. Rigorous evaluations on the MoleculeNet dataset demonstrate FARM's state-of-the-art performance, achieving top results on 11 out of 13 tasks. This highlights its strong transfer learning capabilities and potential for applications in drug discovery and pharmaceutical research.
5. The authors collected a diverse dataset from multiple sources, including ChEMBL25 and ZINC15, to ensure comprehensive coverage of chemical space. This dataset supports the model's ability to learn from a wide range of molecular structures and functional groups.
6. FARM's FG-aware tokenization and fragmentation method outperforms traditional BRICS fragmentation, resulting in a more manageable vocabulary size and better performance on downstream tasks. This approach ensures that the model can effectively learn from and generalize across different molecular datasets.
7. The model's architecture includes a functional group knowledge graph that captures both structural and property-based features of functional groups. This graph is used to learn robust embeddings that facilitate link prediction and enhance the model's understanding of molecular interactions.
8. FARM's contrastive learning framework aligns FG-enhanced SMILES representations with FG graph embeddings, creating a unified molecular representation that integrates atom-level details with global molecular topology. This comprehensive approach improves the model's ability to capture chemically meaningful structures.
9. The authors conducted extensive ablation studies, demonstrating that each component of FARM contributes to its overall performance. The integration of functional group information and contrastive learning significantly enhances the model's effectiveness in molecular representation learning.
10. Future work includes incorporating 3D molecular representations to capture stereochemistry and spatial configurations, further improving the model's predictive capabilities. The ultimate goal is to develop a pre-trained atom embedding that parallels the capabilities of pre-trained word embeddings in natural language processing.
📜Paper: arxiv.org/abs/2410.02082v3 #MolecularRepresentation #FunctionalGroups #AIinChemistry #DrugDiscovery #MachineLearning #ContrastiveLearning #MoleculeNet
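The tokenization idea in point 2 of the FARM thread can be illustrated with a toy sketch. Only the token format ("O_ketone", "O_hydroxyl") comes from the post; the function `fg_aware_tokens` and its input format are hypothetical, and the functional group labels are supplied by hand here, whereas a real pipeline would presumably assign them by substructure matching.

```python
def fg_aware_tokens(atoms):
    """Turn (symbol, functional_group) pairs into FG-aware tokens.

    Hypothetical helper: atoms without a functional group label keep their
    plain symbol, labeled atoms become "<symbol>_<group>".
    """
    return [f"{symbol}_{group}" if group else symbol for symbol, group in atoms]


# Hand-labeled heavy atoms for acetone, CC(=O)C, and ethanol, CCO.
acetone = [("C", None), ("C", None), ("O", "ketone"), ("C", None)]
ethanol = [("C", None), ("C", None), ("O", "hydroxyl")]

print(fg_aware_tokens(acetone))  # ['C', 'C', 'O_ketone', 'C']
print(fg_aware_tokens(ethanol))  # ['C', 'C', 'O_hydroxyl']

# A plain atom-level vocabulary would see a single "O" token in both molecules;
# the FG-aware vocabulary keeps the two oxygen environments apart.
vocabulary = sorted(set(fg_aware_tokens(acetone)) | set(fg_aware_tokens(ethanol)))
print(vocabulary)  # ['C', 'O_hydroxyl', 'O_ketone']
```

The larger vocabulary is the point of the design: the two oxygen atoms get separate embeddings instead of sharing one.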
MolFCL: Predicting Molecular Properties through Chemistry-Guided Contrastive and Prompt Learning
1/ MolFCL introduces a unique approach to molecular property prediction, combining fragment-based contrastive learning with functional group-based prompt learning, marking a significant advancement in the molecular machine learning field.
2/ The framework integrates fragment-fragment interactions, preserving chemical environments and fragment reactions within the contrastive learning framework, which enhances molecular graph augmentation and preserves essential structural features.
3/ MolFCL introduces a novel functional group prompt learning method, leveraging task-relevant chemical knowledge to guide molecular property predictions, providing interpretable insights into how functional groups influence molecular behavior.
4/ In pre-training, the model uses 250,000 unlabeled molecules from ZINC15 to learn representations that capture both atomic-level structure and fragment-level reactions, offering more precise predictions than traditional approaches.
5/ Compared to state-of-the-art methods like CMPNN and KANO, MolFCL consistently outperforms them in 23 molecular property prediction datasets, demonstrating its superior generalization ability and improved performance on diverse tasks.
6/ The model's interpretability is enhanced by its focus on functional groups, allowing users to understand how different chemical features contribute to molecular properties, which is essential for drug discovery and molecular optimization.
7/ MolFCL is an effective tool for improving the accuracy and efficiency of molecular property predictions, with the potential to accelerate drug design by providing deeper insights into molecular structure-property relationships.
💻Code: github.com/tangxiangcsu/MolF… 📜Paper: doi.org/10.1093/bioinformati… #MachineLearning #MolecularPropertyPrediction #DrugDiscovery #Bioinformatics #AIinChemistry #ContrastiveLearning #MolecularRepresentation #Chemoinformatics #FunctionalGroups #DeepLearning
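The contrastive pre-training mentioned in the thread above pulls two views of the same molecule together and pushes views of different molecules apart. The post does not state which loss MolFCL uses, so the sketch below shows the generic InfoNCE objective as a stand-in; the function names, the cosine similarity choice, and the `temperature` default are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss, where view_a[i] and view_b[i] embed the same molecule.

    Generic contrastive objective, used here as an assumption; it is not
    taken from the MolFCL paper or code.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Two toy molecules with 2-D embeddings: matched views give a low loss,
# swapped views give a high one.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

In a fragment-based setting, the second view would be an augmented molecular graph rather than a copy of the first embedding.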
30 Nov 2023
Our lab @SLFDavos gets busy! What would we do without these smart guys sorting thousands of insects from our malaise traps along an elevational gradient? #insectmonitoring #functionalgroups
Read my latest #research on #phytoplankton #functionalgroups and #extremeevents, published with @SpringerNature in @HYDR_Springer. Glad to share the history of the tropical reservoir - Gargalheiras/Brazil rdcu.be/dcemg
The surface site interaction point approach to non-covalent interactions - now published in Chemical Society Reviews pubs.rsc.org/en/content/arti… #chemistry #bioinformatics #NoncovalentInteractions #FunctionalGroups

This dataset is also allowing me to gain insights into how heterogeneity changes species occurrences, and to ask questions about interactions across #trophic levels and #functionalgroups 📊
81 #functionalgroups in #ecopath #ecosym simulations by the @bioweb4 team show strong decline in fish for #rcp8.5 in 2100 @kueno_de @Thuenen_aktuell #kueste2022