Earthquakes

Biomolecules Papers retweeted

Earthquakes

@NewEarthquake

May 28

Revised: M5.2 → M5.6 68 km SE of Adak, Alaska May 27 15:25 local time (6m ago) Depth 23 km earthquake.usgs.gov/earthqua…

Biomolecules Papers @Pastel

May 22

Atom-level Protein Representation Learning Improves Protein Structure Prediction Taewon Kim, Hyosoon Jang, Hyunjin Seo, Seonghwan Seo, Hyeongwoo Kim, Wonho Zhung, Mingyeong Shin, Wooyoun Kim, Sungsoo Ahn arxiv.org/abs/2605.22133 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙰𝙸]

Recent advances in generative modeling show that pretrained representations can improve generation as conditioning features or alignment targets. Motivated by this, we study protein representations for predicting structures beyond conventional function annotation. We propose TriProRep, a structure-aware pretraining method that jointly models three aligned residue-level views: amino-acid identity, backbone geometry, and local full-atom geometry, discretely encoded via VQ-VAE tokenizers. By pretraining to recover original tokens from generator-corrupted views, TriProRep learns to distinguish plausible but incorrect cross-view augmentations from the original protein. We further introduce RepSP, a benchmark for evaluating protein representations in structure-predictive settings. RepSP tests three uses of representations: homodimer co-folding from apo-chain representations, residue-level prediction of homodimer-derived interaction properties, and representation-aligned monomer structure pre

Biomolecules Papers @Pastel

May 20

Deep-time consistency in proteome elemental composition across cellular and viral life L. Felipe Benites, Louie Slocombe, Sara I. Walker arxiv.org/abs/2605.19333 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚚-𝚋𝚒𝚘.𝙿𝙴]

Proteins are constructed from a limited alphabet of 20 amino acids, yet the origins and selection of this specific alphabet are unresolved. One largely overlooked aspect is whether elemental composition constrains the range of viable proteomes. Here, we analyze the elemental composition of thousands of proteomes spanning cellular domains and viral realms. Despite evolutionary divergence and orders-of-magnitude variation in proteome size and gene content, proteomes exhibit strikingly consistent elemental composition. This consistency is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone. Viral proteomes occupy the same elemental composition space observed in cellular organisms despite the absence of a single viral common ancestor, suggesting common biochemical constraints shape proteome organization across life. To investigate the evolutionary origins of

Biomolecules Papers @Pastel

May 20

Elemental Stoichiometry as an Ecological Biosignature with Applications to Life Detection Pilar C. Vergeli, Cole Mathis, John F. Malloy, L. Felipe Benites, Christopher P. Kempes, … arxiv.org/abs/2605.19252 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚊𝚜𝚝𝚛𝚘-𝚙𝚑.𝙴𝙿 𝚚-𝚋𝚒𝚘.𝙼𝙽]

The vast chemical space of possible small molecules, estimated at 10⁶⁰ compounds for molecules composed of just C, N, O, and S, is only sparsely occupied by biology. We propose that where life selects molecules within this space constitutes a detectable ecological signature: a fingerprint not of specific compounds, but of the statistical structure of elemental composition across molecules sampled from ecological systems. Here we introduce a framework combining Van Krevelen diagrams and element scaling laws to characterize the elemental composition of regions of chemical space occupied by biological systems and contrast them with other chemical systems. Applying this framework to 11,834 microbial metagenomic samples, we show that microbial metabolisms occupy a region of chemical space, which is enriched in heteroatoms such as P, S, N, and O relative to C, shifted toward higher O:C and H:C ratios. We observe sublinear element scaling with system size, yielding insights into how elementa
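A Van Krevelen diagram places each molecule at its O:C ratio (horizontal axis) and H:C ratio (vertical axis). The sketch below computes that coordinate pair from a molecular formula. It is only an illustration of the diagram's axes, not the authors' pipeline; the helper names `element_counts` and `van_krevelen_point` are hypothetical, and the parser assumes flat formulas without parentheses, charges, or isotopes.

```python
import re
from collections import Counter


def element_counts(formula):
    """Count atoms in a flat formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    # Each match is an element symbol followed by an optional integer count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def van_krevelen_point(formula):
    """Return (O:C, H:C), the coordinates used in a Van Krevelen diagram."""
    counts = element_counts(formula)
    carbon = counts["C"]
    if carbon == 0:
        raise ValueError("formula contains no carbon; ratios are undefined")
    return counts["O"] / carbon, counts["H"] / carbon


# Glucose is oxygen-rich, palmitic acid is oxygen-poor; both have H:C = 2.
glucose = van_krevelen_point("C6H12O6")
palmitic_acid = van_krevelen_point("C16H32O2")
```

With these two inputs, glucose sits at O:C = 1.0 and palmitic acid at O:C = 0.125, which is the kind of separation along the O:C axis that the abstract's "shifted toward higher O:C" statement refers to.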

Biomolecules Papers @Pastel

May 19

MoleCode unlocks structural intelligence in large language models Zhiyuan Yan, Chen Liu, Boxuan Zhao, Kaiqing Lin, Jixiang Zhao, Yimi Wang, Liuzhenghao Lv, Hao Li, Shanzhuo Zhang, Li Yuan, Fanyang Mo arxiv.org/abs/2605.16480 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙰𝙸]

Molecules are graphs, but large language models (LLMs) are usually asked to reason about them through linear strings. The most popular molecular representation, SMILES, compresses atoms, bonds, branches and rings into a compact sequence in which topology is implicit, forcing LLMs to reconstruct molecular structure before performing the requested chemical operation. Here we introduce MoleCode, an LLM-native, training-free, graph-explicit molecular language in which all molecular components are represented as typed entities with persistent identifiers and explicit relations. MoleCode makes molecular topology directly readable, editable and auditable within the language context, allowing an LLM to operate on structure rather than recover it from syntax. Across molecular reasoning, editing, generation and analysis tasks, this representational shift improves frontier LLMs most strongly when structural access is limiting: unfamiliar molecules, topology-sensitive operations, larger structures

Biomolecules Papers @Pastel

May 15

Detection of residual native state entropy changes upon mutation in Fyn SH3 Kresten Lindorff-Larsen, Robert B. Best, Anthony Mittermaier, Lewis E. Kay, Christopher M. Dobson, Michele Vendruscolo arxiv.org/abs/2605.14496 [𝚚-𝚋𝚒𝚘.𝙱𝙼]

NMR relaxation experiments have shown that there are small but measurable changes in the native state dynamics of the Fyn SH3 domain associated with the substitution by other amino acids of a phenylalanine residue (F20) in the hydrophobic core. We have here used experimental values of NMR order parameters for the wild type protein and two mutational variants (F20L and F20V) as restraints in molecular dynamics simulations. This approach is highly sensitive and provides an atomistic description of the subtle perturbations in native state fluctuations accompanying the mutations. The structural ensembles that we have determined using this method allow the changes in the native state entropy of the protein caused by each of the mutations to be estimated. These entropy changes correspond to free energy variations of several kcal/mol and therefore represent sizable contributions to the overall changes in stability that are associated with the amino acid mutations.

Biomolecules Papers @Pastel

May 15

Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction Charles B Reilly arxiv.org/abs/2605.13899 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚚-𝚋𝚒𝚘.𝚀𝙼]

Protein function prediction is dominated by representations grounded in sequence and static structure, neither of which captures the collective vibrational dynamics through which proteins act. Here we introduce frequency-space mechanics, a representational framework in which a protein is encoded as a mechanical harmonics graph (MHG): nodes are vibrational modes derived from molecular dynamics, and edges are harmonic couplings weighted by octave alignment between mode frequencies. The representation is coordinate-free, sequence-independent, scale-invariant, and inhabits a latent mechanical space in which the original atomic coordinates have been projected out. The same construction applies to any system with a tractable eigendecomposition. Trained on 5,238 SwissProt proteins under a strict 30% sequence-identity split and using no sequence information, a graph neural network over static MHGs predicts GO molecular function terms across the ontology, demonstrating that vibrational physics

Biomolecules Papers @Pastel

May 14

Predicting Endocrine Disruptors: A Deep Learning QSAR Model for Estrogen Receptor Activity Belaguppa Manjunath Ashwin Desai, Shreyas Murthy, Bhoomika Sridhar, Anirudh Belaguppa Manjunath, Vivien Humtsoe, Pronama Biswas arxiv.org/abs/2605.13364 [𝚚-𝚋𝚒𝚘.𝙱𝙼]

Endocrine-disrupting chemicals (EDCs) threaten human health, ecosystems, and biodiversity by interfering with hormonal signaling pathways conserved across vertebrates. Traditional in vivo assays are costly and time-consuming, limiting their capacity to screen the growing number of chemicals. To address this, we developed a deep learning-based QSAR model to predict estrogen receptor (ER) binding molecules. Using a curated dataset of 224 compounds and 2,944 molecular descriptors and fingerprints, a deep neural network (DNN) incorporating dropout and batch normalization was trained and validated. The model achieved training and test accuracies of 96.65% and 91.30%, respectively, with an ROC-AUC of 0.81, a precision of 0.82, and a recall of 0.88 for the active class. Molecular docking against estrogen receptor (PDB ID: 5TOA) confirmed that several predicted compounds exhibited binding comparable to Estradiol, sharing key interactions. This model enables rapid screening of potential EDCs, s

Biomolecules Papers @Pastel

May 12

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation Nabin Giri, Steven Farrell, Kristofer E. Bouchard arxiv.org/abs/2605.09981 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙰𝙸]

Multimodal models that jointly reason over protein sequences, structures, and function annotations within a unified representation hold immense potential for integrating multimodal data and generating new proteins with designed functional properties. To utilize transformer architectures, such models require a tokenizer that converts protein structure from continuous atomic coordinates into discrete representations suitable for scalable multimodal training. The quality of such models is fundamentally upper-bounded by the fidelity and expressiveness of the underlying tokenized structure. However, existing tokenizers prioritize reconstruction over generative abilities. To address these gaps, we introduce Yeti, a simple and compact protein structure tokenizer based on lookup-free quantization and trained end-to-end with a flow matching objective for multimodal learning. Compared to existing models, Yeti generally achieves the best codebook utilization and token diversity, and second best
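For readers unfamiliar with lookup-free quantization, the sketch below shows the general idea as it is commonly described: each latent dimension is quantized independently by its sign, so the token index is simply the resulting bit pattern and no learned codebook lookup is needed. This is a generic illustration under that assumption, not Yeti's implementation, whose details the abstract does not give; the function name `lfq_token` is hypothetical.

```python
def lfq_token(latent):
    """Quantize a latent vector by sign and return (codes, token_index).

    With d latent dimensions the implicit codebook has 2**d entries,
    one per sign pattern, so no nearest-neighbour search is required.
    """
    # Each dimension becomes +1 or -1 depending on its sign.
    codes = [1 if value > 0 else -1 for value in latent]
    # Interpret the +1 positions as set bits of an integer token index.
    token_index = sum(1 << i for i, code in enumerate(codes) if code == 1)
    return codes, token_index


# A 3-dimensional latent maps to one of 2**3 = 8 possible tokens.
codes, token_index = lfq_token([0.3, -1.2, 0.8])
```

Here dimensions 0 and 2 are positive, so the token index is 1 + 4 = 5.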

Biomolecules Papers @Pastel

May 12

TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation Hanqun Cao, Aastha Pal, Sophia Tang, Yinuo Zhang, Jingjie Zhang, Pheng Ann Heng, Pranam Chatterjee arxiv.org/abs/2605.09810 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙻𝙶]

Protein function is often controlled by ligands that bias the direction of state transitions, such as agonists and antagonists, rather than stabilizing a single conformation. This is especially important for clinically relevant G protein-coupled receptors (GPCRs), where therapeutic efficacy depends on functional directionality. Structure-based design methods optimize binding to static conformations and cannot represent non-reversible, directional effects or systematically distinguish agonist from antagonist behavior. To address this gap, we introduce Transition-Directed Discrete Diffusion for Allosteric Binder Design (TD3B), a sequence-based generative framework that designs binders with specified agonist or antagonist behavior via a directional transition control objective. TD3B combines a target-aware Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model, enabling targeted agonist and antagonist generation decoupled from b

Biomolecules Papers @Pastel

May 12

A putative, computationally stable structure of homotrimeric BP180/collagen XVII Congzhou M Sha arxiv.org/abs/2605.08953 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚙𝚑𝚢𝚜𝚒𝚌𝚜.𝚋𝚒𝚘-𝚙𝚑 𝚙𝚑𝚢𝚜𝚒𝚌𝚜.𝚌𝚘𝚖𝚙-𝚙𝚑 𝚚-𝚋𝚒𝚘.𝚀𝙼]

Background: BP180, also known as collagen XVII and BPAG2 (bullous pemphigoid antigen 2), is a 180-kDa transmembrane protein within the hemidesmosomal plaque complex and is known to be a major antigen in bullous pemphigoid, gestational pemphigoid, cicatricial (mucous membrane) pemphigoid, and linear IgA bullous disease. Objective: At present, the 3D structure of BP180 is not known. The goal is to predict a reasonable structure for BP180 through machine learning and molecular dynamics. Methods: In this work, we use the recent Boltz-2 model to predict a putative structure for the intracellular, transmembrane, and proximal extracellular domains, including the NC16A antigenic region and a portion of its first extracellular collagenous domain, Col-15. We computationally embed BP180 in a simple phospholipid bilayer, demonstrate that the putative structure is stable using molecular dynamics, and analyze its allosteric properties. Results: The structures presented satisfy symmetry and se

Biomolecules Papers @Pastel

May 11

CA-DEL: An Open Multi-Target, Multi-Modal Benchmark for Learning from DNA-Encoded Library Screens Mutian He, Hanqun Cao, Cheng Tan, Zijun Gao, Xiaojun Yao, Chunbin Gu, Pheng-Ann Heng arxiv.org/abs/2605.07439 [𝚚-𝚋𝚒𝚘.𝙱𝙼]

The success of machine learning in drug discovery hinges on learning the relationship between a chemical structure and its biological activity. While DNA-Encoded Library (DEL) technology can generate the massive datasets required for this task, its primary signal – sequencing read counts – is an indirect and often noisy proxy for true molecular binding affinity. To address the scarcity of public benchmarks for developing robust models that can overcome this data challenge, we introduce CA-DEL, a multi-dimensional public benchmark featuring screens against three homologous carbonic anhydrase isoforms. While recent benchmarks like KinDEL have introduced 3D poses for kinase targets, CA-DEL distinguishes itself by focusing on the selectivity challenge among homologous Carbonic Anhydrase isoforms (CAII, CAIX, CAXII). Unlike benchmarks relying solely on noisy enrichment scores, CA-DEL integrates a rigorous validation set of experimentally determined binding affinities (Kᵢ) from ChEMBL, estab

Biomolecules Papers @Pastel

May 9

Enhancing Cryo-EM Density Map Segmentation in Phenix for Improved Atomic Model Building Chenwei Zhang arxiv.org/abs/2605.05259 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚘𝚗𝚍-𝚖𝚊𝚝.𝚖𝚝𝚛𝚕-𝚜𝚌𝚒 𝚌𝚜.𝙰𝙸 𝚚-𝚋𝚒𝚘.𝚀𝙼]

We introduce PhenixCraft, a fully automated pipeline for building atomic models from cryo-EM density maps. By integrating AlphaFold predictions, we enhance the map-segmentation step in Phenix during model building, addressing challenges posed by noise and artifacts that traditionally hinder this step. Our results demonstrate PhenixCraft's superior performance in TM-scores and sequence accuracy, significantly improving upon the limitations and inefficiencies of traditional model building using Phenix.

Biomolecules Papers @Pastel

May 7

Benchmarking open-source tools for in silico antiviral drug discovery Daniel C. Elton, Preston W. Estep arxiv.org/abs/2605.04265 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚚-𝚋𝚒𝚘.𝚀𝙼]

Antivirals are uniquely positioned to be deployed quickly during a new outbreak, especially when repurposed from approved drugs. Yet there are no FDA-approved antivirals for the majority of viral families with pandemic potential. Here we lay out the case for investing in technologies and techniques for antiviral drug discovery and designing antiviral combinations. We present a survey of open source datasets and computational tools for in silico antiviral drug discovery, with a particular focus on the latest AI-based systems and docking tools. We then present our custom dataset of 43,005 viral protein-ligand binding measurements that we curated from BindingDB and other sources. Importantly, we found that 31% of viral protein binding data in BindingDB required polyprotein sequences to be carefully split before the data were suitable for training or testing ML models. Using our custom dataset we fine-tuned the DrugFormDTA binding affinity prediction model (Khokhlov et al. 2025). We then b

Biomolecules Papers @Pastel

May 6

AgenticPosesRanker: An Agentic AI Framework for Physically Grounded Ranking of Protein-Ligand Docking Poses Sofiene Khiari, Amr H. Mahmoud, Markus A. Lill arxiv.org/abs/2605.03707 [𝚚-𝚋𝚒𝚘.𝙱𝙼]

Scoring functions remain the principal bottleneck in molecular docking: they routinely fail to rank near-native poses above decoys, and their composite single-score design obscures the physicochemical basis of each ranking error. We present AgenticPosesRanker, an agentic AI framework that combines six deterministic, physically grounded analysis tools (interaction fingerprinting, solvent-accessible burial, conformational strain, steric-clash detection, unsatisfied-polar-atom penalty, and chemical-identity extraction) with large-language-model (GPT-5) chain-of-thought reasoning to evaluate and rank docking poses. On a curated benchmark of ten protein-ligand systems (162 poses) balanced by construction between Smina scoring-function successes and failures, the agent achieved 50.0% best-pose accuracy, matching the design-fixed Smina baseline of 50.0% and significantly exceeding a 7.7% uniformly random baseline (p < 0.001, one-sided exact binomial test). The balanced-benchmark accuracy deco
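The significance claim can be sanity-checked with a one-sided exact binomial tail. The sketch below assumes that 50.0% accuracy on ten systems means 5 successes out of 10, and that every system has the same 7.7% chance of a random best-pose hit. Both are simplifying assumptions made here for illustration; the paper's exact test setup may differ. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """One-sided exact binomial p-value: P(X >= successes) under p_null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed inputs: 5 of 10 systems ranked correctly, 7.7% chance per system.
p_value = binomial_upper_tail(successes=5, trials=10, p_null=0.077)
print(f"one-sided exact binomial p-value: {p_value:.2e}")
```

Under these assumptions the tail probability comes out below 0.001, in line with the threshold quoted in the abstract.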

Biomolecules Papers @Pastel

May 1

Complex Effects of Salt on Small-Angle X-ray Scattering of BSA Originate From the Interplay of Ions and Hydration Water Anshika Dhiman, Sanbo Qin, Huan-Xiang Zhou arxiv.org/abs/2604.27913 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚘𝚗𝚍-𝚖𝚊𝚝.𝚜𝚘𝚏𝚝]

Salts are an integral part of the environment for living systems and, therefore, understanding their effects on proteins and other biomolecules is of fundamental interest. Small-angle X-ray scattering (SAXS) of protein solutions can provide valuable information on salt effects, but extracting this information has been a significant challenge. For example, SAXS data of bovine serum albumin (BSA) at various salt concentrations were fit to three different spherical models. Here we combined the newly developed FMAPIq approach with explicit-solvent all-atom molecular dynamics simulations to show that the complex effects of salt on the SAXS of BSA originate from the interplay of ions and hydration water, leading to a general picture of protein-ion-water interactions.

Biomolecules Papers @Pastel

Apr 29

Learning Structure, Energy, and Dynamics: A Survey of Artificial Intelligence for Protein Dynamics Haocheng Tang, Liang Shi, Ya-Shi Zhang, Xixian Liu, Jian Tang, Jiarui Lu arxiv.org/abs/2604.25244 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙻𝙶]

Protein dynamics underlie many biological functions, yet remain difficult to characterize due to the high computational cost of molecular dynamics simulations and the scarcity of dynamic structural data. This survey reviews recent advances in artificial intelligence for protein dynamics from three perspectives: learning from structural ensembles and trajectories, learning from physical energy signals, and learning to accelerate molecular simulations. We summarize representative methods for conformation ensemble generation, trajectory generation, Boltzmann generators, physics-aware adaptation, machine learning potentials, coarse-grained modeling, and collective variable discovery. We further discuss available datasets and key open challenges, such as scalability, thermodynamic consistency, kinetic fidelity, and integration with experimental constraints.

Biomolecules Papers @Pastel

Apr 21

ConforNets: Latents-Based Conformational Control in OpenFold3 Minji Lee, Colin Kalicki, Minkyu Jeon, Aymen Qabel, Alisia Fadini, Mohammed AlQuraishi arxiv.org/abs/2604.18559 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙻𝙶]

Models from the AlphaFold (AF) family reliably predict one dominant conformation for most well-ordered proteins but struggle to capture biologically relevant alternate states. Several efforts have focused on eliciting greater conformational variability through ad hoc inference-time perturbations of AF models or their inputs. Despite their progress, these approaches remain inefficient and fail to consistently recover major conformational modes. Here, we investigate both the optimal location and manner-of-operation for perturbing latent representations in the AF3 architecture. We distill our findings in ConforNets: channel-wise affine transforms of the pre-Pairformer pair latents. Unlike previous methods, ConforNets globally modulate AF3 representations, making them reusable across proteins. On unsupervised generation of alternate states, ConforNets achieve state-of-the-art success rates on all existing multi-state benchmarks. On the novel supervised task of conformational transfer, Conf
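
As a rough illustration of what a channel-wise affine transform of pair latents can look like, here is a minimal NumPy sketch. It assumes a pair representation of shape (N, N, C) and one scale and one shift value per channel. The function name `channelwise_affine` and the array shapes are hypothetical and are not taken from OpenFold3 or the paper.

```python
import numpy as np

def channelwise_affine(pair, scale, shift):
    """Apply z'[i, j, c] = scale[c] * z[i, j, c] + shift[c].

    pair  : array of shape (N, N, C), an assumed pair-latent layout
    scale : array of shape (C,), one multiplier per channel
    shift : array of shape (C,), one offset per channel
    """
    pair = np.asarray(pair, dtype=float)
    scale = np.asarray(scale, dtype=float)
    shift = np.asarray(shift, dtype=float)
    n_channels = pair.shape[-1]
    if scale.shape != (n_channels,) or shift.shape != (n_channels,):
        raise ValueError("scale and shift must each have one entry per channel")
    # Broadcasting over the last axis applies the same per-channel
    # parameters to every residue pair (i, j).
    return pair * scale + shift
```

Because the parameters depend only on the channel index and not on residue positions, the same `scale` and `shift` can be applied to pair tensors of any length N, which is one way to read the abstract's point about reuse across proteins.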

Biomolecules Papers @Pastel

Apr 21

Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment arxiv.org/abs/2604.18022 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚘𝚗𝚍-𝚖𝚊𝚝.𝚜𝚝𝚊𝚝-𝚖𝚎𝚌𝚑 𝚌𝚜.𝙻𝙶 𝚜𝚝𝚊𝚝.𝙼𝙻]

The inverse Potts problem, which estimates evolutionary single-site fields and pairwise couplings in homologous protein sequences from the single-site and pairwise amino acid frequencies observed in their multiple sequence alignment, remains a useful method in studies of protein structure and evolution. Since the reproducibility of fields and couplings is most important, the Boltzmann machine method is employed here, although it is computationally intensive. To reduce the computational time required for the Boltzmann machine, a parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. Stochastic gradient descent methods are also used to reduce the computational time of each learning step. Another problem is how to adjust the values of the hyperparameters; there are two regularization parameters, one for evolutionary fields and one for couplings. The precision of contact residue pair prediction
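
The learning loop described above can be sketched roughly as follows. This is a toy NumPy version under several assumptions that are not from the paper: a Gibbs sampler as the MCMC move, plain full-batch gradient ascent in place of the stochastic gradient methods the abstract mentions, and L2 regularization with two illustrative strengths `lam_h` and `lam_J`. All function names and default values are hypothetical.

```python
import numpy as np

def gibbs_sweep(seqs, h, J, rng):
    """One Gibbs sweep over every site of all chains at once (in place)."""
    n_chains, L = seqs.shape
    q = h.shape[1]
    for i in range(L):
        # Conditional log-weights for site i: h_i(a) + sum_{j != i} J_ij(a, a_j).
        logits = np.tile(h[i], (n_chains, 1))
        for j in range(L):
            if j != i:
                logits += J[i, j][:, seqs[:, j]].T
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Inverse-CDF sampling of a new state for site i in every chain.
        u = rng.random((n_chains, 1))
        new_state = (p.cumsum(axis=1) < u).sum(axis=1)
        seqs[:, i] = np.minimum(new_state, q - 1)
    return seqs

def marginals(seqs, q):
    """Single-site (L, q) and pairwise (L, L, q, q) frequencies."""
    n = seqs.shape[0]
    onehot = np.eye(q)[seqs]  # shape (n, L, q)
    f1 = onehot.mean(axis=0)
    f2 = np.einsum("nia,njb->ijab", onehot, onehot) / n
    return f1, f2

def fit_potts(msa, q, n_chains=64, steps=200, lr=0.1,
              lam_h=0.01, lam_J=0.01, seed=0):
    """Fit fields h and couplings J by matching MSA marginals."""
    rng = np.random.default_rng(seed)
    L = msa.shape[1]
    h = np.zeros((L, q))
    J = np.zeros((L, L, q, q))
    f1_data, f2_data = marginals(msa, q)
    # Persistent chains: initialized once, then carried across learning steps.
    chains = rng.integers(0, q, size=(n_chains, L))
    diag = np.arange(L)
    for _ in range(steps):
        chains = gibbs_sweep(chains, h, J, rng)
        f1_model, f2_model = marginals(chains, q)
        # Log-likelihood gradient (data minus model marginals) with L2 penalty.
        h += lr * (f1_data - f1_model - lam_h * h)
        J += lr * (f2_data - f2_model - lam_J * J)
        J[diag, diag] = 0.0  # no self-couplings
    return h, J
```

The chains are never reset between learning steps, which is what makes the sampler persistent, and all chains are advanced together in each sweep, which stands in for the parallel part. A real multiple sequence alignment would use q = 21 states (20 amino acids plus gap) and far more sampling than this sketch.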

Biomolecules Papers @Pastel

Apr 17

PUFFIN: Protein Unit Discovery with Functional Supervision Gökçe Uludoğan, Buse Giledereli, Elif Ozkirimli, Arzucan Özgür arxiv.org/abs/2604.14796 [𝚚-𝚋𝚒𝚘.𝙱𝙼 𝚌𝚜.𝙻𝙶] 💬to appear in ISMB 2026 proceedings

Proteins carry out biological functions through the coordinated action of groups of residues organized into structural arrangements. These arrangements, which we refer to as protein units, exist at an intermediate scale, being larger than individual residues yet smaller than entire proteins. A deeper understanding of protein function can be achieved by identifying these units and their associations with function. However, existing approaches either focus on residue-level signals, rely on curated annotations, or segment protein structures without incorporating functional information, thereby limiting interpretable analysis of structure-function relationships. We introduce PUFFIN, a data-driven framework for discovering protein units by jointly learning structural partitioning and functional supervision. PUFFIN represents proteins as residue-level structure graphs and applies a graph neural network with a structure-aware pooling mechanism that partitions each protein into multi-residue u
