PRISM: A High-Throughput Simulation Infrastructure for CADD Agents
1. PRISM (Protein-Receptor Interaction Simulation Modeler) is presented as a GROMACS-native, Python-based infrastructure that turns fragmented protein–ligand MD workflows into a single, reproducible, high-throughput pipeline, designed to serve as a reliable backend for agent-driven CADD.
2. A key integration point is unified ligand parameterization behind one interface: GAFF/GAFF2 (AmberTools via ACPYPE), OpenFF (SMIRNOFF via Interchange), CGenFF (stream-file parsing), OPLS-AA (LigParGen), and SwissParam options (MMFF/MATCH/hybrid). Outputs are standardized (GRO/ITP/atom types/restraints), so downstream steps work the same regardless of which force-field path produced them.
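The "one interface, standardized outputs" idea can be sketched as a small backend registry. This is a hypothetical illustration, not PRISM's API: the names `parameterize`, `register` and `LigandParams`, the output file names, and the stubbed backends are all assumptions. A real backend would call ACPYPE, OpenFF Interchange, and so on, and actually write the files.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict


@dataclass(frozen=True)
class LigandParams:
    """Standardized files every backend returns (hypothetical layout)."""
    gro: Path        # ligand coordinates
    itp: Path        # [ moleculetype ], [ atoms ] and bonded terms
    atomtypes: Path  # [ atomtypes ] block merged into the system topology
    posre: Path      # position-restraint include


Backend = Callable[[Path, Path], LigandParams]
_BACKENDS: Dict[str, Backend] = {}


def register(name: str) -> Callable[[Backend], Backend]:
    """Decorator that adds a backend under a case-insensitive name."""
    def deco(fn: Backend) -> Backend:
        _BACKENDS[name.lower()] = fn
        return fn
    return deco


def _standard_paths(outdir: Path, stem: str = "LIG") -> LigandParams:
    return LigandParams(outdir / f"{stem}.gro", outdir / f"{stem}.itp",
                        outdir / "atomtypes.itp", outdir / f"posre_{stem}.itp")


@register("gaff2")
def _gaff2(mol: Path, outdir: Path) -> LigandParams:
    # Stub: a real backend would run ACPYPE and convert its output here.
    return _standard_paths(outdir)


@register("openff")
def _openff(mol: Path, outdir: Path) -> LigandParams:
    # Stub: a real backend would build an OpenFF Interchange and export GROMACS files.
    return _standard_paths(outdir)


def parameterize(mol: Path, forcefield: str, outdir: Path) -> LigandParams:
    """Single entry point: downstream code only ever sees LigandParams."""
    try:
        backend = _BACKENDS[forcefield.lower()]
    except KeyError:
        raise ValueError(f"unknown force field {forcefield!r}; "
                         f"available: {sorted(_BACKENDS)}") from None
    return backend(mol, outdir)
```

Adding a force field then means registering one more function that returns the same set of files, which is what lets later build steps ignore how the ligand was parameterized.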
3. For higher-quality electrostatics, PRISM adds an optional Gaussian RESP workflow (HF/6-31G* or B3LYP/6-31G*), allowing users to replace AM1-BCC charges while keeping the rest of the automated system build unchanged.
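At the topology level, swapping in RESP charges while leaving the rest of the build untouched amounts to rewriting the charge column of the ligand ITP's `[ atoms ]` section. A minimal sketch, assuming the new charges arrive as a list in atom order; `replace_itp_charges` is a hypothetical helper, and the Gaussian ESP calculation and RESP fit themselves are not shown.

```python
def replace_itp_charges(itp_text: str, new_charges: list[float]) -> str:
    """Rewrite the charge column of the [ atoms ] section; all other lines pass through."""
    out, in_atoms, i = [], False, 0
    for line in itp_text.splitlines():
        body = line.split(";", 1)[0].strip()   # ignore comments
        if body.startswith("["):
            in_atoms = body.strip("[] ").lower() == "atoms"
        elif in_atoms and body:
            cols = body.split()
            # [ atoms ] columns: nr type resnr residue atom cgnr charge mass ...
            if len(cols) < 7:
                raise ValueError(f"atom line without a charge column: {line!r}")
            if i >= len(new_charges):
                raise ValueError("more atoms in the ITP than charges supplied")
            cols[6] = f"{new_charges[i]:.6f}"
            line = " ".join(cols)  # trailing comment on this line is dropped
            i += 1
        out.append(line)
    if i != len(new_charges):
        raise ValueError(f"ITP has {i} atoms but {len(new_charges)} charges were supplied")
    return "\n".join(out) + "\n"
```

A production tool would also check that the new charges sum to the ligand's net integer charge before writing them.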
4. System construction is automated end-to-end: PDBFixer repairs structures (missing atoms, missing side chains, altloc issues), PROPKA optionally assigns protonation states, and pdb2gmx builds the protein topology. PRISM then merges the ligand and protein topologies, solvates the system in a configurable box shape (default 1.5 nm padding), and adds ions (default 0.15 M NaCl).
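How many ions does 0.15 M mean in practice? A rough sketch, assuming the box edge (or image distance) equals the solute's largest extent plus twice the padding, which is how `gmx editconf -d` sizes a box, and that the concentration is counted over the whole box volume, as `gmx genion -conc` documents. Extra counterions for neutralizing a charged solute are not included; the function names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol
NM3_TO_L = 1e-24          # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def box_volume_nm3(solute_extent_nm: float, padding_nm: float = 1.5,
                   shape: str = "cubic") -> float:
    """Box volume when the edge / image distance is extent + 2 * padding."""
    d = solute_extent_nm + 2 * padding_nm
    if shape == "cubic":
        return d ** 3
    if shape == "dodecahedron":  # rhombic dodecahedron with image distance d
        return d ** 3 * math.sqrt(2) / 2
    raise ValueError(f"unsupported shape {shape!r}")


def salt_pairs(volume_nm3: float, conc_molar: float = 0.15) -> int:
    """Ion pairs for a nominal salt concentration over the whole box volume."""
    return round(conc_molar * volume_nm3 * NM3_TO_L * AVOGADRO)
```

For a solute 7 nm across with 1.5 nm padding, a cubic box is 10 nm on a side (1000 nm³) and needs about 90 NaCl pairs; a rhombic dodecahedron with the same image distance has about 71% of that volume and needs about 64.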
5. Simulation control emphasizes reproducibility through a YAML configuration system with explicit precedence (CLI > user config > defaults). It generates or edits GROMACS .mdp files with validated defaults (PME electrostatics, LINCS constraints, v-rescale thermostat, stochastic cell rescaling barostat), uses a reported default production length of 500 ns, and organizes multi-ligand runs into parallel directory structures.
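The precedence rule can be pictured as successive dictionary merges in which later layers win, followed by rendering one stage into .mdp lines. A minimal sketch using plain dicts in place of parsed YAML: the defaults, the `length_ns` key and the 2 fs time step are assumptions for illustration, while the keys written to the .mdp (`coulombtype`, `constraint_algorithm`, `tcoupl`, `pcoupl`, `nsteps`, ...) are standard GROMACS options. The result is a partial fragment, not a complete production .mdp.

```python
from copy import deepcopy

# Hypothetical defaults standing in for a parsed YAML file.
DEFAULTS = {
    "production": {
        "integrator": "md",
        "dt": 0.002,                     # ps; 2 fs is an assumed default
        "length_ns": 500,                # reported default production length
        "coulombtype": "PME",
        "constraints": "h-bonds",
        "constraint_algorithm": "lincs",
        "tcoupl": "V-rescale",
        "pcoupl": "C-rescale",           # stochastic cell rescaling (GROMACS 2021+)
    }
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(defaults: dict, user_cfg: dict, cli: dict) -> dict:
    """Precedence CLI > user config > defaults: each later layer overrides the earlier."""
    return deep_merge(deep_merge(defaults, user_cfg), cli)


def render_mdp(stage: dict) -> str:
    """Turn one stage into .mdp lines, converting length_ns into nsteps."""
    params = dict(stage)                 # shallow copy; the config is not mutated
    length_ns = params.pop("length_ns")
    params["nsteps"] = round(length_ns * 1000 / params["dt"])  # ns -> ps -> steps
    return "\n".join(f"{key:<22} = {value}" for key, value in params.items()) + "\n"
```

At 2 fs per step, the 500 ns default corresponds to 250,000,000 steps.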
6. Enhanced sampling is built in through automated REST2 setup: geometric temperature ladders (default 310–450 K), per-replica scaling rules (charges scaled by sqrt(λ), LJ ε by λ, etc.), per-replica topologies, and a single orchestration script, which removes much of the manual work that replica-exchange workflows usually require.
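A sketch of the ladder and per-replica factors, assuming λ_i = T_0/T_i with T_0 the lowest (reference) temperature, plus the standard REST2 convention that solute charges scale by sqrt(λ), solute LJ ε by λ, and proper dihedrals in the hot region by λ. The replica count of 8 is an arbitrary example, and the dihedral rule is the textbook REST2 choice rather than something stated above.

```python
import math


def geometric_ladder(t_min: float = 310.0, t_max: float = 450.0, n: int = 8) -> list[float]:
    """T_i = t_min * (t_max / t_min) ** (i / (n - 1)) for i = 0..n-1."""
    if n < 2:
        raise ValueError("need at least two replicas")
    ratio = t_max / t_min
    return [t_min * ratio ** (i / (n - 1)) for i in range(n)]


def rest2_scaling(temps: list[float]) -> list[dict]:
    """Per-replica REST2 factors with lambda_i = T_0 / T_i (T_0 = reference replica)."""
    t0 = temps[0]
    rows = []
    for t in temps:
        lam = t0 / t
        rows.append({
            "T": t,
            "lambda": lam,
            "charge_scale": math.sqrt(lam),  # solute charges q -> sqrt(lambda) * q
            "lj_epsilon_scale": lam,         # solute LJ epsilon -> lambda * epsilon
            "dihedral_scale": lam,           # proper dihedrals of the hot region
        })
    return rows
```

With 8 replicas, neighboring temperatures differ by a constant factor of (450/310)^(1/7), about 1.055. Scaling charges by sqrt(λ) makes solute–solute electrostatics scale by λ and solute–water electrostatics by sqrt(λ); with geometric-mean combination of ε, the LJ terms split the same way.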
7. PRISM supports multi-tier binding energetics: automated endpoint MM/PB(GB)SA (gmx_MMPBSA, or AMBER MMPBSA.py with topology conversion), run either in single-frame mode for fast triage or in trajectory mode for ensemble averaging, with component decomposition (vdW, electrostatics, polar/nonpolar solvation).
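To make the two modes concrete, here is a minimal sketch of how per-frame endpoint terms might be combined, assuming each frame already supplies every component as a difference (complex minus receptor minus ligand). The dictionary keys and helper names are hypothetical, and the entropy term is omitted, as is common in triage-style MM/PBSA.

```python
import statistics

COMPONENTS = ("vdw", "elec", "polar_solv", "nonpolar_solv")


def frame_dg(frame: dict) -> float:
    """Endpoint binding energy for one frame: sum of the Delta components (no -T*dS)."""
    return sum(frame[c] for c in COMPONENTS)


def summarize(frames: list[dict]) -> dict:
    """One frame = single-frame triage; many frames = mean and standard error."""
    values = [frame_dg(f) for f in frames]
    mean = statistics.fmean(values)
    if len(values) > 1:
        sem = statistics.stdev(values) / len(values) ** 0.5
    else:
        sem = float("nan")  # no spread can be estimated from a single frame
    per_component = {c: statistics.fmean(f[c] for f in frames) for c in COMPONENTS}
    return {"dG": mean, "sem": sem, "n_frames": len(values), **per_component}
```

Keeping the per-component means alongside the total is what makes the decomposition useful: a favorable total driven by vdW contacts reads differently from one driven by electrostatics that the polar solvation term nearly cancels.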
8. The PMF module contributes a notable algorithmic piece: automated pulling-direction optimization for umbrella sampling. It uses Metropolis–Hastings sampling on the unit sphere with simulated annealing to minimize steric hindrance (pocket-clearance mode or whole-protein collision mode), then automatically rotates the complex, elongates the box, prepares steered MD (SMD), extracts umbrella windows, and runs WHAM.
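The direction search can be sketched as simulated annealing with a Metropolis acceptance rule over unit vectors. This is a simplified stand-in, not PRISM's scoring: here the steric hindrance of a direction is just the number of protein atoms within 0.6 nm of a 3 nm segment leaving the ligand centroid, loosely in the spirit of a pocket-clearance check. The radius, segment length, step size and cooling schedule are all assumed values.

```python
import numpy as np


def random_unit(rng: np.random.Generator) -> np.ndarray:
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)


def clash_score(direction, origin, protein_xyz, radius=0.6, length=3.0):
    """Count protein atoms within `radius` nm of the segment origin -> origin + length*direction."""
    rel = protein_xyz - origin
    t = np.clip(rel @ direction, 0.0, length)      # projection onto the segment
    closest = origin + t[:, None] * direction
    dist = np.linalg.norm(protein_xyz - closest, axis=1)
    return int(np.sum(dist < radius))


def optimize_direction(origin, protein_xyz, n_steps=2000, t_start=5.0, t_end=0.05,
                       step=0.3, seed=0):
    """Simulated annealing on the unit sphere; returns the lowest-clash direction seen."""
    rng = np.random.default_rng(seed)
    current = random_unit(rng)
    e_cur = clash_score(current, origin, protein_xyz)
    best, e_best = current, e_cur
    cooling = (t_end / t_start) ** (1.0 / (n_steps - 1))  # geometric cooling
    temp = t_start
    for _ in range(n_steps):
        # Symmetric proposal: Gaussian kick, then project back onto the sphere.
        proposal = current + step * rng.normal(size=3)
        proposal /= np.linalg.norm(proposal)
        e_new = clash_score(proposal, origin, protein_xyz)
        # Metropolis criterion at the current annealing temperature.
        if e_new <= e_cur or rng.random() < np.exp(-(e_new - e_cur) / temp):
            current, e_cur = proposal, e_new
            if e_cur < e_best:
                best, e_best = current, e_cur
        temp *= cooling
    return best, e_best
```

Because a Gaussian kick followed by renormalization is rotationally symmetric about the current direction, the proposal is symmetric and the plain Metropolis ratio applies. Lengthening the segment to span the whole protein would be one crude way to mimic a whole-protein collision mode.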
9. PRISM-FEbuilder targets a common FEP pain point, hybrid topology construction. It uses distance-based atom mapping (default cutoff 0.6 Å) to classify atoms as common, transformed, or surrounding, and manages charge differences with configurable strategies (reference-preserving, mutant-preserving, averaging). It emits GROMACS single-topology files with typeB/chargeB entries and dummy atoms, plus soft-core settings for the λ windows.
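The mapping step can be sketched as greedy nearest-pair matching under a distance cutoff, assuming both ligands are already aligned in a common frame with coordinates in Å. Requiring matched atoms to share an element is an assumption of this sketch, and it does not reproduce PRISM's "surrounding" category or its charge-handling strategies; `map_atoms` is a hypothetical name.

```python
import numpy as np


def map_atoms(xyz_a, elems_a, xyz_b, elems_b, cutoff=0.6):
    """Greedy distance-based mapping between two pre-aligned ligands (coordinates in Å)."""
    xyz_a, xyz_b = np.asarray(xyz_a, float), np.asarray(xyz_b, float)
    dist = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
    # Candidate pairs within the cutoff and with matching elements, closest first.
    pairs = sorted((dist[i, j], i, j)
                   for i in range(len(xyz_a)) for j in range(len(xyz_b))
                   if dist[i, j] <= cutoff and elems_a[i] == elems_b[j])
    mapping, used_b = {}, set()
    for _, i, j in pairs:
        if i not in mapping and j not in used_b:  # each atom is mapped at most once
            mapping[i] = j
            used_b.add(j)
    transformed_a = [i for i in range(len(xyz_a)) if i not in mapping]  # vanish in state B
    transformed_b = [j for j in range(len(xyz_b)) if j not in used_b]   # appear in state B
    return mapping, transformed_a, transformed_b
```

Mapped atoms keep one topology entry with A- and B-state parameters, while unmapped atoms of either ligand are the ones that would be represented as dummy atoms in the opposite end state.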
10. Results highlight two demonstrations: (i) an agent-orchestrated hierarchical screen on riboflavin synthase (ChEMBLFind → MolScope diversity selection → Vina docking → PRISM MM/PBSA) that not only recovers an active-site-like binder but also flags CHEMBL186010 binding at a trimerization-relevant C-terminal helix pocket, suggesting a potential allosteric/oligomerization-disruption site; (ii) FEbuilder benchmarking on HIF-2α, T4 lysozyme L99A, and p38α kinase with RMSE ~0.72–0.90 kcal/mol and generally small cycle-closure hysteresis.
💻Code:
github.com/AIB001/PRISM
📜Paper:
biorxiv.org/content/10.64898…
#MolecularDynamics #GROMACS #CADD #BindingFreeEnergy #FEP #UmbrellaSampling #REST2 #ForceFields #AIAgents #ComputationalChemistry #DrugDiscovery