All-atomistic Transferable Neural Potentials for Protein Solvation
1 PHNN (Protein Hydration Neural Network) is introduced as an implicit solvent model that keeps the speed advantages of continuum solvation while learning environment-dependent corrections that remain predictive on out-of-domain proteins.
2 The key design shift vs typical “delta-learning” is that PHNN does not just add a post hoc energy correction; it learns corrections to the underlying GBn2 equation parameters (e.g., local dielectrics, screening, charges, SASA-related terms), so the analytical backbone remains the scaffold and the neural network fills known physical gaps.
3 PHNN is built on a GBn2 backbone and trained by force matching to explicit-solvent reference forces (CHARMM36 TIP3P), aiming to approximate mean solvation forces (PMF-consistent) using many instantaneous-force frames as noisy samples.
4 Architecture-wise, PHNN uses an E(3)-equivariant GNN (custom pseudo-MACE via cuEquivariance) to produce atom-centered embeddings that can represent higher-order geometric effects (up to quadrupolar information), which are relevant for anisotropic hydration structure and packing asymmetry near protein surfaces.
5 To avoid overfitting to stochastic instantaneous solvent forces, PHNN uses a heteroscedastic (variance-aware) training objective (β-NLL style). A separate invariant GNN estimates per-sample uncertainty using predicted forces plus key GBn2 parameters.
6 PHNN targets specific known GB/continuum failure modes with learned, physically interpretable modifications: (i) a learned modulation of the nonpolar SASA term with a learnable surface tension coefficient, (ii) atom-specific local solute dielectric and local solvent dielectric, (iii) a learned correction to the GB screening function to better handle mutual desolvation (important for salt bridges), and (iv) per-atom charge corrections to partially capture electrostriction-like effects.
7 On an independent OOD test (39 proteins), PHNN reports mean force MAE 66.6 ± 9.4 kJ/(mol·nm) vs explicit solvent, improving over GBn2 at 97.5 ± 9.0 kJ/(mol·nm) (about 31.7% lower error). The paper notes an intrinsic ceiling because explicit-solvent instantaneous forces have large variance, so deterministic implicit models cannot match every fluctuation.
8 In dynamical tests (4 domains, up to ~5400 atoms), PHNN preserves native-like behavior better than GBn2, judged by RMSD, radius-of-gyration (Rg), and RMSF distributions and by KDE-derived free-energy landscapes; GBn2 shows stronger unfolding tendencies, especially for larger domains.
9 Targeted error breakdowns suggest PHNN improves across secondary-structure classes and residue types; the largest gains are reported for lysine (consistent with improved salt-bridge screening). Remaining challenges include arginine (delocalized guanidinium charge is difficult to fix with per-atom corrections) and buried regions where long-range electrostatics may require larger interaction radii or deeper models.
10 Transferability limits are probed with alanine dipeptide (near-zero sequence similarity to training domains): PHNN reproduces the major Ramachandran basins but distorts basin shapes and mis-ranks some regions (notably αR). This motivates future training that explicitly enriches boundary/strained conformations via umbrella sampling and broader conformational coverage (including IDPs and 300 K data).
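Sketch for point 3 (force matching): a minimal NumPy illustration of why many noisy instantaneous-force frames can stand in for the mean force. For a fixed solute conformation, the prediction that minimizes the squared-error loss is the frame average. The function name, array shapes, and toy data are assumptions for illustration, not the paper's training code.

```python
import numpy as np

def force_matching_loss(pred_forces, ref_forces):
    """Mean squared force error, averaged over frames and atoms.

    pred_forces broadcasts against ref_forces of shape
    (n_frames, n_atoms, 3); units are arbitrary here.
    """
    diff = pred_forces - ref_forces
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

# Toy data (hypothetical): one fixed conformation with 5 atoms whose
# instantaneous solvent forces scatter around a fixed mean force.
rng = np.random.default_rng(0)
mean_force = rng.normal(size=(1, 5, 3))
frames = mean_force + rng.normal(scale=2.0, size=(2000, 5, 3))

# The loss-minimizing constant prediction is the average over frames,
# which approaches the mean force as the number of frames grows.
best_prediction = frames.mean(axis=0, keepdims=True)
loss_at_best = force_matching_loss(best_prediction, frames)
loss_at_zero = force_matching_loss(np.zeros((1, 5, 3)), frames)
```

The residual loss at `best_prediction` does not go to zero: it reflects the variance of the instantaneous forces, which is the same ceiling mentioned in point 7.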
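Sketch for point 5 (variance-aware objective): the thread only says "β-NLL style", so the block below follows the general β-NLL form (a Gaussian negative log-likelihood weighted by the predicted variance raised to the power β) and may differ from the paper's exact loss. The function name and the default `beta` are assumptions.

```python
import numpy as np

def beta_nll(mean, var, target, beta=0.5):
    """Value of a beta-weighted Gaussian negative log-likelihood.

    mean, var, target: arrays of the same shape (var must be > 0).
    beta = 0 gives the plain Gaussian NLL (up to a constant);
    beta = 1 weights errors like a squared-error loss.
    """
    nll = 0.5 * (np.log(var) + (target - mean) ** 2 / var)
    # In an autodiff framework this weight would be treated as a
    # constant (stop-gradient); NumPy only evaluates the loss value.
    weight = var ** beta
    return float(np.mean(weight * nll))

# Toy usage with made-up numbers.
pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.5, 1.0, 3.0])
unit_var = np.ones(3)
loss_beta1 = beta_nll(pred, unit_var, target, beta=1.0)
loss_beta0 = beta_nll(pred, unit_var, target, beta=0.0)
```

The weighting is meant to keep high-variance samples from being ignored entirely, which plain NLL training tends to do once the predicted variance grows.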
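Sketch for point 6 (where learned corrections could enter): the block below evaluates the standard generalized Born polar energy with the Still screening function, which is the kind of analytical backbone the thread describes. It is not PHNN or a full GBn2 implementation: Born radii are taken as given, the dielectric defaults are typical assumed values, the Coulomb constant is approximate, and the `dq` argument is a hypothetical hook showing how a per-atom charge correction might be applied.

```python
import numpy as np

COULOMB = 138.935  # approximate Coulomb constant, kJ*nm/(mol*e^2)

def gb_polar_energy(pos, q, born_radii,
                    eps_solute=1.0, eps_solvent=78.5, dq=None):
    """Generalized Born polar solvation energy in kJ/mol.

    pos: (n, 3) coordinates in nm; q: (n,) charges in e;
    born_radii: (n,) effective Born radii in nm (assumed precomputed).
    dq: optional (n,) charge shifts, a hypothetical correction hook.
    """
    if dq is not None:
        q = q + dq
    diff = pos[:, None, :] - pos[None, :, :]
    r2 = np.sum(diff ** 2, axis=-1)
    rr = born_radii[:, None] * born_radii[None, :]
    # Still screening function: equals R_i for i == j and approaches
    # the plain distance r_ij when atoms are far apart.
    f_gb = np.sqrt(r2 + rr * np.exp(-r2 / (4.0 * rr)))
    prefactor = -0.5 * COULOMB * (1.0 / eps_solute - 1.0 / eps_solvent)
    # The double sum includes i == j (self terms) and both orderings.
    return float(prefactor * np.sum(q[:, None] * q[None, :] / f_gb))

# Single unit charge with a 0.2 nm Born radius: reduces to the Born formula.
e_ion = gb_polar_energy(np.zeros((1, 3)), np.array([1.0]), np.array([0.2]))
```

In this picture, a learned screening correction would modify `f_gb` and learned local dielectrics would replace the two scalar `eps_*` values; how PHNN parameterizes either is not specified in the thread.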
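Check for point 7 (relative improvement): the quoted "about 31.7% lower error" follows directly from the two reported mean MAE values. The variable names are illustrative.

```python
phnn_mae = 66.6  # kJ/(mol*nm), reported for PHNN
gbn2_mae = 97.5  # kJ/(mol*nm), reported for GBn2

# Relative error reduction of PHNN with respect to GBn2.
relative_reduction = (gbn2_mae - phnn_mae) / gbn2_mae
print(f"{relative_reduction:.1%}")  # prints 31.7%
```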
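Sketch for point 8 (KDE-derived free-energy landscapes): a common way to turn a sampled observable such as RMSD or Rg into a free-energy profile is F(x) = -k_B T ln p(x), with p estimated by a kernel density. The block is a one-dimensional NumPy version with a Gaussian kernel and a normal-reference bandwidth; the function name, the bandwidth rule, the temperature, and the toy data are assumptions, not the paper's analysis pipeline.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol*K)

def kde_free_energy(samples, grid, temperature, bandwidth=None):
    """Free-energy profile (kJ/mol) on `grid`, shifted so the minimum is 0."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb for a 1D Gaussian kernel.
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    density /= samples.size * bandwidth * np.sqrt(2.0 * np.pi)
    # Grid points far from all samples would give log(0); keep the
    # grid within the sampled range.
    free_energy = -KB * temperature * np.log(density)
    return free_energy - free_energy.min()

# Toy usage: synthetic "Rg" samples (nm) scattered around 1.0 nm.
rng = np.random.default_rng(1)
rg_samples = rng.normal(loc=1.0, scale=0.1, size=20000)
rg_grid = np.linspace(0.7, 1.3, 61)
profile = kde_free_energy(rg_samples, rg_grid, temperature=300.0)
```

Comparing such profiles between an implicit-solvent run and an explicit-solvent reference shows whether the same basins are populated and how their relative depths differ.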
📜Paper:
arxiv.org/abs/2605.14584
#ComputationalBiophysics #MolecularDynamics #ImplicitSolvent #ForceFields #NeuralPotentials #EquivariantGNN #ProteinSimulation #Solvation #MLforScience