Biology AI Daily


Jun 10

Discriminator-guided Inverse Folding for Multi-property Protein Design

1. DGIF introduces a plug-and-play way to steer an inverse-folding model toward multiple protein properties at once, without fine-tuning the generative model and without needing multi-property-labeled datasets.
2. Core idea: during autoregressive sequence generation, DGIF backpropagates gradients from an auxiliary discriminator into the decoder's internal history states (the KV-cache), then re-samples the next residue from the updated distribution, repeating this at every step.
3. The discriminator is a composition of multiple single-property predictors. Each predictor can be trained independently on a dataset labeled for only that property, and DGIF combines their signals with weights (beta_i) to perform multi-objective optimization.
4. DGIF is implemented on top of ESM-IF1, producing three variants: DG-Thermo (thermostability), DG-Sol (solubility), and DG-Dual (thermostability + solubility). The base inverse-folding model parameters remain unchanged.
5. For thermostability guidance, the paper trains a ΔΔG predictor using ESM-IF1 representations on the Megascale dataset (≈700k mutation–stability pairs), with additional evaluation on FireProt and S669. The predictor outperforms several classic baselines (e.g., FoldX, Rosetta, ThermoNet) and is competitive with ThermoMPNN.
6. DG-Thermo improves design outcomes vs unguided ESM-IF1 on: (i) average top-K recall for stabilizing mutations on Megascale test proteins, and (ii) "success rate" of full-sequence designs that both improve predicted stability (ΔΔG > 1.0 kcal/mol) and maintain foldability (predicted structure RMSD < 2 Å).
7. Mechanistic signals emerge naturally: DG-Thermo-designed proteins show more salt bridges and hydrophobic interactions, and amino-acid composition shifts consistent with thermophilic trends (e.g., increased L/P/R/W and decreased D/K/M/Q), despite these rules not being explicitly encoded as constraints.
8. MD validation: for xylanase at 450 K (100 ns), DG-Thermo variants maintain structure (lower RMSD, higher secondary-structure retention) compared with wild type and an unguided ESM-IF1 design; additional CATH-sampled scaffolds show similar stability gains in MD.
9. Solubility guidance: a binary solubility predictor (ESM-IF1 representations + MLP) is trained on Khurana et al. and tested on Chang et al.; DG-Sol improves top-K recall on SoluProtMutDB and increases design success rates under joint criteria (better predicted solubility and RMSD < 2 Å). On membrane proteins, DG-Sol designs increase the proportion of polar surface residues, consistent with higher solubility.
10. Multi-property optimization: DG-Dual jointly applies thermostability and solubility predictors and shifts designs toward the Pareto front (better stability/solubility trade-offs) on CATH redesign tasks. Wet-lab validation on Rhodococcus ruber alcohol dehydrogenase (RrADH) tests 10 DG-Dual-suggested single mutations: all improve solubility; 8/10 increase melting temperature. Examples include A50E (≈2x ELISA solubility signal, +2.79 °C Tm) and S223A (+6.47 °C Tm with concurrent solubility gain).

💻Code: github.com/aweqardf/ESM-IF1-…
📜Paper: doi.org/10.1002/advs.75988
#ProteinDesign #InverseFolding #MultiObjectiveOptimization #ESM #ComputationalBiology #MachineLearning #Thermostability #Solubility #ProteinEngineering
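
A toy sketch of the guided step described in points 2 and 3, to make the mechanics concrete. This is not the DGIF code: the hidden vector `h` stands in for the decoder's history states, each "predictor" is a hypothetical logistic model on `h`, and the weights, `betas`, step size, and four-letter alphabet are all made up for illustration. The real method backpropagates through ESM-IF1 and learned predictors; here the gradient is written out by hand so the example runs with the standard library only.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combined_log_score(h, predictors, betas):
    """Weighted sum of log-probabilities from single-property predictors."""
    return sum(b * math.log(sigmoid(dot(w, h))) for w, b in zip(predictors, betas))

def guided_step(h, predictors, betas, step_size=0.5, n_steps=3):
    """Gradient ascent on h (toy stand-in for the decoder's history states)."""
    h = list(h)
    for _ in range(n_steps):
        grad = [0.0] * len(h)
        for w, b in zip(predictors, betas):
            s = sigmoid(dot(w, h))
            # d/dh [b * log sigmoid(w . h)] = b * (1 - s) * w
            for j, wj in enumerate(w):
                grad[j] += b * (1.0 - s) * wj
        h = [hj + step_size * gj for hj, gj in zip(h, grad)]
    return h

def next_residue_distribution(h, output_weights):
    """Next-residue probabilities from a (hypothetical) linear output head."""
    return softmax([dot(row, h) for row in output_weights])

# Made-up toy values, for illustration only.
alphabet = "ACDE"
h0 = [0.1, -0.2, 0.3]
output_weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, -1, 0]]
predictors = [[0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # stand-ins for "thermo" and "sol"
betas = [1.0, 0.5]

h1 = guided_step(h0, predictors, betas)
p0 = next_residue_distribution(h0, output_weights)  # unguided distribution
p1 = next_residue_distribution(h1, output_weights)  # distribution after guidance

rng = random.Random(0)
residue = rng.choices(alphabet, weights=p1, k=1)[0]  # re-sample from updated distribution
```

In a full autoregressive loop this update and re-sampling would repeat at every position; the base model's weights (`output_weights` here) are never changed, only the state they are applied to.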
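
For point 10, a minimal sketch of what "on the Pareto front" means for two objectives. The design names and scores below are invented, and both scores are assumed to be "higher is better"; the thread does not say how the paper computes its front.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(designs):
    """Return names of designs that no other design dominates."""
    return [
        name
        for name, score in designs.items()
        if not any(
            dominates(other, score)
            for other_name, other in designs.items()
            if other_name != name
        )
    ]

# Hypothetical (stability score, solubility score) pairs, higher is better.
designs = {
    "d1": (1.2, 0.4),
    "d2": (0.8, 0.9),
    "d3": (0.7, 0.3),  # worse than d1 on both axes
    "d4": (1.0, 0.7),
}

front = pareto_front(designs)  # ["d1", "d2", "d4"]
```

A guided method "shifts designs toward the Pareto front" when more of its outputs land among, or close to, the non-dominated set than the unguided baseline's do.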
