Binary Latent Protein Fitness Landscapes for Quantum Annealing Optimization
1 This work introduces Q‑BioLat, a framework that projects protein sequences into a compact binary latent space and models fitness as a quadratic unconstrained binary optimization (QUBO) problem, enabling efficient combinatorial search and direct compatibility with quantum annealers.
2 Protein language models (ESM‑2) supply contextual embeddings; these are reduced in dimensionality via random projection or PCA, then binarized using median‑thresholding to generate balanced, interpretable latent codes that still capture biologically relevant variation.
3 A ridge‑regularized QUBO surrogate is trained on the binary codes, incorporating both unary and pairwise interactions. The resulting linear‑plus‑quadratic objective is immediately usable by simulated annealing, genetic algorithms, or hardware Ising solvers without further modification.
4 Experiments on the ProteinGym GFP benchmark show that simulated annealing and genetic algorithms consistently reach higher‑scoring codes on the surrogate landscape, and the nearest‑neighbor real sequences retrieved from the optimized codes lie in the top ~90% of the training fitness distribution, indicating successful navigation of high‑fitness regions.
5 Representation geometry proves crucial: PCA‑based latent codes lead to better optimization results than random‑projection codes despite identical Spearman correlation, highlighting that the structure of the latent space, not just predictive accuracy, governs search efficacy.
6 Retrieval‑based decoding maps optimized binary codes back to actual protein sequences by Hamming‑distance search among training variants, providing a conservative, interpretable bridge between latent solutions and real proteins without training a generative decoder.
7 The surrogate has O(m²) parameters in the number of latent bits m, and search cost scales the same way, so the heavy protein‑language‑model inference is a fixed cost while optimization in the small binary latent space can be repeated rapidly.
8 Future directions include learning binary representations jointly with the surrogate, expanding beyond pairwise terms, and deploying the QUBO model on quantum annealing hardware to accelerate protein engineering at scale.
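The steps in points 2, 3, 4 and 6 can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for ESM‑2 embeddings, the fitness values are synthetic, and every function name (binarize_pca, fit_qubo, simulated_annealing, retrieve_nearest) and hyperparameter is hypothetical. The surrogate is assumed to have the form f(z) = b + Σ_i h_i z_i + Σ_{i<j} J_ij z_i z_j over bits z ∈ {0,1}^m, stored as an upper‑triangular matrix Q with h on the diagonal.

```python
import numpy as np

# Synthetic stand-ins (assumptions): not real embeddings or fitness data.
rng = np.random.default_rng(0)
n, d, m = 200, 64, 8                      # variants, embedding dim, latent bits
X = rng.normal(size=(n, d))               # placeholder for ESM-2 embeddings
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=n)

def binarize_pca(X, m):
    """Project onto the top-m principal components, threshold each at its median."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Xc @ Vt[:m].T
    return (P > np.median(P, axis=0)).astype(np.int8)

def qubo_features(Z):
    """Unary bits followed by pairwise products z_i * z_j for i < j."""
    Z = Z.astype(float)
    iu, ju = np.triu_indices(Z.shape[1], k=1)
    return np.hstack([Z, Z[:, iu] * Z[:, ju]])

def fit_qubo(Z, y, lam=1.0):
    """Closed-form ridge fit; returns upper-triangular Q and intercept b."""
    m = Z.shape[1]
    F = qubo_features(Z)
    mu_F, mu_y = F.mean(axis=0), y.mean()
    Fc = F - mu_F
    w = np.linalg.solve(Fc.T @ Fc + lam * np.eye(F.shape[1]), Fc.T @ (y - mu_y))
    Q = np.zeros((m, m))
    Q[np.diag_indices(m)] = w[:m]         # unary terms (z_i**2 == z_i for bits)
    Q[np.triu_indices(m, k=1)] = w[m:]    # pairwise terms, same order as features
    return Q, float(mu_y - mu_F @ w)

def predict(Q, b, z):
    z = np.asarray(z, dtype=float)
    return float(z @ Q @ z + b)

def simulated_annealing(Q, z0, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize z^T Q z with single-bit flips and a geometric cooling schedule."""
    sa_rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float).copy()
    f = z @ Q @ z
    best_z, best_f = z.copy(), f
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        i = sa_rng.integers(len(z))
        z[i] = 1.0 - z[i]                 # propose flipping bit i
        f_new = z @ Q @ z                 # O(m^2) evaluation of the surrogate
        if f_new >= f or sa_rng.random() < np.exp((f_new - f) / t):
            f = f_new
            if f > best_f:
                best_z, best_f = z.copy(), f
        else:
            z[i] = 1.0 - z[i]             # reject: undo the flip
    return best_z.astype(np.int8), float(best_f)

def retrieve_nearest(z, Z_train, k=5):
    """Indices and Hamming distances of the k training codes closest to z."""
    dist = (Z_train != z).sum(axis=1)
    idx = np.argsort(dist, kind="stable")[:k]
    return idx, dist[idx]

Z = binarize_pca(X, m)
Q, b = fit_qubo(Z, y, lam=1.0)
z_best, f_best = simulated_annealing(Q, Z[0])
idx, dist = retrieve_nearest(z_best, Z)
print("optimized code:", z_best, "predicted fitness:", f_best + b)
print("nearest training variants:", idx, "Hamming distances:", dist)
```

Because Q is an ordinary QUBO matrix, the same array could in principle be handed to a genetic algorithm or an annealing backend in place of the loop above; that hand‑off is not shown here. Each surrogate evaluation is a quadratic form in m bits, which is where the O(m²) cost in point 7 comes from.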
Code:
github.com/HySonLab/Q-BIOLAT
Paper:
arxiv.org/abs/2603.17247
#ProteinEngineering #QuantumComputing #MachineLearning #Bioinformatics #ProteinDesign #QUBO #CombinatorialOptimization