10/25 𝗩𝗶𝗿𝗮𝗹 𝗣𝗿𝗼𝘁𝗲𝗶𝗻𝘀 𝗥𝗲𝘃𝗲𝗮𝗹 𝗚𝗲𝗼𝗺𝗲𝘁𝗿𝘆 𝗼𝗳 𝗣𝗿𝗼𝘁𝗲𝗶𝗻 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹𝘀
This paper investigates how protein language models (pLMs) encode underrepresented viral proteins. It identifies a dominant "nativeness axis" in embedding space that aligns with masked reconstruction perplexity. Although this axis contracts unevenly across viral families, pLM embeddings retain viral-specific signal, as shown by linear separability. This suggests the representations balance a general notion of nativeness with information specific to distinct biological groups.
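To make the two ideas in the summary concrete (a dominant axis that tracks perplexity, and a group signal that a linear probe can still read out), here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the embeddings, the `nativeness` variable, the perplexity values, and the least-squares probe are made-up stand-ins, not the paper's data or method. The first principal component is used as a proxy for the dominant axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for pLM embeddings: n proteins, d dimensions.
n, d = 400, 16
is_viral = np.repeat([0, 1], n // 2)

# Assumed latent "nativeness" score; the viral group is shifted lower.
nativeness = rng.normal(size=n) - 1.5 * is_viral

# Embeddings = strong nativeness direction (dim 0) + weaker group
# direction (dim 1) + isotropic noise.
X = 0.3 * rng.normal(size=(n, d))
X[:, 0] += 4.0 * nativeness
X[:, 1] += 2.0 * is_viral - 1.0

# Assumed perplexity: falls as nativeness rises, with some noise.
log_ppl = -0.5 * nativeness + 0.1 * rng.normal(size=n)

# Dominant axis: first principal component of the centered embeddings.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ vt[0]

# Alignment between the dominant axis and log-perplexity.
# The sign of a principal component is arbitrary, so take |r|.
corr = abs(np.corrcoef(pc1, log_ppl)[0, 1])

# Linear probe: least-squares fit to +/-1 labels on a train split,
# evaluated on held-out points.
idx = rng.permutation(n)
train, test = idx[:300], idx[300:]
A = np.hstack([Xc, np.ones((n, 1))])  # add a bias column
w, *_ = np.linalg.lstsq(A[train], 2.0 * is_viral[train] - 1.0, rcond=None)
acc = np.mean((A[test] @ w > 0) == (is_viral[test] == 1))

print(f"|corr(PC1, log perplexity)| = {corr:.2f}")
print(f"held-out linear probe accuracy = {acc:.2f}")
```

In this toy setup the group signal lives on a direction separate from the dominant axis, which is one simple way a representation could carry both a shared nativeness concept and group-specific information at once.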
#ProteinLanguageModels #pLM #ESMModels #ViralProteins #RepresentationalLearning #BiologicalSequences
Paper link: arxiv.org/abs/2606.12609