Learning the Native-Like Codons with a 5'UTR and RNA Secondary Structure Aided Species-Informed Transformer Model

1. A new deep learning model named TransCodon has been introduced to address the challenge of efficient protein expression across different species by optimizing codon usage. The model leverages a Transformer architecture to learn nuanced codon usage patterns across diverse organisms, integrating 5'UTR and RNA secondary structure information for more accurate codon optimization.
2. TransCodon incorporates species-specific information and RNA secondary structure features, enabling it to capture both local and global determinants of codon preference. This approach significantly improves the model's ability to predict optimal synonymous codons, resulting in sequences that closely resemble natural gene sequences.
3. The model was trained on a large-scale dataset comprising 5.5 million gene sequences from 1,436 species, ensuring robust cross-species generalization. Experimental results demonstrate that TransCodon outperforms traditional methods and recent machine learning-based approaches in multiple evaluation metrics, including codon recovery rate, Codon Similarity Index (CSI), and GC content.
4. TransCodon effectively captures the usage of low-frequency codons, which are often omitted by other methods. This feature is crucial for generating synthetic gene sequences that closely approach their natural counterparts, potentially enhancing protein expression and folding efficiency.
5. The model also shows a strong correlation between its fitness scores and experimentally measured protein expression levels, indicating its potential for predicting protein abundance. Additionally, TransCodon demonstrates superior performance in downstream tasks such as predicting mean ribosome load (MRL) based on 5'UTR sequences.
6. The study highlights the importance of considering regulatory regions like the 5'UTR and RNA secondary structure in codon optimization. TransCodon's ability to integrate these elements sets it apart from previous models and underscores its potential for applications in synthetic biology and gene expression studies.

📜Paper: biorxiv.org/content/10.1101/… 💻Code: github.com/guyuehuo/TransCod…
#TransCodon #CodonOptimization #DeepLearning #SyntheticBiology #GeneExpression #RNASecondaryStructure #5UTR #ProteinEngineering
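For readers unfamiliar with the metrics named in point 3, here is a minimal Python sketch of two of them. It assumes "codon recovery rate" means the fraction of codon positions where a designed sequence uses exactly the native codon, and that GC content is the plain fraction of G and C bases; the paper's exact definitions may differ. The function names and the toy sequences are hypothetical and are not taken from the TransCodon code.

```python
def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into consecutive 3-nucleotide codons."""
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS must be non-empty with a length divisible by 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def codon_recovery_rate(native: str, designed: str) -> float:
    """Fraction of codon positions where the designed codon equals the native one.

    Assumed definition for illustration only.
    """
    native_codons = split_codons(native.upper())
    designed_codons = split_codons(designed.upper())
    if len(native_codons) != len(designed_codons):
        raise ValueError("Sequences must encode the same number of codons")
    matches = sum(a == b for a, b in zip(native_codons, designed_codons))
    return matches / len(native_codons)


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("Sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# Toy example: both sequences encode Met-Ala-Lys-Gly with different synonymous codons.
native = "ATGGCTAAAGGT"
designed = "ATGGCCAAAGGA"
recovery = codon_recovery_rate(native, designed)
gc_native = gc_content(native)
gc_designed = gc_content(designed)
```

A designed sequence with high recovery and a GC content close to the native one would count as "native-like" under these simplified measures.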
Species‑aware DNA LMs just became practical. TransCodon learns from the 5′ UTR, CDS, and RNA structure to rewrite genes with 49% native‑codon recall 🆗 ...keeping the rare pauses that guide folding, without CAI inflation 🧬
Learning The Native-Like Codons With A 5'UTR And Secondary RNA Structure Aided Species-Informed Transformer Model

A new Transformer-based deep learning model, TransCodon, has been developed to address the challenge of efficient protein expression in heterologous hosts. It tackles the difficulty of reconstructing native-like codon landscapes by integrating 5' untranslated regions (5'UTRs), coding sequences (CDS), explicit species identifiers, and RNA secondary structure information.

TransCodon learns nuanced codon usage patterns across diverse organisms by incorporating multisource genomic data and modeling sequence dependencies via a masked language modeling paradigm. This allows it to effectively capture both local and global determinants of codon preference. A key innovation is TransCodon's use of a finer-grained vocabulary based solely on nucleotides, which enables partial decoding and preserves richer sequence-level information compared to previous approaches.

The model was trained on a large dataset of 5.5 million gene sequences from 1,436 species, ensuring robust cross-species generalization. Experimental results demonstrate that TransCodon consistently outperforms existing codon optimization tools across multiple evaluation metrics. It identifies native-like codons with less divergence from natural sequences and can capture low-frequency codons often missed by other deep learning methods, especially for highly abundant proteins.

Beyond codon optimization, TransCodon shows robust effectiveness in predicting protein abundance, achieving high correlation with experimentally determined values in zero-shot scenarios. It also excels in 5' UTR-related downstream tasks, such as predicting Mean Ribosome Load (MRL), surpassing other state-of-the-art models.

These findings indicate that TransCodon is a robust codon language model with significant potential for designing genes to achieve high translational efficiency in target host organisms, marking a notable advancement in computational synthetic biology.

📜Paper: biorxiv.org/content/10.1101/…
#ComputationalBiology #SyntheticBiology #DeepLearning #ProteinExpression #CodonOptimization #Bioinformatics #Genomics #TransformerModels
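To make the masked language modeling setup with a nucleotide-level vocabulary more concrete, here is a rough Python sketch of how one training input might be prepared. Everything in it is an assumption for illustration: the `<mask>` and `<cds>` token names, the species-tag format, the choice to mask whole codons, and the `build_masked_input` helper are hypothetical and are not taken from the TransCodon code or paper.

```python
import random

MASK = "<mask>"  # hypothetical mask token name


def build_masked_input(species, utr5, cds, mask_fraction=0.15, seed=0):
    """Tokenize species + 5'UTR + CDS at nucleotide level and mask some codons.

    Returns (tokens, labels), where labels maps each masked position to the
    nucleotide a model would be trained to recover at that position.
    """
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS must be non-empty with a length divisible by 3")
    rng = random.Random(seed)

    # One species tag, then one token per nucleotide; "<cds>" is an assumed separator.
    tokens = [f"<{species}>"] + list(utr5.upper()) + ["<cds>"] + list(cds.upper())
    cds_start = len(utr5) + 2  # index of the first CDS nucleotide

    n_codons = len(cds) // 3
    n_masked = min(n_codons, max(1, round(n_codons * mask_fraction)))
    masked_codons = rng.sample(range(n_codons), n_masked)

    labels = {}
    for codon_index in masked_codons:
        for offset in range(3):  # mask all three nucleotides of the chosen codon
            pos = cds_start + 3 * codon_index + offset
            labels[pos] = tokens[pos]
            tokens[pos] = MASK
    return tokens, labels


tokens, labels = build_masked_input("E_coli", "GGAGG", "ATGGCTAAAGGT", mask_fraction=0.5)
```

Because each nucleotide is its own token, a setup like this could also mask or fix only part of a codon (for example the third position alone), which is one plausible reading of the "partial decoding" the post mentions.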