Decoding Synonymous Codon Selection With a Transformer Model
1. CaNAT is a non-autoregressive Transformer that predicts an entire codon sequence directly from an amino-acid sequence, with a design choice that prioritizes rare-codon learning via batch-wise weighted cross-entropy (rare codons contribute more to gradients instead of being drowned out by frequency bias).
2. Trained at scale on >3 million coding sequences from 620 species (taxonomically broad, cluster-split to avoid homology leakage), CaNAT aims to recover native synonymous choices rather than only “optimize” codons for expression, and it does so without providing organism labels during training.
3. A key practical output is a codon-level confidence score (softmax probability per position). The study shows confidence correlates strongly with accuracy, and introduces a degeneracy-aware threshold T(k, α) to compare confidence fairly across amino acids with different numbers of synonymous codons.
4. On the full multi-species test set, CaNAT reaches ~53% exact codon accuracy, beating statistical baselines such as “always pick the most frequent codon” (~48%), frequency sampling (~39%), and uniform random (~33%). High-confidence filtering further boosts accuracy.
5. Compared to CodonTransformer (an organism-fine-tuned model), CaNAT is especially strong on rare codons (RSCU < 0.7): across shared benchmark species, CaNAT generally improves rare-codon prediction (notably in human and mouse), suggesting the balanced-loss strategy helps capture context-dependent rare-codon placement.
6. Even without species labels, CaNAT’s predictions recapitulate organism-specific synonymous codon distributions: for E. coli, H. sapiens, and an extremophile (Streptococcus thermophilus), predicted vs observed codon-usage profiles show near-perfect rank agreement across amino acids (median Spearman close to 1), including preservation of rare codons.
7. Internal representations encode species identity at positional resolution: embeddings from the decoder separate species via LDA, implying that local sequence context contains enough signal for the model to infer organism-specific codon “signatures” throughout a gene.
8. The model captures biophysical constraints beyond GC-content: adding RNA stability (ViennaRNA) to a regression explaining prediction accuracy increases explained variance from R²=0.148 (GC only) to R²=0.191 (GC + stability), indicating that codon-choice predictability relates to structured RNA features.
9. Attention analysis reveals multi-scale context for codon choice: (i) tight near-diagonal heads consistent with dicodon/codon-pair effects, (ii) broader local windows spanning several codons, and (iii) long-range diagonals with offsets roughly −70 to 60, suggesting distant regions contribute to local synonymous decisions; attention is often biased slightly downstream.
10. CaNAT predictions connect to functional constraint: on three deep synonymous-scanning datasets in E. coli (ddlA folding efficiency, RNase III function, TEM-1 β-lactamase fitness), accuracy is highest at “wild-type only tolerated” positions, which are enriched for rare codons; at partially tolerant sites, CaNAT often predicts an alternative tolerated codon rather than exactly the wild type, consistent with learning constraint strength rather than memorizing sequences.
💻Code: github.com/Andre-lab/CaNAT/
📜Paper: biorxiv.org/content/10.64898…
#ComputationalBiology #CodonUsage #SynonymousMutations #Transformers #DeepLearning #Translation #RNAstructure #ProteinFolding #Bioinformatics #Genomics
30 Jun 2025
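For readers unfamiliar with the “always pick the most frequent codon” baseline mentioned in the CaNAT thread, here is a minimal sketch of how such a baseline could be scored. It is not taken from the CaNAT code or paper: the function names (`fit_most_frequent`, `baseline_accuracy`) and the toy sequences are hypothetical, and it assumes in-frame DNA coding sequences and the standard genetic code.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def split_codons(cds):
    """Split an in-frame DNA coding sequence into codons, dropping any tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def fit_most_frequent(train_cds):
    """Hypothetical helper: map each amino acid to its most frequent codon."""
    counts = defaultdict(Counter)
    for cds in train_cds:
        for codon in split_codons(cds):
            aa = CODON_TO_AA.get(codon)
            if aa is not None and aa != "*":
                counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def baseline_accuracy(table, test_cds):
    """Fraction of sense codons in test_cds that equal the table's choice."""
    hits = total = 0
    for codon in split_codons(test_cds):
        aa = CODON_TO_AA.get(codon)
        if aa is None or aa == "*" or aa not in table:
            continue  # skip stops, ambiguous codons, unseen amino acids
        total += 1
        hits += table[aa] == codon
    return hits / total if total else 0.0


# Toy data, invented for illustration only.
table = fit_most_frequent(["GCTGCTGCCAAA"])
accuracy = baseline_accuracy(table, "GCTGCCAAA")
```

A learned model has to beat this kind of per-amino-acid lookup to show that it uses sequence context rather than global frequency alone.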
🔗 A codon usage-based approach for the stratification of Influenza A across recent spillovers. @CSB_Journal, DOI: doi.org/10.1016/j.csbj.2025.… 📚 CSBJ: csbj.org/ #Influenza #Genomics #PublicHealth #Bioinformatics #Zoonosis #CodonUsage #Epidemiology #Virology #Virus
Unveiling codon usage patterns across diverse plant lineages! Explore how evolutionary forces shape genetic coding in plants. Check out our latest study: [rdcu.be/eaDhH] #PlantGenomics #CodonUsage #EvolutionaryBiology #PlantScience #MolecularGenetics

Predicting gene sequences with AI to study codon usage patterns @PNASNews
1. This study introduces an AI-driven mBART model to predict codon sequences, revealing complex patterns in codon usage that go beyond frequency-based approaches, suggesting evolutionary selection impacts codon interactions.
2. The mBART model demonstrates superior performance in predicting codons for highly expressed genes and longer proteins, with significant improvements in bacterial species, highlighting the influence of population size on codon selection.
3. Two tasks were tackled: masking (predicting codons from amino acid sequences) and mimicking (using codons of orthologous proteins). The model outperformed frequency-based baselines, especially in highly expressed and conserved genes.
4. The findings emphasize the importance of codon usage in translation kinetics, cotranslational folding, and gene expression regulation, shedding light on how evolutionary pressures shape coding sequences.
5. The model was tested on four organisms (S. cerevisiae, S. pombe, E. coli, and B. subtilis) with stringent evaluation metrics, ensuring the robustness of predictions in diverse biological contexts.
6. By identifying codon patterns linked to functionality and conservation, this research provides a tool to optimize codon sequences for heterologous protein expression in biotechnology and synthetic biology.
7. The study highlights the role of codon usage in overlapping regulatory codes, advancing our understanding of genetic encoding beyond traditional measures of codon bias.
8. mBART’s utility extends to annotating novel genes and understanding codon selection in different evolutionary contexts, bridging computational biology and genomics.
@rachelkolodny
💻Code: github.com/siditom-cs/ReverT…
📜Paper: pnas.org/doi/10.1073/pnas.24…
#ComputationalBiology #CodonUsage #Genomics #AI #ProteinTranslation
20 Jun 2024
Check out our latest #CodonUsage paper @NatureComms ! First author: @rkstewart89 , with collaborators @alaederach and @pelincvolkan ! We identify the CPEB RNA binding protein Orb2 as key to rare codon expression during neural stem cell differentiation. rdcu.be/dLowS

While the #RNA world is streaming towards Edinburgh, I'm on the plane back home after a fantastic 2nd edition of the #CodonUsage workshop! I'm incredibly proud to still be part of such a scientifically diverse community & cannot wait to see all the amazing science emerging soon!
Yi Liu kicking off the 2nd edition of the #CodonUsage workshop in beautiful #Edinburgh! It's all about building a community for scientists from incredibly diverse backgrounds. Looking forward to a fantastic lineup this year!
Great work by Hammel et al. (2024) on demonstrating the impact of #CodonUsage on the efficiency of nuclear transgene expression (e.g. YFP 🔆) in the red alga #Porphyridium purpureum and its efficient recombinant protein secretion into the culture medium. 🔗link.springer.com/article/10…
I've not seen this one advertised on Twitter yet, so here it comes: 🚨 The 2nd workshop on #CodonUsage is happening in beautiful Edinburgh and the speaker lineup is absolutely fantastic! 🚨 Check it out 👇 sites.google.com/view/codonc…
Happy to share our work exploring the role of A/T-ending codons across species, tissues, development, gene families and protein complexes🧬🦧🧠 👉bit.ly/cellsystems @CRGenomica @lab_serrano with @mirimiam @weghornlab @MartinHSchaefer #codonusage #rasgenes @CellSystemsCP

Which Yeasts are Xylose Beasts? New work from myself @RokasLab @HittingerLab @KtlnFisher @DOpulente & UG trainees shows how #codonusage is better than gene content at predicting xylose catabolism - which is critical in biofuels development #3rdBaseThurs biorxiv.org/content/10.1101/…
Welcome to the world of codons. Excited to attend my first PhD conference on codon usage. #EMBO_CU22 #codonusage
Please check our recent work on the Temporal evolution and adaptation of SARS-CoV-2 codon usage imrpress.net/journal/FBL/27/… @GeorgakilasAlex #georgakilaslab #codonusage #SARSCoV2 #Bioinformatics

27 Dec 2021
Mutated oncogenes & the importance of codons: We revisit this recommendation by Chava Kimchi-Sarfaty, Douglas Meyer, and Upendra Katneni of @MartinHSchaefer @_BIST in @PNASNews #oncogenes #cancergenetics #codonusage facultyopinions.com/prime/73…
Non-optimal codon usage preferences of coronaviruses determine their promiscuity for infecting multiple hosts 🔗buff.ly/3k6ehf1 ✏️By Milana Frenkel-Morgenstern and colleagues @ubarilan @AzrieliOf #COVID19 #SARSCoV2 #codonusage
#CodonUsage peeps, we love CodonW, but parsing the output codon table is onerous. I wrote a Python program that computes the relative synonymous codon usage (RSCU) of each CDS and outputs a clean, analysis-ready 61 × n (CDS) table. github.com/rhondene/Codon-Us…
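As a rough illustration of the quantity that post describes, RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is not the linked repository's implementation: the function name `rscu` is hypothetical, it assumes the standard genetic code and an in-frame sequence, and it reports 0.0 for amino acids absent from the sequence (a convention chosen here; other tools may use NaN or pseudocounts).

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))
SENSE_CODONS = [c for c in CODONS if CODON_TO_AA[c] != "*"]  # 61 codons


def rscu(cds):
    """Return {codon: RSCU} for the 61 sense codons of one coding sequence.

    RSCU = observed count * (number of synonymous codons) / (family total).
    """
    cds = cds.upper().replace("U", "T")
    counts = Counter(
        cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)
    )
    # Group sense codons into synonymous families, one per amino acid.
    families = {}
    for codon in SENSE_CODONS:
        families.setdefault(CODON_TO_AA[codon], []).append(codon)
    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Assumption: amino acids not present in the CDS get 0.0.
            result[c] = counts[c] * len(family) / total if total else 0.0
    return result


# Toy sequence, invented for illustration: Met, Ala, Ala, Ala, stop.
values = rscu("ATGGCTGCTGCCTAA")
```

Calling `rscu` on each CDS and stacking the resulting 61-entry dictionaries column by column would give a table with the 61 × n shape the post mentions.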
It took a bit of convincing, but the Kudla lab @mrc_hgu is now on twitter‼️ #RNA #syntheticbiology #codonusage 👇Tweets by the mastermind himself👇
18 Dec 2020
Which mutations change the colour, fluorescence intensity and metal ion binding of an RNA-ligand complex? Olga Puchta collected 7 million genotype-phenotype associations to find out. Well done Olga, Greg, Graeme, @jmbujnicki @doyarzunrod @dynafluors! biorxiv.org/content/10.1101/…