Biology AI Daily

Biology AI Daily


Apr 4

Decoding Synonymous Codon Selection With a Transformer Model

1. CaNAT is a non-autoregressive Transformer that predicts an entire codon sequence directly from an amino-acid sequence, with a design choice that prioritizes rare-codon learning via batch-wise weighted cross-entropy (rare codons contribute more to gradients instead of being drowned out by frequency bias).
2. Trained at scale on >3 million coding sequences from 620 species (taxonomically broad, cluster-split to avoid homology leakage), CaNAT aims to recover native synonymous choices rather than only “optimize” codons for expression, and it does so without organism labels during training.
3. A key practical output is a codon-level confidence score (softmax probability per position). The study shows that confidence correlates strongly with accuracy, and introduces a degeneracy-aware threshold T(k, α) to compare confidence fairly across amino acids with different numbers of synonymous codons.
4. On the full multi-species test set, CaNAT reaches ~53% exact codon accuracy, beating statistical baselines such as “always pick the most frequent codon” (~48%), frequency sampling (~39%), and uniform random (~33%). High-confidence filtering further boosts accuracy.
5. Compared to CodonTransformer (an organism-fine-tuned model), CaNAT is especially strong on rare codons (RSCU < 0.7): across shared benchmark species, CaNAT generally improves rare-codon prediction (notably in human and mouse), suggesting the balanced-loss strategy helps capture context-dependent rare-codon placement.
6. Even without species labels, CaNAT’s predictions recapitulate organism-specific synonymous codon distributions: for E. coli, H. sapiens, and an extremophile (Streptococcus thermophilus), predicted vs observed codon-usage profiles show near-perfect rank agreement across amino acids (median Spearman close to 1), including preservation of rare codons.
7. Internal representations encode species identity at positional resolution: embeddings from the decoder separate species via LDA, implying that local sequence context contains enough signal for the model to infer organism-specific codon “signatures” throughout a gene.
8. The model captures biophysical constraints beyond GC content: adding RNA stability (ViennaRNA) to a regression explaining prediction accuracy increases explained variance from R²=0.148 (GC only) to R²=0.191 (GC + stability), indicating that codon-choice predictability relates to structured RNA features.
9. Attention analysis reveals multi-scale context for codon choice: (i) tight near-diagonal heads consistent with dicodon/codon-pair effects, (ii) broader local windows spanning several codons, and (iii) long-range diagonals with offsets roughly −70 to 60, suggesting distant regions contribute to local synonymous decisions; attention is often biased slightly downstream.
10. CaNAT predictions connect to functional constraint: on three deep synonymous-scanning datasets in E. coli (ddlA folding efficiency, RNase III function, TEM-1 β-lactamase fitness), accuracy is highest at “wild-type only tolerated” positions, which are enriched for rare codons; at partially tolerant sites, CaNAT often predicts an alternative tolerated codon rather than exactly the wild type, consistent with learning constraint strength rather than memorizing sequences.

💻Code: github.com/Andre-lab/CaNAT/
📜Paper: biorxiv.org/content/10.64898…
#ComputationalBiology #CodonUsage #SynonymousMutations #Transformers #DeepLearning #Translation #RNAstructure #ProteinFolding #Bioinformatics #Genomics

2,542
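
To make point 1 of the CaNAT summary above more concrete, here is a minimal sketch of what a batch-wise weighted cross-entropy could look like. The inverse-frequency weighting and the function names `batch_weights` and `weighted_cross_entropy` are assumptions for illustration; the tweet does not state the exact weighting scheme the authors use.

```python
import math
from collections import Counter

def batch_weights(targets):
    """Inverse-frequency weight per codon, computed from the current batch only.

    Assumed scheme: weight = total / (n_classes * count), so every codon class
    present in the batch receives the same total weight.
    """
    counts = Counter(targets)
    n_classes = len(counts)
    total = len(targets)
    return {codon: total / (n_classes * c) for codon, c in counts.items()}

def weighted_cross_entropy(probs, targets):
    """Weighted mean negative log-likelihood over one batch.

    probs:   list of dicts mapping codon -> predicted probability
    targets: list of true codons, aligned with probs
    """
    w = batch_weights(targets)
    num = sum(-w[t] * math.log(p[t]) for p, t in zip(probs, targets))
    den = sum(w[t] for t in targets)
    return num / den

# Toy batch: CTG is common, CTA is rare.
targets = ["CTG", "CTG", "CTG", "CTA"]
weights = batch_weights(targets)  # CTA gets three times the weight of CTG
```

With this weighting, a single rare codon in the batch contributes as much to the loss as all copies of a common codon combined, which is one way to keep rare codons from being drowned out by frequency bias.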

CSBJ

CSBJ @CSB_Journal

30 Jun 2025

🔗 A codon usage-based approach for the stratification of Influenza A across recent spillovers. @CSB_Journal, DOI: doi.org/10.1016/j.csbj.2025.… 📚 CSBJ: csbj.org/ #Influenza #Genomics #PublicHealth #Bioinformatics #Zoonosis #CodonUsage #Epidemiology #Virology #Virus

A codon usage-based approach for the stratification of Influenza A across recent spillovers. Computational and Structural Biotechnology Journal, DOI: https://doi.org/10.1016/j.csbj.2025.06.030


103

Pankaj Bhardwaj

Pankaj Bhardwaj @Pankajihbt

20 Feb 2025

Unveiling codon usage patterns across diverse plant lineages! Explore how evolutionary forces shape genetic coding in plants. Check out our latest study: [rdcu.be/eaDhH] #PlantGenomics #CodonUsage #EvolutionaryBiology #PlantScience #MolecularGenetics

Biology AI Daily

Biology AI Daily @BiologyAIDaily

9 Jan 2025

Predicting gene sequences with AI to study codon usage patterns @PNASNews

1. This study introduces an AI-driven mBART model to predict codon sequences, revealing complex patterns in codon usage that go beyond frequency-based approaches, suggesting evolutionary selection impacts codon interactions.
2. The mBART model demonstrates superior performance in predicting codons for highly expressed genes and longer proteins, with significant improvements in bacterial species, highlighting the influence of population size on codon selection.
3. Two tasks were tackled: masking (predicting codons from amino acid sequences) and mimicking (using the codons of orthologous proteins). The model outperformed frequency-based baselines, especially in highly expressed and conserved genes.
4. The findings emphasize the importance of codon usage in translation kinetics, cotranslational folding, and gene expression regulation, shedding light on how evolutionary pressures shape coding sequences.
5. The model was tested on four organisms (S. cerevisiae, S. pombe, E. coli, and B. subtilis) with stringent evaluation metrics, ensuring the robustness of predictions in diverse biological contexts.
6. By identifying codon patterns linked to functionality and conservation, this research provides a tool to optimize codon sequences for heterologous protein expression in biotechnology and synthetic biology.
7. The study highlights the role of codon usage in overlapping regulatory codes, advancing our understanding of genetic encoding beyond traditional measures of codon bias.
8. mBART’s utility extends to annotating novel genes and understanding codon selection in different evolutionary contexts, bridging computational biology and genomics.

@rachelkolodny
💻Code: github.com/siditom-cs/ReverT…
📜Paper: pnas.org/doi/10.1073/pnas.24…
#ComputationalBiology #CodonUsage #Genomics #AI #ProteinTranslation

3,059
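
For readers wondering what a "frequency-based baseline" for the masking task looks like, here is a minimal sketch: for each amino acid, always predict the codon seen most often in the training set. The function names and the toy data are hypothetical, and this is not taken from the paper's code.

```python
from collections import Counter, defaultdict

def fit_most_frequent(training):
    """Learn the most frequent codon per amino acid.

    training: iterable of (protein, codons) pairs, aligned position by position.
    """
    counts = defaultdict(Counter)
    for protein, codons in training:
        for aa, codon in zip(protein, codons):
            counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}

def codon_accuracy(table, protein, codons):
    """Fraction of positions where the baseline picks the native codon."""
    hits = sum(table.get(aa) == codon for aa, codon in zip(protein, codons))
    return hits / len(codons)

# Toy training set (hypothetical sequences, for illustration only).
training = [
    ("MKL", ["ATG", "AAA", "CTG"]),
    ("KLL", ["AAA", "CTG", "CTA"]),
    ("MK", ["ATG", "AAG"]),
]
table = fit_most_frequent(training)
accuracy = codon_accuracy(table, "MKLL", ["ATG", "AAG", "CTG", "CTG"])
```

A baseline like this ignores sequence context entirely, so any gain a model shows over it reflects information beyond per-amino-acid codon frequencies.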

FoxLabNC

FoxLabNC @FoxLabNC

20 Jun 2024

Check out our latest #CodonUsage paper @NatureComms ! First author: @rkstewart89 , with collaborators @alaederach and @pelincvolkan ! We identify the CPEB RNA binding protein Orb2 as key to rare codon expression during neural stem cell differentiation. rdcu.be/dLowS

2,506

Christine Mordstein

Christine Mordstein @CMordstein

27 May 2024

While the #RNA world is streaming towards Edinburgh, I'm on the plane back home after a fantastic 2nd edition of the #CodonUsage workshop! I'm incredibly proud to still be part of such a scientifically diverse community & cannot wait to see all the amazing science emerging soon!

295

Christine Mordstein

Christine Mordstein @CMordstein

25 May 2024

Yi Liu kicking off the 2nd edition of the #CodonUsage workshop in beautiful #Edinburgh! It's all about building a community for scientists from incredibly diverse backgrounds. Looking forward to a fantastic lineup this year!

832

Takehiro A. Ozawa-Uyeda

Takehiro A. Ozawa-Uyeda @tk_ozawa

27 Feb 2024

Great work by Hammel et al. (2024) on demonstrating the impact of #CodonUsage on the efficiency of nuclear transgene expression (e.g. YFP 🔆) in the red alga #Porphyridium purpureum and its efficient recombinant protein secretion into the culture medium. 🔗link.springer.com/article/10…

Figure 2. Protein expression from the different YFP variants and comparison to the green microalga Chlamydomonas reinhardtii.

a) Immunoblot analysis to compare YFP accumulation in transgenic algal strains.

b) Semi-quantitative immunoblot analysis to determine YFP expression levels from fully codon-optimized gene variants in P. purpureum (PpYFP) and C. reinhardtii (CrYFP).
YFP accumulates in Porphyridium to approximately 5% of total soluble protein (TSP) from the gene fully codon optimized for P. purpureum (PpYFP) lines, and to approximately 1.2% from the gene codon optimized for C. reinhardtii (CrYFP) lines. Chlamydomonas expression strain UVM11 (Neupert et al. 2009, 2020) accumulates the CrYFP to approximately 0.8% of TSP (Barahimipour et al. 2015).

c) Visualization of PpYFP accumulation in P. purpureum by Coomassie staining of 10 µg electrophoretically separated total soluble protein.


273

Christine Mordstein

Christine Mordstein @CMordstein

16 Feb 2024

I've not seen this one advertised on twitter yet, so here it comes: 🚨 The 2nd workshop on #CodonUsage is happening again in beautiful Edinburgh and the speaker line up is absolutely fantastic! 🚨 Check it out 👇 sites.google.com/view/codonc…

Codon Usage

About the Conference Codon usage bias, the preference for certain synonymous codons, is a key factor in genome regulation. Patterns of codon usage are among the strongest known predictors of protein...

sites.google.com

2,556

Hannah Benisty

Hannah Benisty @Hannah_Benisty

8 Mar 2023

Happy to share our work exploring the role of A/T-ending codons across species, tissues, development, gene families and protein complexes🧬🦧🧠 👉bit.ly/cellsystems @CRGenomica @lab_serrano with @mirimiam @weghornlab @MartinHSchaefer #codonusage #rasgenes @CellSystemsCP

6,439

Francisco González-Serrano (PacoMax)

Francisco González-Serrano (PacoMax) @GenoPacoMax

26 Sep 2022

Glad to share our newly published paper #codon #codonusage #codonusagebias #prokaryoticgenomics #molecularevolution link.springer.com/article/10…

Javier Santoyo

Javier Santoyo @jsantoyo

23 Jul 2022

Codon Statistics Database: a Database of Codon Usage Bias. #CodonUsage #CodonStats @MolBioEvol academic.oup.com/mbe/advance…

The Codon Statistics Database: A Database of Codon Usage Bias

Abstract. We present the Codon Statistics Database, an online database that contains codon usage statistics for all the species with reference or represent

academic.oup.com

The Lab LaBella

The Lab LaBella @Lab_LaBella

16 Jun 2022

Which Yeasts are Xylose Beasts? New work from myself @RokasLab @HittingerLab @KtlnFisher @DOpulente & UG trainees shows how #codonusage is better than gene content at predicting xylose catabolism - which is critical in biofuels development #3rdBaseThurs biorxiv.org/content/10.1101/…

Mohd Ahmad

Mohd Ahmad @m_ahmad37

8 Apr 2022

Welcome to the world of codons. Excited to attend my first PhD conference on codon usage. #EMBO_CU22 #codonusage

Alex Georgakilas

Alex Georgakilas @GeorgakilasAlex

11 Jan 2022

Please check our recent work on the Temporal evolution and adaptation of SARS-CoV-2 codon usage imrpress.net/journal/FBL/27/… @GeorgakilasAlex #georgakilaslab #codonusage #SARSCoV2 #Bioinformatics

H1 Connect

H1 Connect @H1Connect

27 Dec 2021

Mutated oncogenes & the importance of codons: We revisit this recommendation by Chava Kimchi-Sarfaty, Douglas Meyer, and Upendra Katneni of @MartinHSchaefer @_BIST in @PNASNews #oncogenes #cancergenetics #codonusage facultyopinions.com/prime/73…

The FEBS Journal

The FEBS Journal @FEBSJournal

17 Sep 2021

Non-optimal codon usage preferences of coronaviruses determine their promiscuity for infecting multiple hosts 🔗buff.ly/3k6ehf1 ✏️By Milana Frenkel-Morgenstern and colleagues @ubarilan @AzrieliOf #COVID19 #SARSCoV2 #codonusage

Rhondene Wint 🧬

Rhondene Wint 🧬 @R_Winty

29 May 2021

#CodonUsage peeps, we love CodonW, but parsing the output codon table is onerous. I wrote a Python program that computes the relative synonymous codon usage (RSCU) of each CDS and outputs a clean, analysis-ready 61 x n (CDS) table. github.com/rhondene/Codon-Us…
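
As a rough illustration of the quantity mentioned in the tweet above, RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is not the linked program; it assumes the standard genetic code, skips stop codons (leaving 61 sense codons), and returns NaN for amino acids absent from the sequence, which is a choice made here for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Return {codon: RSCU} for the 61 sense codons of one coding sequence."""
    cds = cds.upper().replace("U", "T")
    # Split into codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if CODON_TO_AA.get(c, "*") != "*")
    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # RSCU = observed count * family size / total count for the amino acid.
            result[c] = counts[c] * len(family) / total if total else float("nan")
    return result

values = rscu("CTGCTGCTGCTAAAA")  # Leu: CTG x3, CTA x1; Lys: AAA x1
```

Calling `rscu` once per CDS and stacking the resulting dictionaries as columns would give a 61 x n table of the kind the tweet describes.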

Botanical Journal of the Linnean Society

Botanical Journal of the Linnean Society @BotJLinnSoc

28 Feb 2021

Codon usage pattern in #Gnetales evolved in close accordance with the #GnetiferHypothesis. Majeed et al. #Conifers #Gymnosperms #Angiosperms #CodonUsage doi.org/10.1093/botlinnean/b…

Christine Mordstein

Christine Mordstein @CMordstein

18 Dec 2020

It took a bit of convincing, but the Kudla lab @mrc_hgu is now on twitter‼️ #RNA #syntheticbiology #codonusage 👇Tweets by the mastermind himself👇

Kudla lab @kudlalab

18 Dec 2020

Which mutations change the colour, fluorescence intensity and metal ion binding of an RNA-ligand complex? Olga Puchta collected 7 million genotype-phenotype associations to find out. Well done Olga, Greg, Graeme, @jmbujnicki @doyarzunrod @dynafluors! biorxiv.org/content/10.1101/…