🚀 Introducing Nucleotide Transformer v3 (NTv3)
Today, we are very excited to share our latest foundation model for biology - Nucleotide Transformer v3 (NTv3).
NTv3 is @instadeepai's new multi-species genomics foundation model, designed for 1 Mb, single-nucleotide-resolution prediction, and for bridging representation learning, sequence-to-function modeling, and generative regulatory design within a single framework 🧬
This work was developed in close collaboration with @AlexanderStark8 , @volokuleshov , and @pkoo562 , and reflects several years of joint effort at the intersection of machine learning and regulatory genomics.
Nucleotide Transformer is a series of genomics foundation models of different parameter sizes and training datasets which can be applied to various downstream tasks by fine-tuning. @instadeepai nature.com/articles/s41592-0…
We were invited to write a review article on DNA/genomic language models (gLMs). We took this occasion to gather our thoughts on promising applications, and major considerations for developing and evaluating gLMs. Pls share with your colleagues: Preprint: arxiv.org/abs/2407.11435
🌱 As scientists wrestled with the mysteries of plant genomics, our #AI problem solvers had a daring question, "Can AI make a difference?" @Nature covered how our #research and AI #genomics teams teamed up to search for an answer, with AgroNT – our new #LLM. 📚Read more→ go.nature.com/3xWopR7
🤗 Access our #open-source LLM: huggingface.co/InstaDeepAI/a…
Very proud to announce ChatNT, the first multimodal conversational agent for biological sequences 🧬. ChatNT can be simply prompted with a question and a nucleotide sequence to solve DNA, RNA and protein tasks 🧑‍🔬!
📚Paper: tinyurl.com/chatNT-pdf
🌐Blog: tinyurl.com/chatNT-blog
The NT Family is growing! 🐣
✨ Introducing ChatNT, a conversational agent designed to analyse genomic sequences and address a wide range of key biological questions, assisting scientists in their daily work 👩‍🔬
📚Paper: tinyurl.com/chatNT-pdf
🌐Blog: tinyurl.com/chatNT-blog
Glad that our sequence model of promoters in the human genome is now published in @ScienceMagazine. Check out the paper for a deep dive into the sequence basis of transcription initiation at the basepair level: science.org/doi/10.1126/scie…
✨ Meet SegmentNT: the first-ever LLM capable of annotating DNA sequences at single nucleotide resolution.
Built on top of our Nucleotide Transformer, SegmentNT offers precise genome annotation surpassing traditional methods and can offer deeper insights into our genome 🧬
InstaDeep's @JavierMenRev will share how AI systems - and LLMs specifically - deepen our understanding of the Maize genome tomorrow at 7:20 pm at the Maize Genetics Meeting in Raleigh, NC. #MGM2024
Read the paper: biorxiv.org/content/10.1101/…
The analyses highlighted by the authors risk exacerbating the confusion. Genetic ancestry does not map neatly onto, nor is it highly concordant with these geographic and ethnic categories. 5/n
We evaluated the performance of the ancestry predictions against the self-reported ethnicity of the All of Us samples as ground truth. The performance should be worse than the holdout HGDP samples, but this is expected. Self-reported ethnicity does not correspond to the populations listed above and is prone to false reporting.
“Correct” labeling between HGDP/1kg populations and All of Us ethnicities:
1. African (AFR) → Black
2. Latino/Admixed American (AMR) → Hispanic
3. East Asian (EAS) → Asian
4. Finnish (FIN) → White
5. Middle Eastern (MID) → MENA
6. Non-Finnish European (NFE) → White
7. Other (OTH) → Other (do not include skipped)
8. South Asian (SAS) → Asian
Based on the procedure above, the concordance between self-reported ethnicity and the ancestry predictions: 0.877
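For readers who want to see what a concordance figure like this measures, here is a minimal Python sketch. It assumes concordance means the fraction of samples whose predicted ancestry group, mapped through the table above, equals the self-reported category. The function name `concordance`, the `"Skipped"` label, and the toy samples are all hypothetical, and reading "do not include skipped" as "drop samples with a skipped self-report" is also an assumption. This is not the authors' code, and the toy data is unrelated to the 0.877 figure.

```python
# Mapping copied from the list above: predicted ancestry group -> self-reported category.
ANCESTRY_TO_ETHNICITY = {
    "AFR": "Black",
    "AMR": "Hispanic",
    "EAS": "Asian",
    "FIN": "White",
    "MID": "MENA",
    "NFE": "White",
    "OTH": "Other",
    "SAS": "Asian",
}


def concordance(predicted, self_reported, skipped_label="Skipped"):
    """Fraction of samples whose mapped ancestry equals the self-report.

    Samples whose self-report equals `skipped_label` are dropped first
    (hypothetical label; an assumed reading of "do not include skipped").
    """
    if len(predicted) != len(self_reported):
        raise ValueError("predicted and self_reported must have the same length")
    pairs = [(p, s) for p, s in zip(predicted, self_reported) if s != skipped_label]
    if not pairs:
        raise ValueError("no samples left after dropping skipped responses")
    matches = sum(ANCESTRY_TO_ETHNICITY[p] == s for p, s in pairs)
    return matches / len(pairs)


# Toy, made-up samples for illustration only.
predicted = ["AFR", "NFE", "EAS", "AMR", "SAS", "FIN"]
self_reported = ["Black", "White", "Asian", "White", "Asian", "Skipped"]

score = concordance(predicted, self_reported)
print(f"{score:.3f}")  # 4 of the 5 remaining samples match -> 0.800
```

Note that the mapping is many-to-one (FIN and NFE both map to White, EAS and SAS both map to Asian), so a high score can hide disagreements that the coarser self-report categories cannot express.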
We released the weights and downstream task datasets of AgroNT, our 1B-parameter DNA foundation model for plant genomics 🧬🪴. Check out our GitHub (github.com/instadeepai/nucle…) for JAX lovers ⚡❤️ and our @huggingface space (huggingface.co/InstaDeepAI) for the PyTorch version 🤗.
I am looking for PhD research interns at the @instadeepai Paris office for the coming summer. Join us if you are interested in DNA LLMs and regulatory genomics!
We have different projects available and are flexible to find one that fits your PhD.
Link: instadeep.com/internship-off…
We are very excited to present a major breakthrough achievement – the de novo design of synthetic enhancers for selected tissues in fruit fly embryos in vivo using deep and transfer learning, @deAlmeida_BP et al. published today in @Nature nature.com/articles/s41586-0… Thread 👇 (1/N)
If you are interested in gene-environment interactions, molecular phenotypes and single-cell genomics, join us at @institutpasteur in sunny Paris!
Post-doctoral position in single-cell genomics at Institut Pasteur in Paris docs.google.com/document/d/e…
We’re excited to announce the first whole genome screening for embryos is available!
Parents can get 100x more data about their embryos’ genomes, empowering them to make an informed decision and give their baby the best chance at a healthy start.
(1) Pleiotropy: Meaning, a variant at a locus can affect multiple traits simultaneously. Also, pleiotropy can be antagonistic, implying that a variant may be simultaneously harmful AND beneficial. What, then, will these types of screenings be selecting for?
(2) GWAS Transferability: PRS show much greater accuracy in individuals of European ancestry than in others. Thus, the PRS they report may be highly inaccurate for many individuals (admixed/from underrepresented populations) and potentially even irrelevant for some.