Embed-Search-Align: DNA Sequence Alignment using Transformer models
> This paper introduces a novel DNA sequence alignment method called Embed-Search-Align (ESA), built on a Reference-Free DNA Embedding (RDE) Transformer model. ESA performs genome-wide sequence alignment with high accuracy by transforming DNA subsequences into vector embeddings and searching locally for the best-matching reference fragments.
> RDE generates high-quality, reference-free embeddings that place similar DNA sequences close together in the embedding space. Alignment then reduces to searching for the fragments nearest to a given read, which is much simpler than traditional alignment pipelines.
> A key innovation of ESA is self-supervised training with a contrastive loss. This lets the model capture the nuances of DNA sequence similarity without relying on a specific reference genome, so it can be applied across species and genomic contexts.
> The model reaches 99% accuracy when aligning 250-bp reads onto a human genome, rivaling traditional aligners such as Bowtie and BWA-MEM. RDE also outperforms other DNA-Transformer models such as Nucleotide Transformer and Hyena-DNA on genome-wide sequence alignment.
> ESA is designed for large-scale alignment with modest computational resources, and the model transfers across chromosomes and species. This makes it promising for genomic research and for applications such as variant calling and transcriptomics.
> By combining efficient vector search with learned embeddings, ESA reduces the complexity of sequence alignment, making it a practical tool for next-generation sequencing data analysis.
💻Code: anonymous.4open.science/r/dn…
📜Paper: academic.oup.com/bioinformat…
#DNAAlignment #Transformers #Bioinformatics #MachineLearning #Genomics #DeepLearning #AI
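A rough sketch of the embed, search and align steps described above, in Python. It makes several assumptions:
- A toy stand-in for RDE (an L2-normalised 4-mer count vector).
- A brute-force cosine search in place of ESA's DNA vector store.
- An ungapped mismatch scan as the final align step, not a full local aligner.

The names embed, build_store, search and align, along with the fragment length and stride, are hypothetical and are not taken from the ESA code.

```python
import math
import random
from collections import Counter
from itertools import product

K = 4  # k-mer size of the toy embedding (assumption, not from ESA/RDE)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def embed(seq: str) -> list[float]:
    """Hypothetical stand-in for RDE: an L2-normalised k-mer count vector."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [float(counts[kmer]) for kmer in KMERS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_store(reference: str, frag_len: int, stride: int) -> list[tuple[int, list[float]]]:
    """Embed overlapping reference fragments as (start offset, embedding) pairs.

    With stride < frag_len, any read no longer than frag_len - stride lies fully
    inside at least one fragment (ignoring the reference's uncovered tail).
    """
    last_start = max(len(reference) - frag_len, 0)
    return [(start, embed(reference[start:start + frag_len]))
            for start in range(0, last_start + 1, stride)]


def search(store: list[tuple[int, list[float]]], read: str, top_k: int = 3) -> list[int]:
    """Brute-force nearest-neighbour search: vectors are unit length, so dot = cosine."""
    q = embed(read)
    ranked = sorted(store, key=lambda item: sum(a * b for a, b in zip(q, item[1])), reverse=True)
    return [start for start, _ in ranked[:top_k]]


def align(reference: str, read: str, candidates: list[int], frag_len: int) -> int:
    """Ungapped refinement: slide the read inside each candidate fragment and
    return the reference position with the fewest mismatches."""
    best_pos, best_mm = -1, len(read) + 1
    for start in candidates:
        end = min(start + frag_len, len(reference))
        for pos in range(start, end - len(read) + 1):
            window = reference[pos:pos + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches < best_mm:
                best_pos, best_mm = pos, mismatches
    return best_pos


# Usage: a random 5 kb "reference" and a 100 bp read taken from offset 1000
# with one simulated sequencing error.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(5000))
bases = list(reference[1000:1100])
bases[50] = "A" if bases[50] != "A" else "C"
read = "".join(bases)

store = build_store(reference, frag_len=200, stride=100)
hits = search(store, read, top_k=3)
print("candidate fragments:", hits, "aligned position:", align(reference, read, hits, frag_len=200))
```

Brute-force search costs one dot product per stored fragment. A genome-scale store would therefore need an approximate nearest-neighbour index, and the post does not say which index ESA uses.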
Embed-Search-Align: DNA Sequence Alignment using Transformer Models
1. Introducing Embed-Search-Align (ESA), a novel framework that uses a Transformer-based Reference-Free DNA Embedding (RDE) model to align DNA sequences efficiently and accurately, rivaling traditional methods like Bowtie and BWA-MEM.
2. Key innovation: ESA recasts genome-wide sequence alignment as a vector search task. The top-matching reference fragments for each read can then be retrieved efficiently from a specialized DNA vector store.
3. RDE achieves 99% accuracy when aligning 250-bp reads to a human reference genome, significantly outperforming 6 recent DNA-Transformer baselines such as Hyena-DNA and DNABERT-2 in both precision and scalability.
4. Key features: self-supervised training with a contrastive loss lets RDE produce rich embeddings that preserve sequence locality in the embedding space, enabling robust cross-species and cross-chromosome alignment.
5. ESA reduces computational complexity, aligning 10,000 reads per minute while maintaining high accuracy. This is a step forward for aligning reads to large, complex genomes.
6. Real-world implications: ESA's strong performance on short reads from simulated and experimental datasets points to applications across genomics, including variant calling, transcriptomics, and epigenomics.
7. Looking ahead: the ESA framework could extend to pan-genome alignment and de novo genome assembly, with promising initial results on species such as Thermus aquaticus.
@LajoyceMboning @KCEnevoldsen
📜Paper: arxiv.org/abs/2309.11087
#Genomics #DNAAlignment #Transformers #MachineLearning #Bioinformatics #SequenceAnalysis #GenomeAssembly
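The contrastive training idea in point 4 can be illustrated with a generic InfoNCE-style loss. Each read embedding is pulled toward the embedding of the fragment it came from and pushed away from the other fragments in the same batch. The post does not give RDE's exact loss, temperature or pairing scheme. The function names info_nce and cosine and the value temperature=0.1 are assumptions for illustration only.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors: list[list[float]], positives: list[list[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss (hypothetical stand-in for RDE's contrastive objective).

    anchors[i] should be most similar to positives[i]; every positives[j]
    with j != i serves as an in-batch negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


# Usage: toy 2-D embeddings. Correct read/fragment pairs give a small loss;
# the same fragments paired with the wrong reads give a large one.
anchors = [[1.0, 0.0], [0.0, 1.0]]  # e.g. embeddings of two reads
matched = [[0.9, 0.1], [0.1, 0.9]]  # embeddings of their source fragments
swapped = [[0.1, 0.9], [0.9, 0.1]]  # same fragments, wrong pairing
print(info_nce(anchors, matched), info_nce(anchors, swapped))
```

Minimising this loss over many read/fragment pairs is one common way to make nearby sequences land close together in embedding space, which is the property the vector search step relies on.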
🧵 Did you know that protein alignment is often more informative than DNA alignment? ⬇️ Here is why. #bioinformatics #alignment #pairwisealignment #proteinalignment #dnaalignment