The Seq programming language
A Python dialect for bioinformatics
seq-lang.org/
Seq is a Pythonic language for computational genomics and bioinformatics. With a Python-compatible syntax and a host of domain-specific features and optimizations, Seq makes writing high-performance genomics software as easy as writing Python code, and achieves performance comparable to (and in many cases better than) C/C++.
docs.seq-lang.org/
Seq starts with a subset of Python—and is in many cases a drop-in replacement—yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications.
We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used.
With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.
dl.acm.org/doi/10.1145/33605…
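To make two of the benchmarked tasks concrete, here is a minimal plain-Python sketch of reverse complementation and k-mer enumeration. It is only an illustration of what those tasks compute: it does not use Seq's built-in sequence or k-mer types, and the function names `reverse_complement`, `kmers` and `canonical` are hypothetical, chosen for this example.

```python
# Plain-Python illustration of two tasks named above; not Seq-specific code.
# All function names here are hypothetical and chosen for this sketch.

# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int, step: int = 1):
    """Yield every length-k substring of seq, advancing by `step` bases."""
    for i in range(0, len(seq) - k + 1, step):
        yield seq[i:i + k]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))
```

A typical use would be `sorted({canonical(km) for km in kmers(read, 3)})` for some DNA string `read`, which collects the distinct 3-mers of a read regardless of strand.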
HapTree-X
HapTree-X is a computational tool that phases various kinds of next-generation sequencing data. Currently, it supports whole-genome, whole-exome, 10X Genomics and RNA-seq data.
It is especially powerful on RNA-seq data as it can utilize allelic imbalance to better phase genic regions.
github.com/0xTCG/haptreex
HapTree-X is a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms.
nature.com/articles/s41467-0…
HapTree-X's output follows the HapCUT output format convention. The output file lists the set of phased haplotype blocks: each block begins with a line starting with BLOCK and ends with a line of asterisks (*****).
github.com/vibansal/HapCUT2
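The block structure described above can be read with a few lines of Python. This is a sketch under stated assumptions: it relies only on the two markers named in the text (a line starting with BLOCK, and a terminator line made up entirely of asterisks) and keeps the variant rows as raw strings, since their column layout is not described here. The function name `read_blocks` is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_blocks(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Group lines into (header, rows) pairs, one pair per haplotype block.

    Assumptions: a block opens with a line starting with "BLOCK" and closes
    with a line consisting only of asterisks. Rows are returned unparsed.
    """
    blocks: List[Tuple[str, List[str]]] = []
    header = None
    rows: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("BLOCK"):
            # Start a new block and reset the row buffer.
            header, rows = line, []
        elif line and set(line.strip()) == {"*"}:
            # Terminator line: store the block if one is open.
            if header is not None:
                blocks.append((header, rows))
            header = None
        elif header is not None and line.strip():
            # Any other non-blank line inside a block is a variant row.
            rows.append(line)
    return blocks
```

Because `read_blocks` accepts any iterable of lines, an open file handle can be passed directly, for example `with open(path) as fh: blocks = read_blocks(fh)`.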
HapCUT2
HapCUT2 is an iterative procedure that starts with a candidate haplotype pair. Given the current pair of haplotypes, HapCUT2 searches for a subset of variants (using max-cut computations in the read-haplotype graph) such that changing the phase of these variants relative to the remaining set of variants results in a new pair of haplotypes with greater likelihood.
This procedure is repeated iteratively until no further improvements can be made to the likelihood.
genome.cshlp.org/content/27/…
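The "flip, re-score, repeat until nothing improves" loop can be illustrated with a toy Python sketch. This is not HapCUT2's implementation: it swaps the max-cut search for the simplest possible move (flipping one variant at a time), and it assumes a diploid haplotype pair stored as one 0/1 list plus its complement, a single fixed per-allele error rate `eps`, and fragments given as hypothetical `{variant_index: allele}` dictionaries.

```python
import math
from typing import Dict, List, Tuple

Fragment = Dict[int, int]  # variant index -> observed allele (0 or 1)


def fragment_likelihood(frag: Fragment, hap: List[int], eps: float) -> float:
    """P(fragment | haplotype), assuming independent allele errors at rate eps."""
    p = 1.0
    for idx, allele in frag.items():
        p *= (1.0 - eps) if hap[idx] == allele else eps
    return p


def log_likelihood(frags: List[Fragment], hap: List[int], eps: float = 0.05) -> float:
    """Log-likelihood of the pair (hap, complement); each fragment is assumed
    equally likely to come from either haplotype."""
    other = [1 - a for a in hap]
    return sum(
        math.log(0.5 * fragment_likelihood(f, hap, eps)
                 + 0.5 * fragment_likelihood(f, other, eps))
        for f in frags
    )


def improve(frags: List[Fragment], hap: List[int], eps: float = 0.05) -> Tuple[List[int], float]:
    """Greedy local search: keep any single-variant flip that raises the
    likelihood, and stop when a full pass yields no improvement.
    (Simplification: HapCUT2 flips variant subsets found via max-cut.)"""
    hap = list(hap)
    best = log_likelihood(frags, hap, eps)
    improved = True
    while improved:
        improved = False
        for i in range(len(hap)):
            hap[i] ^= 1  # tentatively flip the phase of variant i
            cand = log_likelihood(frags, hap, eps)
            if cand > best + 1e-12:
                best, improved = cand, True  # keep the flip
            else:
                hap[i] ^= 1  # undo the flip
    return hap, best
```

Single-variant flips can stall in local optima that a subset flip would escape, which is one reason to search over subsets of variants as the description above does.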