As said before, this is a super useful (and *fast*) tool that we've already been using quite a bit for troubleshooting spurious primer matching and the like. Warmly recommended!
To all folks dealing with biological data:
do you ever need to check if your reads contain barcodes/adapters/primers/...? Or off-target matches?
Sassy is the tool to use! A super fast implementation of "approximate string matching" with a grep-like CLI.
curiouscoding.nl/posts/sassy…
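For readers unfamiliar with "approximate string matching": the sketch below is the textbook dynamic-programming formulation (often attributed to Sellers), not Sassy's implementation or its CLI. It reports every position in a text where a pattern ends with at most `k` edits. The function name `approx_find` and its return format are made up for illustration.

```python
def approx_find(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, edits) for each text position where `pattern`
    matches some substring ending there with at most k edits."""
    m = len(pattern)
    # prev[i] = fewest edits aligning pattern[:i] to a substring that ends
    # at the previous text position (initially: the empty text prefix).
    prev = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character missing from the text
            )
        if cur[m] <= k:
            hits.append((j, cur[m]))
        prev = cur
    return hits


print(approx_find("ACGT", "TTACGTTT", 0))  # [(6, 0)]
```

This runs in time proportional to `len(pattern) * len(text)`, so it only illustrates the problem being solved, not how a fast tool solves it.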
New preprint!
The SimdQuickHeap is the fastest priority queue, by far.
2x faster than a radix heap, and up to 10x faster than binary heaps.
Coauthored with Johannes Breitling and Marvin Williams.
SimdQuickHeap: The QuickHeap Reconsidered
Johannes Breitling, Ragnar Groot Koerkamp, Marvin Williams
arxiv.org/abs/2604.25681 [𝚌𝚜.𝙳𝚂]
Priority queues are data structures that maintain a dynamic collection of elements and allow inserting new elements and removing the smallest element. The most widely known and used priority queue is likely the implicit binary heap, even though it has frequent cache misses and is hard to optimize using e.g. SIMD instructions. We introduce the SimdQuickHeap, a variant of the QuickHeap that was introduced by Navarro and Paredes in 2010. As suggested by the name, the data structure bears some similarity to QuickSort. We modify the data layout of the original QuickHeap to have all pivots adjacent in memory, with elements between consecutive pivots stored in dedicated buckets. This allows efficient SIMD implementations for both partitioning of buckets and scanning the list of pivots to find the bucket to append newly inserted elements to. The SimdQuickHeap has amortized expected complexity $O(\log n)$ per operation, which improves to $O(\frac{1}{W} \log n)$ in non-degenerate cases, where W is the n
As often: the SimdQuickHeap is a conceptually very simple data structure that thus can be easily optimized using SIMD, leading to pretty big speed gains.
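To make the layout from the abstract concrete, here is a scalar sketch of a bucketed QuickHeap: one contiguous pivot list, one bucket per gap between pivots. It is not the paper's implementation and contains no SIMD. The class name `QuickHeapSketch`, the `SMALL` cut-off, the random pivot choice, and the decision to keep pivots as plain threshold copies are all assumptions made for illustration.

```python
import random


class QuickHeapSketch:
    """Scalar sketch of a bucketed QuickHeap (min-priority queue)."""

    SMALL = 8  # hypothetical cut-off: buckets this small are scanned directly

    def __init__(self):
        self.pivots = []     # strictly decreasing thresholds, stored contiguously
        self.buckets = [[]]  # buckets[i] holds x with pivots[i] <= x < pivots[i-1]

    def push(self, x):
        # Count the pivots greater than x to find the target bucket. A SIMD
        # version would compare x against many pivots per instruction.
        i = sum(1 for p in self.pivots if x < p)
        self.buckets[i].append(x)

    def pop(self):
        # Drop empty buckets at the small end, together with their pivots.
        while self.pivots and not self.buckets[-1]:
            self.buckets.pop()
            self.pivots.pop()
        last = self.buckets[-1]
        if not last:
            raise IndexError("pop from empty heap")
        # Partition the smallest bucket, quicksort-style, until it is small.
        while len(last) > self.SMALL:
            p = random.choice(last)
            smaller = [x for x in last if x < p]
            if not smaller:  # p is already a minimum of the whole heap
                last.remove(p)
                return p
            self.buckets[-1] = [x for x in last if x >= p]
            self.pivots.append(p)
            self.buckets.append(smaller)
            last = smaller
        smallest = min(last)
        last.remove(smallest)
        return smallest
```

The two inner loops, counting pivots in `push` and partitioning a bucket in `pop`, are the places the abstract says become efficient with SIMD once pivots and buckets are laid out this way.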
OMG! By popular demand (🤗) I'm going to speak at P99 again!
This time about the SimdQuickHeap preprint that just came out: the fastest priority queue by 2x (or 10x compared to a binary heap).
We've been working hard behind the scenes to bring you an unforgettable P99CONF 2026. Registration is now open, so take a look at who is coming back and the new speakers we have on board > p99conf.io/2026/04/01/be-par… #ScyllaDB #P99CONF
Getting good at GPU programming will make you a better coder on modern CPUs. A throughput-oriented mentality, while critical on GPUs, is also a far better way of coaxing performance out of the wide, deeply-pipelined, SIMD-enabled monstrosities that are modern CPUs.
Me 15 years ago, as a PhD student: “It seems like all the interesting questions have been answered & there are no useful studies left to do.”
Me now, as a funder focused on evidence-based criminal justice policy: “The biggest constraint we face is finding researchers with time to engage in important new projects.”
Truly, this is the most pressing problem I have. I need an army of scholars who understand causal methods & who would jump at an opportunity to study a program or policy. Would you like to help? Email me. 🙏
Microsoft has set a goal to “eliminate every line of C and C++ from Microsoft by 2030.”
What are they going to try to replace that C & C++ code with?
You guessed it. Rust.
And they’re going to use AI to do the “Rust re-write” at an insane speed.
“Our strategy is to combine AI *and* Algorithms to rewrite Microsoft’s largest codebases. Our North Star is ‘1 engineer, 1 month, 1 million lines of code’.”
You read that right.
One million lines of code, per engineer, per month.
Pure insanity. This kind of decision making is common among those with a deeply held, delusional faith in the Cult of Rust.
Take battle-tested code, and re-write it (without a clear benefit to the end user) at a recklessly rapid rate. Then force others to adopt that rewritten code before it is ready or properly tested.
All while holding a delusional belief that your new Rust code is superior in all ways, and is inherently bug free thanks to the divine nature of Rust.
We learned this from a post by Galen Hunt, Distinguished Engineer at Microsoft Research.
linkedin.com/posts/galenh_pr…
The experience of having some muddy mess of a hundred line function with goofy "do things one way here, then do almost the same thing somewhere else, then ..." become clear and turn into a 20-line clean "computer science" type function with nice clean data structures is bliss.
One of my favourite "theory vs. practice" tricks in CS algorithms is optimising binary search for cache-friendliness.
The standard algorithm looks at the middle of the array and jumps back and forth. This destroys the cache.
Instead, store the sorted array as an implicit binary search tree in level-order (breadth-first, a.k.a. Eytzinger) layout.
This way you always hit the first elements of the array, and they can be safely cached.
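A minimal sketch of the trick, assuming the breadth-first (level-order, Eytzinger) layout, in which the root sits at index 1 and node `k` has children `2k` and `2k+1`. The function names `eytzinger` and `lower_bound` are made up for illustration, and Python will not show the cache effect itself; the point is only the access pattern.

```python
def eytzinger(sorted_vals):
    """Lay out a sorted list in breadth-first (Eytzinger) order, 1-indexed."""
    n = len(sorted_vals)
    out = [None] * (n + 1)  # slot 0 is unused
    it = iter(sorted_vals)

    def fill(k):
        # In-order walk of the implicit tree: node k has children 2k and 2k+1,
        # so handing out the sorted values in this order yields a search tree.
        if k <= n:
            fill(2 * k)
            out[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return out


def lower_bound(tree, x):
    """Return the smallest stored value >= x, or None if there is none."""
    n = len(tree) - 1
    k = 1
    best = None
    while k <= n:
        if tree[k] >= x:
            best = tree[k]  # candidate answer; look for a smaller one on the left
            k = 2 * k
        else:
            k = 2 * k + 1
    return best


tree = eytzinger([1, 3, 5, 7, 9, 11, 13])
print(tree)                  # [None, 7, 3, 11, 1, 5, 9, 13]
print(lower_bound(tree, 6))  # 7
```

Every search starts at index 1 and visits indices 2 or 3 next, so the first few levels of the tree share a handful of cache lines that stay hot across queries.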
I think we should have a new runtime option where stack, heap, bss, and tls are all in zones where any load of any legally addressable byte is guaranteed to be at least 64 bytes away from a page boundary or anything else that will cause a disaster if loaded. ...
Contemplating a type system where the bit sizes required are 1, 2, 4, 8, 9, 16, 17, 32, 33, 64, 65, 128, 256, 512. Disgusting stuff, I should be ashamed of myself.
Having to pattern match Optional::None and Optional::Some(Optional::None) separately is exactly like that joke about Sartre ordering a coffee with no cream, only to be told “I’m sorry monsieur, we don’t have any cream, would you like it with no milk instead?”