That thing is fucking impressive. When I was still in academia I kind of intended to build something similar (well, like a quarter of that), albeit at a somewhat different point in the design space (for one: with GC, as GC'd runtimes typically end up having more throughput when run on a CRCW APRAM, at the cost of losing timing determinism).