The XUBC7 supercompressed GPU texture codec (lossless or lossy using residual weight grid DCT) is done and shipping soon, after weeks of testing. I've never been able to develop a new codec so fast (days). Fable 5 and Opus 4.8 rock.
It supports RWDCT with dozens of predictors (like WebP), optional trellis quantization via AC truncation, optional block-level RDO in various ways, and threaded compression and decompression using strips (but not tiles; those are for the CUDA port).
It always uses Zstd as the underlying lossless codec to pack the low-level byte streams, but GPU codecs like GDeflate are possible too.
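To make the strip idea concrete, here is a minimal sketch of strip-level threading over a row-major BC7 block stream. It is an illustration under assumptions, not the XUBC7 implementation: the helper names (`split_into_strips`, `compress_strips`, `decompress_strips`) are hypothetical, and Python's standard `zlib` is swapped in for Zstd only so the example runs without extra dependencies. The point is that each strip is packed independently, so strips can be compressed and decompressed in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_BYTES = 16  # one BC7 block is 128 bits


def split_into_strips(blocks: bytes, blocks_per_row: int, rows_per_strip: int) -> list[bytes]:
    """Cut a row-major BC7 block stream into strips of whole block rows."""
    strip_bytes = blocks_per_row * rows_per_strip * BLOCK_BYTES
    return [blocks[i:i + strip_bytes] for i in range(0, len(blocks), strip_bytes)]


def compress_strips(strips: list[bytes], workers: int = 4) -> list[bytes]:
    """Pack every strip on its own; zlib is a stand-in for the real backend."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so strip order is preserved.
        return list(pool.map(zlib.compress, strips))


def decompress_strips(packed: list[bytes], workers: int = 4) -> bytes:
    """Unpack strips in parallel and concatenate them back in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, packed))
```

Because no strip depends on another strip's state, a decoder could also unpack just one strip. The cost is that the lossless backend cannot exploit redundancy across strip boundaries.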
The R-D curve starts at fully lossless BC7 (all modes/features) at ~5.6 bpp and goes all the way down to ~1.5 bpp. The DCT AC frequency quantization tables were heavily tuned to be usable at Q=1, unlike XUASTC 4x4.
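For readers unfamiliar with DCT quantization tables, here is a rough sketch of the general technique applied to a 4x4 weight grid: a forward 2D DCT, division by per-frequency step sizes, and the inverse path. Everything specific is an assumption made for illustration: the `STEPS` table, the `scale` multiplier (standing in for whatever a quality setting maps to), and all function names are hypothetical and are not the tuned XUBC7 tables or its Q mapping.

```python
import math

N = 4  # a BC7 block carries a 4x4 grid of interpolation weights


def _basis(k: int, n: int) -> float:
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    norm = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return norm * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct2d(grid):
    """Forward 2D DCT-II; result is indexed [v][u] (vertical, horizontal frequency)."""
    return [[sum(grid[y][x] * _basis(v, y) * _basis(u, x)
                 for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct2d(coeffs):
    """Inverse of dct2d; result is indexed [y][x]."""
    return [[sum(coeffs[v][u] * _basis(v, y) * _basis(u, x)
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]


# Hypothetical step sizes: coarser steps for higher frequencies.
STEPS = [[1, 2, 3, 4],
         [2, 3, 4, 6],
         [3, 4, 6, 8],
         [4, 6, 8, 10]]


def quantize(coeffs, scale: float):
    """Divide each coefficient by its scaled step and round to an integer level."""
    return [[round(coeffs[v][u] / (STEPS[v][u] * scale)) for u in range(N)]
            for v in range(N)]


def dequantize(levels, scale: float):
    """Multiply integer levels back by the scaled step sizes."""
    return [[levels[v][u] * STEPS[v][u] * scale for u in range(N)]
            for v in range(N)]
```

A larger `scale` zeroes more AC coefficients, which lowers the bit rate at the cost of weight accuracy. That is why the table values matter most at the low-quality end of the curve.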