Retargeting and Respecializing GPU Workloads for Performance Portability
In order to come close to peak performance, accelerators like GPUs require significant architecture-specific tuning that understand the availability of shared memory, parallelism, tensor cores, etc…
hgpu.org