Here's a walkthrough for dequantizing FP4 values (uint8 indices x E8M0 scales) without using multiplications.
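Step 1): map each index to twice its FP4 value, so we can work with integers instead of floats. Assuming the usual E2M1 layout, the FP4 magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6 become 0, 1, 2, 3, 4, 6, 8, 12.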
Step 2): E8M0 scales are powers of 2, so multiplying by a scale reduces to a bit shift (with the shift amount lowered by one to undo the doubling from step 1).
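
The two steps can be sketched in Python as below. This is a minimal illustration under several assumptions, not a reference implementation: each uint8 holds one FP4 code in its low nibble (bit 3 is the sign, bits 0-2 select the E2M1 magnitude), the E8M0 byte e encodes 2^(e - 127) with 0xFF reserved for NaN, and the result is returned as a fixed-point integer. The names FP4_X2 and dequant_fp4 and the frac_bits parameter are hypothetical, introduced only for this example.

```python
# Step 1: E2M1 magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6, each doubled to an integer.
# (Assumed layout: low 3 bits pick the magnitude, bit 3 is the sign.)
FP4_X2 = (0, 1, 2, 3, 4, 6, 8, 12)
E8M0_BIAS = 127   # assumed: scale byte e encodes 2**(e - 127)
E8M0_NAN = 0xFF   # assumed: reserved NaN encoding


def dequant_fp4(index: int, scale: int, frac_bits: int = 0) -> int:
    """Return (FP4 value * scale) * 2**frac_bits as an integer, using shifts only.

    frac_bits is a hypothetical knob: it keeps that many fractional bits in
    the fixed-point result so small values are not truncated to zero.
    """
    if not 0 <= index <= 0xF:
        raise ValueError("index must be a 4-bit FP4 code")
    if not 0 <= scale < E8M0_NAN:
        raise ValueError("scale must be an E8M0 byte other than NaN (0xFF)")

    magnitude = FP4_X2[index & 0x7]

    # Step 2: scaling by 2**(scale - 127) is a shift; the extra -1 undoes
    # the doubling from step 1.
    shift = scale - E8M0_BIAS - 1 + frac_bits
    if shift >= 0:
        magnitude <<= shift
    else:
        magnitude >>= -shift  # right shift truncates low bits (lossy)

    # Apply the sign last so truncation is always toward zero.
    return -magnitude if index & 0x8 else magnitude


print(dequant_fp4(5, 130))               # 3.0 * 2**3  -> 24
print(dequant_fp4(13, 130))              # -3.0 * 2**3 -> -24
print(dequant_fp4(1, 127))               # 0.5 * 1 truncates -> 0
print(dequant_fp4(1, 127, frac_bits=1))  # 0.5 in units of 1/2 -> 1
```

Note that a plain integer result is exact only when the net shift is non-negative. For small scales the right shift drops bits, which is why the sketch exposes frac_bits; a real kernel would pick an output format that covers the scale range it expects.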