8/🧵
✨Key takeaway: By treating LLM features for parallel scaling as a single tensor unit instead of independent slices, each response can give/take info to/from other responses to improve individual response AND response set quality while maintaining total parallelism.