Perf-packed release: safetensors 0.8.0 is out ⚡️
Main takeaways:
- direct copy into Metal MTLBuffers, plus DLPack for zero-copy hand-off to the target framework (only torch for now)
-> 2-3x perf improvement, and it fixes OOMs when loading models with transformers that are close to the unified memory limit on macOS
- GIL-free serialization, enabling multi-threaded saves from Python
-> 1.2x to 2x faster for single files, but you can expect more improvement when saving multiple files in parallel!
Check the release notes for the full list of improvements!
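
For the parallel-save case, here is a minimal sketch of what "saving multiple files in parallel" could look like from Python, assuming your tensors are already split into shards. `save_shards_parallel`, the `shard-00000.safetensors` naming scheme, and the `save_fn` parameter are hypothetical names made up for this example; by default it calls the existing `safetensors.numpy.save_file`. How much threads actually help depends on the GIL being released during serialization, so measure on your own setup.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def save_shards_parallel(shards, out_dir, save_fn=None, max_workers=4):
    """Save each {name: tensor} dict in `shards` to its own file, one save per thread task."""
    if save_fn is None:
        # Existing safetensors API for NumPy arrays; for torch tensors,
        # pass safetensors.torch.save_file as save_fn instead.
        from safetensors.numpy import save_file as save_fn

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical shard naming scheme, chosen only for this example.
    paths = [out_dir / f"shard-{i:05d}.safetensors" for i in range(len(shards))]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(save_fn, shard, str(path))
            for shard, path in zip(shards, paths)
        ]
        for future in futures:
            future.result()  # re-raises any exception from a worker thread

    return paths
```

Call it with a list of dicts, e.g. `save_shards_parallel([{"w": arr_a}, {"w": arr_b}], "out")`, where `arr_a` and `arr_b` are NumPy arrays.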