⚡️ Icechunk is fast! What does this mean for users? Reduced cost for all data-intensive compute jobs and enhanced productivity for the data scientists who work with data all day long.
Icechunk, @EarthmoverHQ's new transactional cloud-native storage engine for array / tensor data, works together with @zarr_dev, augmenting the Zarr core data model with features that enhance performance, collaboration, and safety in a multi-user cloud-computing context.
Reading data through Icechunk is 36x faster than reading HDF5 files from cloud object storage, 6x faster than regular Zarr alone, and 2.5x faster than regular Zarr with Dask. Most importantly, Icechunk can achieve throughput on par with the compute instance's network bandwidth, the "hardware limit" for I/O-bound workloads.
Want to learn more about this benchmark? Come to our Icechunk informational webinar tomorrow, Tuesday, October 22nd from 12 - 1 PM EST. Registration link:
share.hsforms.com/1SCOFqe2kT…
Image description: Performance analysis of various I/O stacks for reading data from S3. The source dataset was a 1.8 GB NetCDF file (4 GB uncompressed) from the NSF NCAR Curated ECMWF Reanalysis 5 (ERA5). The dataset was transformed to Zarr with zstd compression and written in V2, V3, and V3 Icechunk formats. It was read back using Xarray plus different I/O stacks, with and without Dask.
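
For readers who want to sanity-check a "throughput on par with network bandwidth" claim on their own instance, here is a minimal Python sketch of the arithmetic. It is not the benchmark code: the function names are hypothetical, and the byte count, elapsed time, and link speed below are made-up placeholders, not results from this benchmark.

```python
def throughput_gbit_per_s(n_bytes: int, seconds: float) -> float:
    """Convert bytes moved in `seconds` into gigabits per second."""
    return n_bytes * 8 / seconds / 1e9


def fraction_of_link(n_bytes: int, seconds: float, link_gbit_per_s: float) -> float:
    """Share of the instance's network bandwidth that the read achieved."""
    return throughput_gbit_per_s(n_bytes, seconds) / link_gbit_per_s


# Placeholder inputs (assumptions, not measured values):
bytes_transferred = 2 * 10**9  # bytes actually pulled from object storage
elapsed_s = 2.0                # wall-clock time of the read
link_gbit_per_s = 10.0         # advertised network bandwidth of the instance

rate = throughput_gbit_per_s(bytes_transferred, elapsed_s)
share = fraction_of_link(bytes_transferred, elapsed_s, link_gbit_per_s)
print(f"{rate:.1f} Gbit/s, {share:.0%} of the link")
```

A share close to 100% means the read is limited by the network rather than by the I/O stack. Whether to count compressed bytes on the wire or uncompressed bytes delivered to the application is a choice you need to make and state when comparing numbers.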