Tom Nicholas

Tweets

Pinned Tweet

Tom Nicholas @TEGNicholasCode

19 Dec 2024

At AGU I talked to NASA people about how agencies could better support open-source tools they rely on. I argued that our recent collaboration between Xarray and NASA ESDIS on xarray.DataTree was a good model to copy - read about how it happened here! xarray.dev/blog/datatree

Xarray x NASA: xarray.DataTree for hierarchical data structures

The new xarray.DataTree class allows working with netCDF/Zarr groups, brought to you in collaboration with NASA!

xarray.dev

1,767

Tom Nicholas retweeted

Beto @betolink

31 Jan 2025

Science needs a social network for sharing big data hackmd.io/@TomNicholas/H1Kzo… by @TEGNicholasCode

Science needs a social network for sharing big data - HackMD

The scientific community now relies heavily on GitHub to share code and scientific results

hackmd.io

245

Tom Nicholas retweeted

Pangeo @pangeo_data

28 Jan 2025

We're moving over to BlueSky and LinkedIn for all our future announcements. Follow us at bsky.app/profile/pangeo.io to find out more about tomorrow's showcase 😉 (p.s., it's on Xpublish at Scale at 4 PM EST 🚀) Connect with us on LinkedIn at linkedin.com/company/pangeo-…

1,043

Tom Nicholas retweeted

Xarray @xarray_dev

9 Jan 2025

Our friends over at @zarr_dev made a big release today! Xarray v2025.01.1 was also released today with full support for Zarr-Python 3 🚀

zarr_dev @zarr_dev

9 Jan 2025

🎉 Zarr-Python 3 is here! 🎉
- Full support for Zarr v3 spec
- Chunk-sharding for more efficient data storage
- Major performance boosts with async I/O & parallel compression
💻 pip install --upgrade zarr
Blog post: zarr.dev/blog/zarr-python-3-…

4,002

Tom Nicholas retweeted

Earthmover @EarthmoverHQ

20 Dec 2024

🌤️ #AMS2025 is just around the corner! We are taking AMS by storm with an exhibitor booth (booth 353), two talks from @_jhamman and @rabernat, and a @pangeo_data Community Happy Hour (register here: lu.ma/ddtba5f5)!

1,311

Tom Nicholas @TEGNicholasCode

4 Dec 2024

Completely agree - "in theory" we have the simple scalability of the cloud, but in practice it's often a headache, for no good reason, which prevents adoption by most users (including many scientists)

Matthew Rocklin @mrocklin

4 Dec 2024

New Post: Cloud Computing is Broken matthewrocklin.com/cloud-is-… Investor asks: "What's next for Data/Cloud Infrastructure?" My answer: "Boring stuff. People struggle with basics." Cloud feels like MP3 players before iPod. In theory everything is good. In practice adoption is low

304

Tom Nicholas retweeted

Ian Schuler @ianschuler

18 Nov 2024

Replying to @mouthofmorrison @rabernat @betolink @EarthmoverHQ

That said, it isn't 100% clear that NASA's best move is to immediately convert 10,000 data sets into cutting-edge ARCO formats. Kerchunk and Virtual Zarr offer the benefits of ARCO while keeping data in the native formats.

2,766
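The idea in the reply above, getting chunk-level access without converting the original files, can be sketched with a reference manifest that maps chunk keys to byte ranges inside an untouched file. This is a minimal standard-library illustration only, not the Kerchunk or VirtualiZarr API; the file layout, the key names, and the helpers `build_manifest` and `read_chunk` are all hypothetical.

```python
import os
import tempfile

# Hypothetical "native format" file: three fixed-size chunks packed back to back.
CHUNK_SIZE = 4
payload = b"AAAABBBBCCCC"


def build_manifest(path, n_chunks, chunk_size):
    """Map Zarr-style chunk keys to (path, offset, length) byte ranges."""
    return {
        f"temperature/{i}": (path, i * chunk_size, chunk_size)
        for i in range(n_chunks)
    }


def read_chunk(manifest, key):
    """Fetch one chunk by seeking into the original file; no data is converted."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    native_path = os.path.join(tmp, "native.bin")
    with open(native_path, "wb") as f:
        f.write(payload)

    # Only the small manifest is new; the data file itself is left as written.
    manifest = build_manifest(native_path, n_chunks=3, chunk_size=CHUNK_SIZE)
    second_chunk = read_chunk(manifest, "temperature/1")
```

In a real archive the offsets would come from scanning each file's internal layout rather than from a fixed chunk size, and the reads would be ranged requests against object storage.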

Tom Nicholas @TEGNicholasCode

14 Nov 2024

I'll also be there if you want to join me working on @xarray_dev , DataTree, or VirtualiZarr!

Joe Hamman @_jhamman

14 Nov 2024

Are you heading to #AGU24 next month? Consider joining us for a bonus day of hacking on @pangeo_data. I'll be there representing @EarthmoverHQ and helping folks work with #icechunk and @zarr_dev. Details and signup here: discourse.pangeo.io/t/post-a…

335

Tom Nicholas retweeted

Deepak Cherian @cherian_deepak

12 Nov 2024

Come learn about recent @xarray_dev GroupBy improvements at tomorrow's (Wed, Nov 13) Pangeo Showcase! discourse.pangeo.io/t/pangeo…

1,774

Tom Nicholas retweeted

Joe Hamman @_jhamman

24 Oct 2024

We've talked a lot about #Icechunk's performance this week 🚀. But the Zarr-Python 3 results are also very encouraging! We're a few weeks away from the 3.0 launch but what this chart shows is that the new AsyncIO multi-threading functionality in Zarr is going to be really good.

Tom Nicholas @TEGNicholasCode

24 Oct 2024

Replying to @TEGNicholasCode

ALSO this release is the first to be compatible with the much anticipated v3 implementation of zarr-python! (still on its beta branch right now) This brings (a) big performance benefits when reading @zarr_dev on S3 via async and (b) compatibility with @EarthmoverHQ's Icechunk.

626

Tom Nicholas @TEGNicholasCode

24 Oct 2024

Xarray v2024.10.0 has just been released, including support for xarray.DataTree and zarr-python v3 !!! github.com/pydata/xarray/rel… @xarray_dev @zarr_dev

Release v2024.10.0 · pydata/xarray

This release brings official support for xarray.DataTree, and compatibility with zarr-python v3! Aside from these two huge features, it also improves support for vectorised interpolation and fixes ...

github.com

13,451

Tom Nicholas @TEGNicholasCode

24 Oct 2024

All these integrations represent literally years' worth of effort, all coming out at once 🤯 And that's not even mentioning all the other changes you see in a typical xarray release!

288

Tom Nicholas retweeted

Ryan Abernathey @rabernat

21 Oct 2024

⚡️ Icechunk is fast! What does this mean for users? Reduced cost for all data-intensive compute jobs and enhanced productivity for the data scientists who work with data all day long. Icechunk, @EarthmoverHQ's new transactional cloud-native storage engine for array / tensor data, works together with @zarr_dev, augmenting the Zarr core data model with features that enhance performance, collaboration, and safety in a multi-user cloud-computing context. Reading data through Icechunk is 36x faster than trying to read HDF5 files from cloud object storage, 6x faster than regular Zarr alone, and 2.5x faster than regular Zarr + Dask. Most importantly, Icechunk can achieve throughput on par with the compute instance network bandwidth, the "hardware limit" for I/O bound workloads. Want to learn more about this benchmark? Come to our Icechunk informational webinar tomorrow, Tuesday, October 22nd from 12 - 1 PM EST. Registration link: share.hsforms.com/1SCOFqe2kT…

Performance analysis of various I/O stacks for reading data from S3. The NetCDF dataset was a 1.8 GB NetCDF file (4 GB uncompressed) from the NSF NCAR Curated ECMWF Reanalysis 5 (ERA5). The dataset was transformed to Zarr with zstd compression and written in V2, V3, and V3 Icechunk formats. It was read back using Xarray plus different I/O stacks, with and without Dask.

4,699
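The speedups quoted in the post above also imply how the baselines compare with each other. A rough arithmetic sketch, assuming all three figures come from the same benchmark run and that the last baseline means Zarr read with Dask; the derived ratios are implied by the quoted numbers, not reported in the post.

```python
# Speedups of Icechunk relative to each baseline, as quoted in the post.
icechunk_speedup = {"HDF5": 36.0, "Zarr alone": 6.0, "Zarr + Dask": 2.5}


def relative_read_time(speedups):
    """Read time of each stack in units of Icechunk's read time (Icechunk = 1.0)."""
    times = {"Icechunk": 1.0}
    # "N times faster" is taken to mean the baseline takes N times as long.
    times.update(speedups)
    return times


times = relative_read_time(icechunk_speedup)

# Implied comparisons between the baselines themselves.
zarr_vs_hdf5 = times["HDF5"] / times["Zarr alone"]         # 6.0
dask_vs_zarr = times["Zarr alone"] / times["Zarr + Dask"]  # about 2.4
```

On these assumptions, plain Zarr would already be about 6x faster than HDF5 read from object storage, and adding Dask to Zarr would give roughly another 2.4x.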

Tom Nicholas retweeted

Source Cooperative @source_coop

18 Oct 2024

🎉 @source_coop is now open source! The web application - github.com/source-cooperativ… - and the data proxy - github.com/source-cooperativ… - have been opened up & updated with documentation on how to get them running locally. More documentation and tasks for new developers coming soon!

GitHub - source-cooperative/source.coop: Source Cooperative Web Interface & API

Source Cooperative Web Interface & API. Contribute to source-cooperative/source.coop development by creating an account on GitHub.

github.com

928

Tom Nicholas retweeted

Earthmover @EarthmoverHQ

17 Oct 2024

We’re hosting a webinar on Tuesday, October 22 from 12 - 1 PM EST to discuss what Icechunk means for the scientific data community and answer questions from attendees. Register here: share.hsforms.com/1SCOFqe2kT…

Earthmover @EarthmoverHQ

15 Oct 2024

🚀 We are thrilled to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Read our blog post about Icechunk here: earthmover.io/blog/icechunk

1,508

Tom Nicholas retweeted

Ryan Abernathey @rabernat

17 Oct 2024

Great opportunity to work with @BalwadaDhruv, one of the most innovative physical oceanographers in the world, at @LamontEarth in NYC!

Dhruv Balwada @BalwadaDhruv

16 Oct 2024

We are looking to hire a postdoctoral scholar at Lamont Doherty Earth Observatory to work on submesoscale and mesoscale ocean turbulence using observations and machine learning: academic.careers.columbia.ed…

1,205