pandas core developer

Joined April 2022
Marc Garcia retweeted
The early design decisions for the Categorical type were under strain because of our streaming engine. Every data chunk carried its own mapping between the categories and their underlying physical values, forcing constant re-encoding. The global StringCache we built to solve it caused lock contention and wasn't designed for a distributed architecture.

The new Categories object, released in 1.31, solves this, and gives you:
• Control over the physical type (UInt8/16/32)
• Named categories with namespaces
• Parallel updates without locks
• Automatic garbage collection

When you know the categories up front you can use Enums. They're faster because of their immutability and allow you to define the sorting order of values.

The StringCache is now a no-op, but the code will keep working how it used to (with global Categories). You can also migrate by replacing it with explicit Categories where needed.

The result is a Categoricals data type that works well on the streaming engine without performance degradation, and is compatible with a distributed architecture.

Read the full deep dive: pola.rs/posts/categoricals-r…
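The re-encoding problem described in the post can be illustrated with a small sketch. This is plain Python, not Polars code and not how Polars implements it: the helper names `encode_chunk` and `reencode` are hypothetical, and dictionaries stand in for the category-to-physical-value mappings. It only shows why chunks that each carry their own mapping cannot be combined without a translation pass.

```python
# Conceptual sketch only: plain Python, not Polars internals.
# encode_chunk and reencode are hypothetical helpers for illustration.

def encode_chunk(values):
    """Encode one chunk with its own local mapping (category -> physical code)."""
    mapping = {}
    codes = []
    for v in values:
        # First occurrence of a category gets the next free code.
        codes.append(mapping.setdefault(v, len(mapping)))
    return codes, mapping


def reencode(codes, local_mapping, shared_mapping):
    """Translate a chunk's local codes into codes from a shared mapping."""
    # Invert the local mapping: physical code -> category.
    local_categories = {code: cat for cat, code in local_mapping.items()}
    return [
        shared_mapping.setdefault(local_categories[c], len(shared_mapping))
        for c in codes
    ]


chunk_a = ["red", "green", "red"]
chunk_b = ["green", "blue", "green"]

codes_a, map_a = encode_chunk(chunk_a)
codes_b, map_b = encode_chunk(chunk_b)

# The same physical code means different categories in each chunk:
# 0 is "red" in chunk_a but "green" in chunk_b.
same_code_differs = map_a["red"] == map_b["green"] == 0

# Combining the chunks therefore needs a re-encoding pass into one mapping.
shared = {}
combined = reencode(codes_a, map_a, shared) + reencode(codes_b, map_b, shared)

# Decode to confirm the combined codes still represent the original values.
shared_categories = {code: cat for cat, code in shared.items()}
decoded = [shared_categories[c] for c in combined]
```

When every chunk is encoded against one shared mapping from the start, the `reencode` pass disappears, which is roughly the role the post attributes to an explicit Categories object.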
Marc Garcia retweeted
DataFusion 52 is released datafusion.apache.org/blog/2…
pandas 3 has been released and marks the most significant evolution of #pandas in over ten years. No more `copy()` everywhere, and no more `lambda` gymnastics. Want examples? Read this hands-on article with the main changes: datapythonista.me/blog/whats…
We just reorganized the pandas ecosystem page. Is it clearer? Anything missing or not useful? Feedback welcome pandas.pydata.org/community/…

Marc Garcia retweeted
8 Jun 2025
We're happy to announce the release of #pandas 2.3.0. You can install it with `pip install pandas` or `conda install -c conda-forge pandas`. Thanks to all contributors and sponsors who made this release possible! The release notes can be found at: pandas.pydata.org/docs/whats…

pandas 2.3 has been released, the last version before pandas 3.0.
Marc Garcia retweeted
Tech occupations have been among the hardest hit by steep declines in hiring since 2018, per BI;
Marc Garcia retweeted
Possibly the greatest single male athletic performance of all time

Marc Garcia retweeted
14 Nov 2024
Today we are launching the first open Crash Course training sessions with a limited time discount. These instructor-led sessions are open to everyone looking to get up and running with Polars. Find a date and sign up via our Academy: pola.rs/academy/
Marc Garcia retweeted
All she wants to do is sing:

Marc Garcia retweeted
Duck said give me my pookie back 🥹🥰
Marc Garcia retweeted
I hope these two never separate 😭❤️
Marc Garcia retweeted
This made my day! 😀
Marc Garcia retweeted
The reason why I love cats
Marc Garcia retweeted
30 Oct 2024
Perfect training partner
Marc Garcia retweeted
why are cats so silly
Marc Garcia retweeted
30 Oct 2024
I've written a blog about Tonbo's research on async Rust and io_uring: tonbo.io/blog/async-rust-is-… We need to be careful to avoid the cancellation problem when using async Rust and io_uring together.