OpenXData by @Onehousehq is live tomorrow! Join @changhiskhan's session to learn how LanceDB's unified multimodal lakehouse streamlines the entire model training pipeline, from exploration to GPU-loading, eliminating the fragmented tooling that causes so many enterprise AI training efforts to stall or fail. Don't miss this virtual conference featuring speakers from Anthropic, Anyscale, Uber, and more!
I am thrilled to be taking the stage with Yufei Gu from @Snowflake at #OpenXData 2026! 🚀 We’re diving into "Polaris Meets Apache Hudi": discussing how we’re unifying lakehouse metadata and governance across table formats without compromise.
What does compaction, cleaning, and clustering look like when you operate at Uber scale? At OpenXData, Uber engineers Vamshi Pasunuru and Xinli Shang will share how their team built scalable table services to balance ingestion latency with query performance, and how they decouple background maintenance to keep data fresh and analytics fast. In a recent blog post, Uber noted that its Apache Hudi deployment supports 19,500 datasets, 10 PB of daily ingestion, and 70,000 table service operations per day. This talk should be especially relevant for teams running large lakehouse deployments where table maintenance directly impacts reliability and performance. Catch it at OpenXData on April 29 👉 openxdata.ai/ #OpenXData #ApacheHudi #DataEngineering #Lakehouse #DataPlatform #OpenSource
@kevinjqliu is on the Apache Iceberg™ PMC and leads Iceberg work for Microsoft OneLake. At #OpenXData, he’ll lay out a practical model for making Iceberg easier to get started with. His session centers on a public Iceberg REST Catalog paired with open datasets, which gives teams a simpler way to try Iceberg end to end. The value is not just easier onboarding. It is also a cleaner setup for fair benchmarking, cloud-agnostic access, and sharing data across engines and vendors. Worth catching live if your team is thinking through how to adopt Iceberg: openxdata.ai/ #OpenXData #ApacheIceberg #DataEngineering #Lakehouse #OpenData #Interoperability #OpenSource
@J_ co-created Apache Parquet, Apache Arrow, and OpenLineage. Three projects. Three industry standards. Parquet at Twitter in 2013. Arrow at Dremio. OpenLineage at Datakin, acquired as part of Astronomer's $213M Series C. He is now Principal Engineer at Datadog and an officer of the Apache Software Foundation. That is an unusual track record of picking the right abstraction at the right time. His OpenXData talk argues that the current wave of challengers -- Lance, Vortex, Nimble, FastLanes, BtrBlocks, F3 -- are solving real problems but misreading what made Parquet succeed in the first place. The core contribution was not the encoding choices. It was the community consensus mechanism those choices were built inside. His case: use established open source communities to absorb these innovations rather than fragment the ecosystem across six competing formats. He published the written version of this argument at sympathetic.ink in December 2025. OpenXData is where you can push back live. 👉 Register here: openxdata.ai
21 May 2025
That's a wrap! What an excellent time at #OpenXData today. Thanks to @confluentinc, @databricks, and @dbt_labs for co-sponsoring. And thank you to all the presenters - literally too many to list - for the thoughtful and enlightening topics. @mlopscommunity knows how to put on a show! 🎉 You can catch the replay here for now. event.openxdata.ai/e/78b967c…
At the #OpenXData virtual conference: Google BigQuery seems to implement the same approach of having an "internal" metadata format for managed #ApacheIceberg tables; the Iceberg manifests etc. are exported as read-only snapshots for access from other engines. This is the same approach as Snowflake managed Iceberg tables and @apachehudi @apachextable. Curious how OneLake implements it: is the source of truth for query planning the Iceberg metadata or an internal catalog/metadata? #OpenXData
21 May 2025
🚨 It’s almost time - #OpenXData kicks off in just a few minutes! Doors open at 9:00 AM PT, and the first keynote starts at 9:30 AM. Join the biggest names in data: co-hosts Onehouse, @confluentinc, @databricks, and @dbt_labs, plus speakers from @Google, @netflix, @Meta, @salesforce, @Zoom, @onepeloton, and more. If you're working with open table formats or building modern data platforms, this is the place to be. 🎟 We’ve still got a few free tickets left - grab yours now and tune in live: openxdata.ai/?utm_source=twi…
🚨 Countdown’s on! #OpenXData kicks off tomorrow - don’t miss the education event of the year for open data architecture builders. 📌 RSVP here: openxdata.ai/ 🎙️ Don’t miss Sida Shen’s session at 2:15 PM PT: "Scale Without Silos: Customer-Facing Analytics on Open Data."
19 May 2025
#OpenXData is May 21. Join @Onehouse, @Confluent, @Databricks & dbt Labs for a free virtual event on open data architectures. 🎤 Don’t miss Amy Chen’s keynote: "Not Just Lettuce: How Iceberg dbt bring order to open formats." Register: bit.ly/3H0Dh5c
19 May 2025
📢 Just two days to go! #OpenXData is the premier event on open data architectures for data practitioners this year. As if the speaker lineup wasn’t enough to get you in the (virtual) door, we’re giving away Apple AirPods Max 🎧 to three lucky attendees. Save your spot now! openxdata.ai/?utm_source=twi… #dataEngineering #OpenData #dataarchitecture
Real-time meets open table formats. Join us at #OpenXData to see how Tableflow simplifies streaming data into Apache Iceberg™, Delta Lake, and Apache Hudi™. This means no custom brittle integrations and batch jobs. Just faster data for your lakehouse. Join us: 🗓️ May 21 | 1:35PM PT 📍 Live Virtual Conference 👉 Register now: cnfl.io/4jaG6Op
Data Pipelines in Machine Learning Systems can become complex, and for a good reason 👇
It is critical to ensure Data Quality and Integrity upstream of ML Training and Inference Pipelines; trying to do that in the downstream systems will cause unavoidable failure when working at scale.
There is a ton of work to be done on the Data Lake or LakeHouse layer. See the example architecture below.
Join a free OpenXData virtual conference on May 21st to learn about open data architectures - Iceberg, Hudi, LakeHouses, query engines and more. Talks from Netflix, dbt Labs, Databricks, Microsoft, Google, Meta, Peloton and other open data geeks. Register for free here: openxdata.ai/?utm_source=lin…
Example architecture for a production grade end-to-end data flow:
1: Schema changes are implemented in version control; once approved, they are pushed to the Applications generating the Data, the Databases holding the Data and a central Data Contract Registry.
Applications push generated Data to Kafka Topics:
2: Events emitted directly by the Application Services. 👉 This also includes IoT Fleets and Website Activity Tracking.
2.1: Raw Data Topics for CDC streams.
3: A Flink Application(s) consumes Data from Raw Data streams and validates it against schemas in the Contract Registry.
4: Data that does not meet the contract is pushed to the Dead Letter Topic.
5: Data that meets the contract is pushed to the Validated Data Topic.
6: Data from the Validated Data Topic is pushed to object storage for additional Validation.
7: On a schedule, Data in the Object Storage is validated against additional SLAs in Data Contracts and is pushed to the Data Warehouse to be Transformed and Modeled for Analytical purposes.
8: Modeled and Curated data is pushed to the Feature Store System for further Feature Engineering.
8.1: Real Time Features are ingested into the Feature Store directly from the Validated Data Topic (5). 👉 Ensuring Data Quality here is complicated since checks against SLAs are hard to perform.
9: High Quality Data is used in Machine Learning Training Pipelines.
10: The same Data is used for Feature Serving in Inference.
Note: ML Systems are plagued by other Data related issues like Data and Concept Drifts. These are silent failures, and while they can be monitored, we can’t include them in the Data Contract.
Let me know your thoughts! 👇 #AI #MachineLearning #DataEngineering
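The validate-and-route part of the flow in the post above (steps 3 to 5) can be sketched in plain Python. This is only an illustration under stated assumptions: in-memory lists stand in for the Kafka topics, a dict stands in for the Data Contract Registry, and a contract is reduced to "field name maps to required type". `ContractRouter`, `violations`, `route` and the `orders` topic are hypothetical names, not part of Flink, Kafka or any tool mentioned in the post.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical contract shape: field name -> required Python type.
Contract = dict[str, type]


@dataclass
class ContractRouter:
    """Routes each event to a validated sink or a dead-letter sink."""

    registry: dict[str, Contract]  # stand-in for the Data Contract Registry
    validated: list[dict[str, Any]] = field(default_factory=list)
    dead_letter: list[dict[str, Any]] = field(default_factory=list)

    def violations(self, topic: str, event: dict[str, Any]) -> list[str]:
        """Return every way the event breaks the contract for its topic."""
        contract = self.registry.get(topic)
        if contract is None:
            return [f"no contract registered for topic {topic!r}"]
        problems = []
        for name, expected in contract.items():
            if name not in event:
                problems.append(f"missing field {name!r}")
            elif not isinstance(event[name], expected):
                problems.append(f"field {name!r} is not {expected.__name__}")
        return problems

    def route(self, topic: str, event: dict[str, Any]) -> bool:
        """Step 4: failures go to dead letter. Step 5: the rest is validated."""
        problems = self.violations(topic, event)
        if problems:
            # Keep the reasons next to the event so failures can be triaged.
            self.dead_letter.append(
                {"topic": topic, "event": event, "errors": problems}
            )
            return False
        self.validated.append(event)
        return True


router = ContractRouter({"orders": {"order_id": str, "amount": float}})
router.route("orders", {"order_id": "a1", "amount": 9.5})  # meets the contract
router.route("orders", {"order_id": "a2"})                 # missing "amount"
print(len(router.validated), len(router.dead_letter))
```

Storing the violation reasons alongside the rejected event is a design choice of this sketch, not something the post specifies; it keeps the dead-letter sink useful for debugging producers that drift from the contract.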
12 May 2025
🕒 Ever wanted to spin up a #datalakehouse but couldn't find the time? ⚡ Let Chandra Krishnan, Solutions Engineer at Onehouse, show you how quickly it can be done, from spinning up a fresh data source, building pipelines, adding transformations, integrating catalogs, all the way to generating insights. 🎓 Stick around after the last keynote at #OpenXData next week for a free workshop. 🔗 openxdata.ai/?utm_source=twi… #openData #dataengineering #dataarchitecture
🔥 Announcing OpenXData - the free virtual conference on open data 🔥 OpenXData brings together 25 sessions by data innovators and thought leaders from companies like @Meta, @netflix, @salesforce, @onepeloton, and more, to share best practices and the latest trends in the world of open data. Co-hosted by @confluentinc, @getdbt, @databricks, and Onehouse, the event is virtual and entirely free to attend. 👉 Reserve your spot and see the full agenda → openxdata.ai/?utm_source=twi… #OpenXData
“OpenXData involves the use of a mobile phone, a tool that is within the reach of most Ugandans” Ms. Lighton... http://fb.me/wfh3Cw4T