LanceDB

LanceDB

Users
Tweets

LanceDB

@lancedb

Apr 28

OpenXData by @Onehousehq is live tomorrow! Join @changhiskhan's session to learn how LanceDB's unified multimodal lakehouse streamlines the entire model training pipeline — from exploration to GPU-loading — eliminating the fragmented tooling that causes so many enterprise AI training efforts to stall or fail. Don't miss this virtual conference featuring speakers from Anthropic, Anyscale, Uber, and more!

323

Y Ethan Guo

Y Ethan Guo @ethanyguo

Apr 23

I am thrilled to be taking the stage with Yufei Gu from @Snowflake at #OpenXData 2026! 🚀 We’re diving into "Polaris Meets Apache Hudi": discussing how we’re unifying lakehouse metadata and governance across table formats without compromise.

Onehouse

Onehouse

@Onehousehq

Apr 21

What does compaction, cleaning, and clustering look like when you operate at Uber scale? At OpenXData, Uber engineers Vamshi Pasunuru and Xinli Shang will share how their team built scalable table services to balance ingestion latency with query performance, and how they decouple background maintenance to keep data fresh and analytics fast. In a recent blog post, Uber noted that its Apache Hudi deployment supports 19,500 datasets, 10 PB of daily ingestion, and 70,000 table service operations per day. This talk should be especially relevant for teams running large lakehouse deployments where table maintenance directly impacts reliability and performance. Catch it at OpenXData on April 29 👉 openxdata.ai/ #OpenXData #ApacheHudi #DataEngineering #Lakehouse #DataPlatform #OpenSource

Onehouse

Onehouse

@Onehousehq

Apr 13

@kevinjqliu is on the Apache Iceberg™ PMC and leads Iceberg work for Microsoft OneLake. At #OpenXData, he’ll lay out a practical model for making Iceberg easier to get started with. His session centers on a public Iceberg REST Catalog paired with open datasets, which gives teams a simpler way to try Iceberg end to end. The value is not just easier onboarding. It is also a cleaner setup for fair benchmarking, cloud-agnostic access, and sharing data across engines and vendors. Worth catching live if your team is thinking through how to adopt Iceberg: openxdata.ai/ #OpenXData #ApacheIceberg #DataEngineering #Lakehouse #OpenData #Interoperability #OpenSource

101

Onehouse

Onehouse

@Onehousehq

Apr 3

@J_ co-created Apache Parquet, Apache Arrow, and OpenLineage. Three projects. Three industry standards. Parquet at Twitter in 2013. Arrow at Dremio. OpenLineage at Datakin, acquired as part of Astronomer's $213M Series C. He is now Principal Engineer at Datadog and an officer of the Apache Software Foundation. That is an unusual track record of picking the right abstraction at the right time. His OpenXData talk argues that the current wave of challengers -- Lance, Vortex, Nimble, FastLanes, BtrBlocks, F3 -- are solving real problems but misreading what made Parquet succeed in the first place. The core contribution was not the encoding choices. It was the community consensus mechanism those choices were built inside. His case: use established open source communities to absorb these innovations rather than fragment the ecosystem across six competing formats. He published the written version of this argument at sympathetic.ink in December 2025. OpenXData is where you can push back live. 👉 Register here: openxdata.ai

359

Onehouse

Onehouse

@Onehousehq

21 May 2025

That's a wrap! What an excellent time at #OpenXData today. 👏 Thanks to @confluentinc, @databricks, and @dbt_labs for co-sponsoring. And thank you to all the presenters - literally too many to list - for the thoughtful and enlightening topics. @mlopscommunity knows how to put on a show! 🎉 You can catch the replay here for now. event.openxdata.ai/e/78b967c…

OpenXData Conference 2025

Checkout this event on Airmeet

event.openxdata.ai

115

Vinoth Chandar

Vinoth Chandar

@byte_array

21 May 2025

At #OpenXData virtual conference: Google BigQuery seems from implement the same approach of having an "internal" metadata format for managed #ApacheIceberg tables, the iceberg manifests etc are exported as read-only snapshots for access from other engines. This approach is same as : Snowflake managed Iceberg tables and @apachehudi @apachextable . Curious how OneLake implements it. Is the source of truth for query planning Iceberg metadata or an internal catalog/metadata. #OpenXData

364

Onehouse

Onehouse

@Onehousehq

21 May 2025

🚨 It’s almost time — #OpenXData kicks off in just a few minutes! Doors open at 9:00 AM PT, and the first keynote starts at 9:30 AM. Join the biggest names in data: co-hosts Onehouse, @confluentinc, @databricks, and @dbt_labs, plus speakers from @Google, @netflix, @Meta, @salesforce, @Zoom, @onepeloton, and more. If you're working with open table formats or building modern data platforms, this is the place to be. 🎟 We’ve still got a few free tickets left — grab yours now and tune in live: openxdata.ai/?utm_source=twi…

186

PhoenixAI (formerly CelerData)

PhoenixAI (formerly CelerData) @phoenixdataai

20 May 2025

🚨 Countdown’s on! #OpenXData kicks off tomorrow—don’t miss the education event of the year for open data architecture builders. 📌 RSVP here: openxdata.ai/ 🎙️ Don’t miss Sida Shen’s session at 2:15 PM PT: "Scale Without Silos: Customer-Facing Analytics on Open Data."

137

dbt Labs

dbt Labs @dbt_labs

19 May 2025

#OpenXData is May 21. Join @Onehouse, @Confluent, @Databricks & dbt Labs for a free virtual event on open data architectures. 🎤 Don’t miss Amy Chen’s keynote: "Not Just Lettuce: How Iceberg dbt bring order to open formats." Register: bit.ly/3H0Dh5c

544

Onehouse

Onehouse

@Onehousehq

19 May 2025

📢 Just two days to go! #OpenXData is the premier event on open data architectures for data practitioners this year. As if the speaker lineup wasn’t enough to get you in the (virtual) door, we’re giving away Apple AirPods Max 🎧 to three lucky attendees. Save your spot now! openxdata.ai/?utm_source=twi… #dataEngineering #OpenData #dataarchitecture

125

Confluent

Confluent

@confluentinc

16 May 2025

Real-time meets open table formats. Join us at #OpenXData to see how Tableflow simplifies streaming data into Apache Iceberg™, Delta Lake, and Apache Hudi™ This means no custom brittle integrations and batch jobs. Just faster data for your lakehouse. Join us: 🗓️ May 21 | 1:35PM PT 📍 Live Virtual Conference 👉 Register now: cnfl.io/4jaG6Op

712

Aurimas Griciūnas

Aurimas Griciūnas

@Aurimas_Gr

14 May 2025

𝗗𝗮𝘁𝗮 𝗣𝗶𝗽𝗲𝗹𝗶𝗻𝗲𝘀 𝗶𝗻 𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 can become complex and for a good reason 👇 It is critical to ensure Data Quality and Integrity upstream of ML Training and Inference Pipelines, trying to do that in the downstream systems will cause unavoidable failure when working at scale. There is a ton of work to be done on the Data Lake or LakeHouse layer. 𝗦𝗲𝗲 𝘁𝗵𝗲 𝗲𝘅𝗮𝗺𝗽𝗹𝗲 𝗮𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗯𝗲𝗹𝗼𝘄. Join a free OpenXData virtual conference on May 21st to learn about open data architectures - Iceberg, Hudi, LakeHouses, query engines and more. Talks from Netflix, dbt Labs, Databricks, Microsoft, Google, Meta, Peloton and other open data geeks. Register for free here: openxdata.ai/?utm_source=lin… 𝘌𝘹𝘢𝘮𝘱𝘭𝘦 𝘢𝘳𝘤𝘩𝘪𝘵𝘦𝘤𝘵𝘶𝘳𝘦 𝘧𝘰𝘳 𝘢 𝘱𝘳𝘰𝘥𝘶𝘤𝘵𝘪𝘰𝘯 𝘨𝘳𝘢𝘥𝘦 𝘦𝘯𝘥-𝘵𝘰-𝘦𝘯𝘥 𝘥𝘢𝘵𝘢 𝘧𝘭𝘰𝘸: 𝟭: Schema changes are implemented in version control, once approved - they are pushed to the Applications generating the Data, Databases holding the Data and a central Data Contract Registry. Applications push generated Data to Kafka Topics: 𝟮: Events emitted directly by the Application Services. 👉 This also includes IoT Fleets and Website Activity Tracking. 𝟮.𝟭: Raw Data Topics for CDC streams. 𝟯: A Flink Application(s) consumes Data from Raw Data streams and validates it against schemas in the Contract Registry. 𝟰: Data that does not meet the contract is pushed to Dead Letter Topic. 𝟱: Data that meets the contract is pushed to Validated Data Topic. 𝟲: Data from the Validated Data Topic is pushed to object storage for additional Validation. 𝟳: On a schedule Data in the Object Storage is validated against additional SLAs in Data Contracts and is pushed to the Data Warehouse to be Transformed and Modeled for Analytical purposes. 𝟴: Modeled and Curated data is pushed to the Feature Store System for further Feature Engineering. 𝟴.𝟭: Real Time Features are ingested into the Feature Store directly from Validated Data Topic (5). 👉 Ensuring Data Quality here is complicated since checks against SLAs is hard to perform. 𝟵: High Quality Data is used in Machine Learning Training Pipelines. 𝟭𝟬: The same Data is used for Feature Serving in Inference. Note: ML Systems are plagued by other Data related issues like Data and Concept Drifts. These are silent failures and while they can be monitored, we can’t include it in the Data Contract. Let me know your thoughts! 👇 #AI #MachineLearning #DataEngineering

260

10,462

Onehouse

Onehouse

@Onehousehq

12 May 2025

🕒 Ever wanted to spin up a #datalakehouse but couldn't find the time? ⚡ Let Chandra Krishnan, Solutions Engineer at Onehouse, show you how quickly it can be done—from spinning up a fresh data source, building pipelines, adding transformations, integrating catalogs, all the way to generating insights. 🎓 Stick around after the last keynote at #OpenXData next week for a free workshop. 🔗 openxdata.ai/?utm_source=twi… #openData #dataengineering #dataarchitecture

132

Onehouse

Onehouse

@Onehousehq

1 May 2025

🔥 Announcing OpenXData - the free virtual conference on open data 🔥 OpenXData brings together 25 sessions by data innovators and thought leaders from companies like @Meta, @netflix, @salesforce, @onepeloton, and more, to share best practices and the latest trends in the world of open data. Co-hosted by @confluentinc, @getdbt, @databricks, and Onehouse, the event is virtual and entirely free to attend. 👉 Reserve your spot and see the full agenda → openxdata.ai/?utm_source=twi… #OpenXData

120

残心 - Shahab Sabahi

残心 - Shahab Sabahi @shahabks

7 May 2021

#AI #python #MachineLearning #NeuralNetworks #DataAnalytics #NLP #DataScience #speech #behavioralscience #DataAnalytics #SoftwareEngineer openXDATA an open-source tool for missing datasets labels completion github.com/fweninger/openXDA…

Makerere University

Makerere University

@Makerere

27 Aug 2010

“OpenXData involves the use of a mobile phone, a tool that is within the reach of most Ugandans” Ms. Lighton... http://fb.me/wfh3Cw4T