前田健太郎@AIのためのデータ基盤

@rakudeji

Jun 2

In the end, it all comes down to OpenTableFormat. We have reached an era where nobody can go against that current. #SnowflakeSummit

361

孫鋮(Marko) @sd2438892

May 20

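In the Lakehouse era, "catalog strategy" matters more than "format selection". I feel the future of data platforms is becoming a contest of designing for openness, interoperability, and governance. I strongly agree. #ApacheIceberg #Databricks #DataLakehouse #DataEngineering #OpenTableFormat #DataPlatform #Snowflake #StarRocks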

285

ENGORGIO, INC.

@EngorgioAI

26 Nov 2025

The right open data format for the right job: Apache Iceberg vs Delta Lake note.com/engorgio/n/n2913d74… #データレイクハイス #apacheiceberg #deltalake #opentableformat

The right open data format for the right job | Steven Valentain

Updated December 12, 2025 (added Hudi). Today I looked into the three Open Data Formats that have started to gain adoption (Delta Lake, Apache Iceberg, and Apache Hudi). Given its context, Open Data Format is Open Table Formats, Open...

note.com

29,284

Mim @mim_djo

13 Aug 2025

Writing #ApacheIceberg in Azure is not particularly hard, but you do need a catalog (essentially a database). For simple tests, you can use an in-memory DB #ADLS #opentableformat #PyIceberg.

1,243

Kai Wähner @KaiWaehner

25 Jun 2025

I’m working on a new blog post on how #OpenTableFormat with #ApacheIceberg & #DeltaLake, plus #ShiftLeft architecture, accelerates the move from #Lambda to unified #Kappa. One #RealTime pipeline. Simpler, scalable, powering analytics transactions. 🔗 kai-waehner.de/blog/2021/09/…

228

jun/Junki Ishigaki @tokyo_jjjx

2 Jun 2025

OTF->OpenTableFormat #storagejaws #bdjaws #jawsug

theCUBE

@theCUBE

25 Mar 2025

In this #AWSPiDay exclusive, join @theCUBEresearch’s @RealStrech with @andywarfield to explore the evolution of #Iceberg table format and its impact on @AWScloud’s S3. 💡 Learn more at #theCUBE! thecube.net/events/aws/diggi… #OpenTableFormat

2:25

The evolution of Iceberg tables & the transformation of S3

#theCUBEresearch's Rob Strechay speaks to AWS' Andy Warfield during #AWSPiDay.

1,830

syakesaba @_syakesaba_

16 Mar 2025

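I tried using OpenTableFormat from DuckDB. You can place data as tables on any object storage or file system. I want to check whether it properly keeps up with things like MoR. Is DuckDB better on memory efficiency? (It is slower, though.) syakesaba.com/tech/iceberg-d… #ApacheIceberg #DuckDB #ApacheSpark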

Trying out a data lakehouse using OpenTableFormat

Now that managed OTF services have appeared, I will get hands-on with Apache Iceberg to deepen my understanding.

syakesaba.com

theCUBE

@theCUBE

14 Mar 2025

Airing NOW! 🚨 Tune into #AWSPiDay, where @theCUBEresearch’s @RealStrech is speaking with @andywarfield about the transformation of #Iceberg table format and its benefits for @awscloud’s S3. 📺 Tune in NOW! thecube.net/events/aws/diggi… #OpenTableFormat

2:25

8,974

Vinoth Chandar

@byte_array

6 Mar 2025

Recently, on a podcast, I was asked, “Why Hudi?”. Not a history lesson, but “Why Hudi today?” Most of what I do is telling companies to collect, store, and process more data and make everything better. So, it's only fair that I write down 21 reasons, not just one.

🔗 Read the full blog post here: hudi.apache.org/blog/2025/03…

Here’s the rundown. Here’s why Hudi should be at the core of your data platform:

1️⃣ Well-Balanced Storage Format
2️⃣ Database-like Secondary Indexes
3️⃣ Efficient Merge-on-Read (MoR) Design
4️⃣ Scalable Metadata for Large-Scale Datasets
5️⃣ Built-In Table Services
6️⃣ Data Management Smarts
7️⃣ Concurrency Control Purpose-built For the Lake
8️⃣ Performance at Scale
9️⃣ Out-of-box CDC/Streaming Ingestion
🔟 First-Class Support for Keys
1️⃣1️⃣ Streaming-First Design
1️⃣2️⃣ Efficient Incremental Processing
1️⃣3️⃣ Powerful Apache Spark Implementation
1️⃣4️⃣ Next-Gen Flink Writer for Streaming Pipelines
1️⃣5️⃣ Avoid Compute Lockins
1️⃣6️⃣ Seamless Interop Iceberg/Delta Lake and Catalog Syncs
1️⃣7️⃣ Truly Open and Community-Driven
1️⃣8️⃣ Massive Adoption Across Industries
1️⃣9️⃣ Proven Reliability in High-Pressure Workloads
2️⃣0️⃣ Cloud-Native Lakehouse-Ready
2️⃣1️⃣ Future-Proof and Actively Evolving

Come join our community as we work towards adding 21 more this year.

#ApacheHudi #DataLakehouse #BigData #DataEngineering #StreamingData #CDC #ApacheFlink #ApacheSpark #OpenTableFormat #DataLakes #DataManagement #OpenSource #ApacheXTable #DataInfrastructure #CloudData #RealTimeAnalytics #MachineLearning #DataPlatform 🚀

1,863

Vinoth Chandar

@byte_array

9 Jan 2025

ICYMI, it was a blast answering all of @ananthdurai's direct questions yesterday about #apachehudi, the data lakehouse/open table format ecosystem, and the surrounding drama. Catch the recording: linkedin.com/events/bridging…

Key discussion points:

🏎 Performance is often one of many considerations. We discussed how high performance is a necessity on the data lakehouse, since all we spend money on outside query engines is running jobs that ETL, ingest, or optimize data. Hudi makes them all incremental and efficient while supporting standard batch workloads.

❤️ I brought receipts to showcase how the community thrives as a mainstream OSS project across the industry. We talked about how important and challenging it is to preserve this vibrant community.

🎇 We summarized the Hudi 1.0 features, which push the data lakehouse closer to database functionality across storage format, concurrency control, streaming data support and indexing. These changes bring several “never before” capabilities around the key cornerstone lakehouse feature set. We remain focused on solving complex computer science problems using open-source software.

🤼 I was thrilled to be asked tough questions on the winners and losers of the table format wars. It gave me a rare opportunity to put events in perspective and explain the vendor chess moves that are unrelated to Hudi, its community or even Onehouse. My favorite part was encouraging Delta Lake users to carefully consider what they lose/gain from “standardization” before wasting time on a migration project as a third party (I guess many wouldn’t have hoped to see this day).

❓ I also raised some questions. Why does every data warehouse default to a closed table format? Are users going to stop using them now? Why the obsession with converging the three OSS data lakehouse projects alone? Overall, it's up to the market/users to decide between slow standardization and fast innovation. My view: it's healthy to have both and multiple choices in any ecosystem.

📈 Loved the discussion on the pains around easily getting up and running with a data lakehouse. Some low-hanging fruit around software packaging could help in the near term. We discussed the hierarchy of needs here: table format -> data lakehouse frameworks -> DBMS server/cluster software, which is all closed software ATM. There is a missing open-source software stack, and we are slowly crawling toward a database specialized in data lakehouse architecture/workloads.

#apachehudi #data #database #datalakehouse #datawarehouse #opentableformat #dataengineering #apacheiceberg #deltalake

799

Vinoth Chandar

@byte_array

17 Dec 2024

🎉 We’re proud to announce the @apachehudi 1.0 release! This release has been the result of a massive community effort, with tons of new code (re)written. I want to thank all 60 contributors who worked on ~180K lines of change.

🗒️ Release blog: hudi.apache.org/blog/2024/12…

Hudi is still the OG of the data lakehouse when it comes to real technical innovation, as will become apparent below. 👇

🔥 Secondary Indexing - yes, you read that right. You can speed up queries using indexes, just like a #database. 95% lower latency on 10TB TPC-DS for low-to-moderate selectivity queries. You can create/drop indexes asynchronously.

✨ Logical partitioning via Expression Indexes - #postgres style expression indexes to treat partitions like the coarse-grained indexes they are. It avoids the most common pitfall of users creating tons of small partitions.

🤯 Partial Updates - 2.6x performance and 85% reduction in bytes written, dropping write/query costs on update-heavy workloads. Lays the foundation for multimodal and unstructured data.

⚡ Non-blocking Concurrency Control (NBCC) - enables simultaneous writing from multiple writers and compaction of the same record without blocking any involved processes. This is an industry first!

🎉 Merge Modes - First-class support for both styles of stream data processing: commit_time_ordering, event_time_ordering, and custom record merger APIs.

🦾 LSM timeline - Hudi has a revamped timeline that stores all action history on a table as a scalable LSM tree, allowing users to retain a large amount of table history.

⌛ TrueTime - Hudi strengthens TrueTime semantics. The default implementation assures forward-moving clocks even with distributed processes, assuming a maximum tolerable clock skew similar to OLTP/NoSQL stores.

So, if you love open-source innovation as much as we do, check out the release and join our ~12,000-strong community across Slack & GitHub. We're a grassroots OSS community that has sustained innovation in a fiercely competitive commercial data ecosystem.

#apachehudi #datalakehouse #opentableformat #dataengineering #apachespark #apacheflink #trinodb #awss3 #distributedsystems #analytics #bigdata #datalake

2,064

Apache Hudi

@apachehudi

17 Dec 2024

Hudi 1.0 is the most powerful release to date for data lakehouses. Read the blog for details: Secondary Indexing, Expression Indexes, Partial Updates, Non-blocking Concurrency Control, New LSM timeline, more: hudi.apache.org/blog/2024/12… #datalakehouse #opentableformat

Announcing Apache Hudi 1.0 and the Next Generation of Data Lakehouses | Apache Hudi

Overview

hudi.apache.org

2,711

Vinoth Chandar

@byte_array

12 Jul 2024

#OpenTableFormat is at the peak of inflated expectations, while #datalakehouse is at the trough of disillusionment? And here we thought the former enabled the latter. I can imagine anyone working hands-on with these has their heads 🤯 right now. There is some truth here: ⛰️ Just an open table format buys you NOTHING. If we think it does, then the picture is accurate: we’re kidding ourselves at a peak of inflated expectations. 😩 If market noise around table formats is plunging you into a trough, @apachehudi and @apachextable are here to help by focusing on productive things that help you build your lakehouse. #Gartner #Datamanagement #data #bigdata

941

Masashi Hachida @MasashiHachida

27 Jun 2024

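We are streaming the latest #Teradata news, so please tune in. This month, with demos, we explain the automated hyperparameter tuning feature (#ハイパーパラメータチューニング) for machine learning (#機械学習), how to use #TeradataAIUnlimited, and concrete ways to use #OTF #OpenTableFormat. youtu.be/WSzVDHVdpGE?si=w5ps… via @YouTube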

Monthly Update, June 2024 issue

A monthly roundup of the latest Teradata news. [This month's lineup] - ClearScape Analytics Hyper...

youtube.com

107

Sandesh Soni 🚀 @IamSandeshSoni

12 Oct 2023

I started exploring Open Table Formats. Can they be used with Elixirlang or Erlang? How do we build a lake and make meaningful data decisions? Please suggest blogs. #myelixirstatus #datalake #opentableformat

780