Replying to @elonmusk
Hiring @EcZachly for Doge would probably have led to nearly double the success rate, especially given that one of the main roadblocks was scattered data catalogs and no unified data layer.
**🚀 Data for Breakfast: Day 25 - Data Catalogs, Your Data Directory! 📊📇**

Yo, X fam! It's *Data for Breakfast*, your daily data fire! 🔥 We've roasted data basics, collection, storage, processing, viz, security, ethics, analytics, big data, ML, privacy, governance, integration, quality, lakes, warehouses, pipelines, real-time analytics, democratization, storytelling, monetization, cloud computing, lakes vs. warehouses, and cleansing. Today, we're organizing the chaos with **data catalogs**, the ultimate map to your data treasures! 😎

### What's a Data Catalog?

A data catalog is like a smart library index for all your data assets. It lists, describes, and tags data sources so you can find what you need fast, with no more digging through swamps (Day 15 vibes). Think of it as X's search bar, but for databases, files, and pipelines. 🔍

### How Data Catalogs Work

1. **Inventory**: Scans and lists all data sources automatically.
   - *Ex*: X cataloging post data, user metrics, and ad stats in one spot. 📋
2. **Metadata**: Adds descriptions, tags, and lineage (where the data came from).
   - *Ex*: Tagging X's trending data with "real-time, high-velocity" labels. 🏷️
3. **Search & Discovery**: Lets users query it like Google, finding data by keyword or type.
   - *Ex*: Searching X's catalog for "#DataForBreakfast engagement metrics." 🕵️
4. **Collaboration**: Shares access with teams and tracks who uses what.
   - *Ex*: X analysts finding and sharing hashtag trend datasets. 🤝
5. **Governance Integration**: Ties in rules for quality, privacy, and compliance (Day 12).
   - *Ex*: Flagging sensitive X user data to keep it locked down. 🔒

### Why Catalogs Slap

Catalogs make data discoverable and usable, speeding up analytics, reducing duplicates, and boosting collaboration. Without 'em? Data silos where insights get lost forever. 😬 They're key for big orgs like X that handle massive volumes of posts.

### Wild Stat

Data catalogs can cut data search time by 70%. Imagine X teams finding trend data in minutes, not hours! ⏱️

🔥 **Your Take**: Ever hunted for lost data? In a work file? On X? Spill the frustration below! 👇

Tomorrow's *Data for Breakfast: Day 26* dives into *data lineage*. Stay locked, X squad! 🧠

#DataForBreakfast #DataCatalogs #DataIsGold #DigitalWorld
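To make the five catalog functions concrete, here is a minimal in-memory sketch in Python. It is an illustration under simple assumptions, not how any real catalog product works: `Dataset`, `DataCatalog`, and every dataset name are hypothetical, "search" is plain substring matching, and governance is reduced to a single `sensitive` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One catalog entry: a data asset plus its descriptive metadata (hypothetical model)."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # lineage: direct sources
    sensitive: bool = False  # governance flag, e.g. personal user data
    users: set[str] = field(default_factory=set)  # who has accessed this entry


class DataCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, Dataset] = {}

    def register(self, dataset: Dataset) -> None:
        """Inventory: add a data source (re-registering overwrites the entry)."""
        self._entries[dataset.name] = dataset

    def search(self, keyword: str) -> list[str]:
        """Search & discovery: case-insensitive match on name, description, or tags."""
        kw = keyword.lower()
        return sorted(
            d.name
            for d in self._entries.values()
            if kw in d.name.lower()
            or kw in d.description.lower()
            or any(kw in tag.lower() for tag in d.tags)
        )

    def lineage(self, name: str) -> list[str]:
        """Metadata/lineage: all upstream sources, depth-first (assumes no cycles)."""
        result: list[str] = []
        for parent in self._entries[name].upstream:
            result.append(parent)
            if parent in self._entries:
                result.extend(self.lineage(parent))
        return result

    def access(self, name: str, user: str, cleared: bool = False) -> Dataset:
        """Collaboration + governance: log who uses what, guard sensitive entries."""
        dataset = self._entries[name]
        if dataset.sensitive and not cleared:
            raise PermissionError(f"{name} is flagged as sensitive")
        dataset.users.add(user)
        return dataset


# Hypothetical example entries.
catalog = DataCatalog()
catalog.register(Dataset("raw_posts", "Raw post events", {"real-time", "high-velocity"}))
catalog.register(
    Dataset("hashtag_trends", "Hourly hashtag engagement metrics",
            {"engagement"}, upstream=["raw_posts"])
)
catalog.register(Dataset("user_profiles", "User account details", {"pii"}, sensitive=True))

print(catalog.search("engagement"))       # ['hashtag_trends']
print(catalog.lineage("hashtag_trends"))  # ['raw_posts']
catalog.access("hashtag_trends", "analyst_1")
```

Keeping lineage as a list of direct parents and walking it on demand keeps each entry small; a production catalog would typically also harvest this metadata automatically rather than rely on manual `register` calls.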
🛬 We've upstreamed a good chunk of multi-catalog functionality from Onehouse to Apache XTable (Incubating). If you were wondering what the "X" stands for, it means "everything cross-table". XTable now goes beyond table format translation and also helps across catalogs.

👉 Read the full blog here: dipankar-tnt.medium.com/intr…

Many catalogs are working towards federation, where credential vending and policy enforcement are done across engines at query time. Syncing from the writers to other catalogs has multiple advantages over this model, and the two models can be used together to address some of these issues:

1️⃣ Query-side federation from a single catalog, while it helps streamline governance, still builds too much reliance on that one catalog vendor. E.g., all permissions are defined, stored, and managed in that single catalog.

2️⃣ It's a perpetually in-progress project. Engines and vendors must understand each other's systems and keep up with new features.

3️⃣ Latency? Query planning has to make more external API calls to other catalogs.

4️⃣ Things like the Hive Metastore API or Iceberg REST catalogs can become common protocols, but you need an open, independent way to manage N catalog endpoints if you use N engines.

We are operating on the first principles we established right at the company's start. Refresher: onehouse.ai/blog/onehouse-co…

It's been a fascinating journey, to say the least, jumping across the different hurdles that stand in the way of a truly unfettered data architecture. Looking forward eagerly to the next. 👍

#ApacheXTable #DataEngineering #OpenLakehouse #ApacheHudi #ApacheIceberg #DeltaLake #CloudDataPlatforms #OpenData #MetadataManagement #DataArchitecture #Interoperability #DataLakehouse #DataLake #BigData #Data #OpenFormats #DataCatalogs
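The post contrasts query-time federation with syncing metadata from the writer into every catalog. The Python sketch below illustrates the general shape of the sync model only; it is not XTable's or Onehouse's API. `TableSnapshot`, `CatalogClient`, the in-memory catalogs, and the `s3://bucket/trips` path are all hypothetical stand-ins, and real endpoints such as Hive Metastore or an Iceberg REST catalog have their own interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical minimal metadata a writer publishes after each commit."""
    name: str
    location: str
    snapshot_id: int  # assumed monotonically increasing per table


class CatalogClient(Protocol):
    """Hypothetical interface; real catalog APIs (Hive Metastore, Iceberg REST) differ."""
    name: str

    def upsert_table(self, table: TableSnapshot) -> None: ...


class InMemoryCatalog:
    """Stand-in for one catalog endpoint used by one engine."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: dict[str, TableSnapshot] = {}

    def upsert_table(self, table: TableSnapshot) -> None:
        current = self.tables.get(table.name)
        # Ignore stale snapshots so retried or out-of-order syncs never roll back.
        if current is None or table.snapshot_id > current.snapshot_id:
            self.tables[table.name] = table


class UnreachableCatalog:
    """Simulates an endpoint that is down during the sync."""

    def __init__(self, name: str) -> None:
        self.name = name

    def upsert_table(self, table: TableSnapshot) -> None:
        raise ConnectionError(f"{self.name} unreachable")


def sync_after_commit(table: TableSnapshot, catalogs: list[CatalogClient]) -> list[str]:
    """Writer-side sync: push the snapshot to every catalog, collect failures for retry."""
    failed: list[str] = []
    for catalog in catalogs:
        try:
            catalog.upsert_table(table)
        except ConnectionError:  # one bad endpoint must not block the others
            failed.append(catalog.name)
    return failed


engine_a, engine_b = InMemoryCatalog("catalog_a"), InMemoryCatalog("catalog_b")
targets = [engine_a, engine_b, UnreachableCatalog("catalog_c")]

print(sync_after_commit(TableSnapshot("trips", "s3://bucket/trips", 2), targets))  # ['catalog_c']
sync_after_commit(TableSnapshot("trips", "s3://bucket/trips", 1), targets)  # stale, ignored
print(engine_b.tables["trips"].snapshot_id)  # 2
```

In this model each engine plans queries against its own local catalog, so no extra cross-catalog calls happen at planning time (point 3️⃣). The trade-off is that a catalog that missed a sync may briefly lag until the writer retries it.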
Two days from now! Oct 2. We're live with the second #OpenSourceDataSummit 2024, covering a range of topics from open-source leaders and data practitioners.

🕐 Every company wants to open up #data to compute engines without #lockin. Learn how to unbundle your data platform on the right foundations.

🗒 Suddenly, there is renewed attention on #metadata & #datacatalogs, the final frontier for actually opening up data. Join a well-rounded panel with folks from Acryl Data, #Databricks, Datastrato, Onehouse, and #SnowflakeDB to hash it out.

📈 Apache Hudi has contributed several "firsts" for the #open #datalakehouse, and the community continues to crank. Learn about all the cool stuff happening in Hudi 1.0.

⚖ In all the talk about lower-level open data formats, we could lose track of the things that ultimately matter. Learn about balancing #batchprocessing and #streamprocessing.

🚅 What does it take to optimize a #datalake #infrastructure stack for blazing-fast queries?

Register free 👉 opensourcedatasummit.com/#re…
Ever wondered who benefits from data cataloging? Well, it's more than just the #CDOs, and it can do more than just help users find, identify, and classify datasets. We explain it all here: infa.media/4dnlK2e #datacatalogs
Join our Crash Course on #DataCatalogs on Dec 7 to uncover the key bottlenecks and pitfalls, from both technological and human & business perspectives, that prevent organisations from achieving a successful #data catalog. Sign up: brnw.ch/21wE72j
What is the difference between #datacatalogs and #dataproducts? In this blog, Ryo Komatsuzaki discusses the relationship between data catalogs and data products and how Starburst Gravity enables the management of all data assets connected to Galaxy: okt.to/6aSvVu
It seems we currently have two classes of #datacatalogs. (a) Component data catalogs. These exist only as components of a given data management environment and are oriented to natively handling the technical #metadata for that environment. Examples are #Purview and #UnityCatalog. (b) Standalone data catalogs. These exist as independent software applications and are capable of managing a wide array of different metadata. They require "connectors" to get to technical metadata. Examples are the traditional data catalogs. It's probably a good idea to distinguish between these two classes when thinking about tooling solutions or a metadata architecture. #datagovernance #datamanagement #metadata #analytics
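The key mechanical difference in class (b) is that a standalone catalog only knows what its connectors extract. The Python sketch below shows that pattern under stated assumptions: the `Connector` protocol, `StandaloneCatalog`, and the `app_db` source name are hypothetical, and SQLite (via the standard-library `sqlite3` module and `PRAGMA table_info`) is used only as a convenient, runnable example of a source system.

```python
import sqlite3
from typing import Protocol

Column = tuple[str, str, str]  # (table, column, declared type)


class Connector(Protocol):
    """Hypothetical connector interface: pulls technical metadata from one source."""
    source: str

    def extract(self) -> list[Column]: ...


class SQLiteConnector:
    """Example connector that reads schema metadata from a SQLite database."""

    def __init__(self, source: str, conn: sqlite3.Connection) -> None:
        self.source = source
        self.conn = conn

    def extract(self) -> list[Column]:
        columns: list[Column] = []
        tables = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (table,) in tables:
            quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            for _cid, column, col_type, *_rest in self.conn.execute(
                f"PRAGMA table_info({quoted})"
            ):
                columns.append((table, column, col_type))
        return columns


class StandaloneCatalog:
    """Independent catalog: it only knows what its connectors hand it."""

    def __init__(self) -> None:
        self.connectors: list[Connector] = []
        self.metadata: dict[str, list[Column]] = {}

    def add_connector(self, connector: Connector) -> None:
        self.connectors.append(connector)

    def harvest(self) -> None:
        """Refresh technical metadata from every registered source."""
        for connector in self.connectors:
            self.metadata[connector.source] = connector.extract()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, created_at TEXT)")

catalog = StandaloneCatalog()
catalog.add_connector(SQLiteConnector("app_db", conn))
catalog.harvest()
print(catalog.metadata["app_db"])
# [('posts', 'id', 'INTEGER'), ('posts', 'body', 'TEXT'), ('posts', 'created_at', 'TEXT')]
```

A component catalog would skip the connector layer and read its own environment's schema directly. That is why it handles that environment's technical metadata natively but does not, by itself, reach other systems.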
This Thursday, I have the pleasure of presenting with Robert S. Seiner at this @Dataversity event on How to Govern Glossaries, Dictionaries, and Data Catalogs. I am glad @Alation is sponsoring, and I get to listen in on another enlightening webinar from Mr. Seiner! Please join us and register at the link below. See you on Thursday! dataversity.net/jul-20-rwdg-… #datagovernance #datacatalogs #dataintelligence
How many #datacatalogs are just flea markets of #metadata? They dredge up the good, bad, and ugly in terms of #schemas, #tables, and #columns, and deposit them into bazaars for curious browsers who apparently have time to sift through this detritus in order to find something that might be of interest. Better to have proactively governed and managed #datasets and #dataproducts. #datagovernance #DataManagement
16 Jun 2023
.@Bratsas now telling us how important it is for open data to be #interoperable, and for #DataCatalogs to be #OpenSource (you might think that's obvious, but turns out it is not!)
Excited to announce the formation of FOUR new OGC Standards Working Groups! The launch of these new SWGs will increase interoperability across #AnalysisReadyData, #Agriculture, #DataCubes, and Geo #DataCatalogs bit.ly/3o5ocGt