Ever wondered how an engine actually reads an Iceberg table?
Iceberg read path in one line:
Catalog → Metadata → Manifest list → Manifest files → Data files
Apache Iceberg Read Path (Engine → Table)
When an engine reads an Iceberg table, it walks this chain from top to bottom:
1) Catalog
The starting point.
Stores a pointer to the table's current metadata file, which in turn records the table's current snapshot.
2) Metadata File
Defines the table schema and lists the table's snapshots. Each snapshot entry points to its own manifest list, so the engine picks the snapshot being read and follows that reference.
3) Manifest List
Tracks all manifest files associated with the selected snapshot.
4) Manifest Files
Contain metadata about data files, including partition values and file-level statistics, which help determine which files should be read.
5) Data Files
The actual table data is stored in object storage. This is what the engine ultimately reads.
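To make step 4 concrete, here is a minimal sketch of how partition values and file-level statistics can rule files out before any data is read. Everything in it is an assumption for illustration: `DataFileEntry`, `may_contain`, `plan_files`, the `day` partition, and the `s3://bucket/...` paths are hypothetical names, not Iceberg's API, and real manifests are Avro files rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class DataFileEntry:
    """Hypothetical stand-in for one data-file entry in a manifest."""
    path: str
    partition: dict  # partition column -> value
    lower: dict      # column -> minimum value in the file
    upper: dict      # column -> maximum value in the file


def may_contain(entry, column, value):
    """Return False only when the min/max stats prove no row has column == value."""
    lo = entry.lower.get(column)
    hi = entry.upper.get(column)
    if lo is None or hi is None:
        return True  # no stats for this column, so the file cannot be skipped
    return lo <= value <= hi


def plan_files(entries, partition_filter, column, value):
    """Keep files whose partition matches and whose stats may contain the value."""
    selected = []
    for entry in entries:
        # Partition pruning: skip files from other partitions.
        if any(entry.partition.get(k) != v for k, v in partition_filter.items()):
            continue
        # Stats pruning: skip files whose value range excludes the target.
        if may_contain(entry, column, value):
            selected.append(entry.path)
    return selected


entries = [
    DataFileEntry("s3://bucket/t/data/a.parquet", {"day": "2024-01-01"}, {"id": 1}, {"id": 100}),
    DataFileEntry("s3://bucket/t/data/b.parquet", {"day": "2024-01-01"}, {"id": 101}, {"id": 200}),
    DataFileEntry("s3://bucket/t/data/c.parquet", {"day": "2024-01-02"}, {"id": 1}, {"id": 200}),
]

# Query: day = '2024-01-01' AND id = 150
print(plan_files(entries, {"day": "2024-01-01"}, "id", 150))
```

In this toy example only `b.parquet` survives: `c.parquet` is dropped by the partition check and `a.parquet` by its `id` range.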
Why this matters
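During reads, Iceberg resolves the snapshot through the catalog and metadata layers, then uses manifest metadata to identify the exact set of data files for that snapshot. The engine does not need to list directories in storage to find the table's files, and it can skip files that cannot match the query.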
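The whole chain can be sketched as a toy in-memory model. This is not a real Iceberg client: `CATALOG`, `FILES`, `data_files_for`, the table name `db.events`, and all file paths are hypothetical, and manifest lists and manifests are simplified to plain Python lists. The metadata keys (`current-snapshot-id`, `snapshots`, `snapshot-id`, `manifest-list`) are meant to mirror the field names in Iceberg's table metadata, but treat them as illustrative.

```python
# Step 1: the catalog maps a table name to its current metadata file.
CATALOG = {"db.events": "metadata/v2.metadata.json"}

# Toy "object storage": path -> parsed file content.
FILES = {
    # Step 2: the metadata file lists snapshots and marks the current one.
    "metadata/v2.metadata.json": {
        "current-snapshot-id": 2,
        "snapshots": [
            {"snapshot-id": 1, "manifest-list": "metadata/snap-1.avro"},
            {"snapshot-id": 2, "manifest-list": "metadata/snap-2.avro"},
        ],
    },
    # Step 3: each manifest list names the manifests of one snapshot.
    "metadata/snap-1.avro": ["metadata/m1.avro"],
    "metadata/snap-2.avro": ["metadata/m1.avro", "metadata/m2.avro"],
    # Step 4: each manifest names data files (stats omitted here).
    "metadata/m1.avro": ["data/a.parquet"],
    "metadata/m2.avro": ["data/b.parquet"],
}


def data_files_for(table, snapshot_id=None):
    """Walk catalog -> metadata -> manifest list -> manifests -> data files."""
    metadata = FILES[CATALOG[table]]
    if snapshot_id is None:
        snapshot_id = metadata["current-snapshot-id"]
    snapshot = next(
        s for s in metadata["snapshots"] if s["snapshot-id"] == snapshot_id
    )
    manifests = FILES[snapshot["manifest-list"]]
    # Step 5: the flattened result is the set of files the engine reads.
    return [path for manifest in manifests for path in FILES[manifest]]


print(data_files_for("db.events"))                 # current snapshot
print(data_files_for("db.events", snapshot_id=1))  # an older snapshot
```

Because every snapshot keeps its own manifest list, reading an older snapshot is the same walk with a different `snapshot_id`. In this model, snapshot 2 reuses `m1.avro` from snapshot 1 and adds `m2.avro`.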