[AURON #2175] Add native support for the _file metadata column #2184

Draft
weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/metadata_columns_support_native_iceberg

Which issue does this PR close?

Closes #2175

Rationale for this change

This PR adds native support for Iceberg metadata columns in Auron, starting with _file.

Iceberg metadata columns are useful in real workloads for debugging, lineage, and inspection queries. Previously, Auron treated every metadata column as unsupported, so an Iceberg scan fell back to Spark whenever one was projected. With this change, queries that read _file can remain on the native Iceberg scan path.

This PR improves native Iceberg scan coverage by supporting metadata columns that can be represented as file-level constant values, while still falling back for unsupported row-level metadata columns.

What changes are included in this PR?

This PR:

  • adds native support for the Iceberg _file metadata column
  • keeps unsupported metadata columns such as _pos on the fallback path
  • extends IcebergScanPlan to distinguish between:
    • file-backed data columns
    • metadata columns materialized outside the file payload
  • updates IcebergScanSupport to stop rejecting all metadata columns unconditionally
  • passes supported metadata values through the native Iceberg scan path as per-file constant values
  • updates NativeIcebergTableScanExec to project both normal data columns and supported metadata columns
  • adds integration tests in AuronIcebergIntegrationSuite
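
The planning decision described in the list above can be sketched as follows. This is a conceptual illustration in Python, not the code in this PR: `ScanPlan`, `plan_scan`, and the two column sets are hypothetical names, and the real IcebergScanPlan and IcebergScanSupport may be structured differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the metadata columns the planner knows about (illustrative subset).
METADATA_COLUMNS = {"_file", "_pos"}
# Assumption: only _file can be served natively as a per-file constant.
NATIVE_METADATA_COLUMNS = {"_file"}


@dataclass
class ScanPlan:
    data_columns: List[str]      # file-backed columns read from the file payload
    metadata_columns: List[str]  # columns materialized outside the file payload


def plan_scan(projected: List[str]) -> Optional[ScanPlan]:
    """Return a native scan plan, or None to signal fallback to Spark."""
    data: List[str] = []
    metadata: List[str] = []
    for name in projected:
        if name not in METADATA_COLUMNS:
            data.append(name)
        elif name in NATIVE_METADATA_COLUMNS:
            metadata.append(name)
        else:
            # Unsupported metadata column (for example _pos): do not go native.
            return None
    return ScanPlan(data, metadata)


native = plan_scan(["id", "_file"])
fallback = plan_scan(["id", "_pos"])
```

In this sketch a projection of `id` and `_file` yields a plan with one data column and one metadata column, while any projection containing `_pos` returns `None`, which stands in for the fallback path.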

Scope of support in this PR

This PR intentionally takes a conservative approach.

Supported in native scan:

  • _file

Still falls back:

  • _pos
  • other unsupported metadata columns that require row-level metadata handling

Why this design?

_file is a file-level metadata column: every row coming from the same file shares the same value. That makes it a good fit for the existing native file-scan path by treating it as a per-file constant column.

In contrast, _pos is row-level metadata and cannot be represented correctly with the same mechanism, so it remains unsupported in native execution for now.
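
The distinction can be made concrete with a small sketch. It is written in Python purely for illustration and does not reflect Auron's native implementation; `scan_file`, the sample paths, and the row data are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def scan_file(path: str, rows: Iterable[Row], data_columns: List[str],
              with_file: bool) -> Iterator[Row]:
    """Yield projected rows, optionally appending _file as a per-file constant."""
    for row in rows:
        out: Row = {c: row[c] for c in data_columns}
        if with_file:
            out["_file"] = path  # identical for every row read from this file
        yield out


# Hypothetical table made of two data files.
files = {
    "s3://bucket/t/data/a.parquet": [{"id": 1}, {"id": 2}],
    "s3://bucket/t/data/b.parquet": [{"id": 3}],
}

result = [r for path, rows in files.items()
          for r in scan_file(path, rows, ["id"], with_file=True)]

# _pos is the row's position within its file, so it changes from row to row
# and cannot be supplied as a single value per file.
positions = [pos for rows in files.values() for pos, _ in enumerate(rows)]
```

Here `_file` needs only one value per file, passed in alongside the scan, whereas `positions` takes the values 0 and 1 within the first file alone, which is why a per-file constant cannot represent `_pos`.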

How was this patch tested?

CI, together with the integration tests added to AuronIcebergIntegrationSuite.

…a column

Signed-off-by: weimingdiit <weimingdiit@gmail.com>
weimingdiit changed the title from "[AURON #2175][iceberg] Add native support for the _file metadata column" to "[AURON #2175] Add native support for the _file metadata column" on Apr 8, 2026

Linked issue: Implement native support for Iceberg _file metadata column