@Kuinox Kuinox commented Nov 25, 2025

Rationale for this change

pq.read_schema drops extension types (a UUID column comes back as fixed_size_binary[16]), while ParquetFile.schema_arrow and read_table preserve them. The schema returned by read_schema should match what those two paths report.

What changes are included in this PR?

  • Plumb arrow_extensions_enabled into read_schema and return schema_arrow when enabled so extension types are preserved.
  • Add a regression test ensuring UUID extension types are retained by read_schema when extensions are enabled and fall back to fixed_size_binary[16] when extensions are disabled.

Are these changes tested?

  • Yes: added unit test test_read_schema_uuid_extension_type

Are there any user-facing changes?

  • Behavior improvement: read_schema now preserves extension types (e.g., UUID) when extensions are enabled; no API break

Notes:

  • I don't know whether returning column types as extension<arrow.uuid> instead of fixed_size_binary[16] is considered a breaking change.
  • This patch was AI-generated, but I reviewed it personally; the scope is small and it looks fine to me.

@github-actions

⚠️ GitHub issue #48254 has been automatically assigned in GitHub to PR creator.
