parquet: Add an option to not parse the Page Index on each query #12547

### Is your feature request related to a problem or challenge?

`CREATE TABLE` does not parse the Page Index, and `SELECT` does not cache it, so every query parses it again. On large Parquet datasets this can make queries take a significant amount of time even when they return only a few rows.

For example, a simple `SELECT int_column, other_int_column WHERE int_column=123456` on a table with 184 billion rows (so about 9 million Page Index items, given the default 20k page size) reports the following metrics:

> output_rows=0, elapsed_compute=96ns, num_predicate_creation_errors=0, page_index_rows_filtered=0, predicate_evaluation_errors=0, row_groups_pruned_bloom_filter=21050, row_groups_matched_bloom_filter=0, file_open_errors=0, file_scan_errors=0, bytes_scanned=25023432248, row_groups_matched_statistics=21050, pushdown_rows_filtered=0, row_groups_pruned_statistics=173576, time_elapsed_scanning_total=16.763964ms, page_index_eval_time=3.153918ms, time_elapsed_scanning_until_data=16.745759ms, time_elapsed_processing=61.531313027s, **time_elapsed_opening=96.012649352s**, pushdown_eval_time=382ns

### Describe the solution you'd like

Parse the Page Index once and keep it, either on `CREATE TABLE` or lazily as `SELECT` queries read the files. (Note that with partitioned tables, the first `SELECT` may not read every file.)
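To illustrate the lazy variant, here is a minimal sketch of a per-file cache that parses on first access and reuses the result afterwards. It uses only the standard library; `PageIndexCache`, `ParsedPageIndex`, and `get_or_parse` are hypothetical names made up for this example and are not DataFusion or parquet APIs. The real parsing step is stood in for by a closure.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a file's decoded Page Index.
#[derive(Debug)]
struct ParsedPageIndex {
    /// Number of page entries decoded for the file.
    num_pages: usize,
}

/// Hypothetical cache keyed by file path (not a DataFusion type).
#[derive(Default)]
struct PageIndexCache {
    entries: Mutex<HashMap<String, Arc<ParsedPageIndex>>>,
}

impl PageIndexCache {
    /// Returns the cached index for `path`, calling `parse` only on first access.
    /// The lock is held while parsing, which keeps the sketch simple but would
    /// serialize parsing of different files.
    fn get_or_parse<F>(&self, path: &str, parse: F) -> Arc<ParsedPageIndex>
    where
        F: FnOnce() -> ParsedPageIndex,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(index) = entries.get(path) {
            return Arc::clone(index);
        }
        let index = Arc::new(parse());
        entries.insert(path.to_string(), Arc::clone(&index));
        index
    }
}

fn main() {
    let cache = PageIndexCache::default();
    let mut parse_calls = 0;
    // Three "queries" touch the same file; only the first one should parse.
    for _query in 0..3 {
        let index = cache.get_or_parse("part-0.parquet", || {
            parse_calls += 1;
            ParsedPageIndex { num_pages: 450 }
        });
        println!("pages: {}", index.num_pages);
    }
    println!("parse calls: {parse_calls}");
}
```

Because entries are filled per file as they are first read, files in partitions that the first `SELECT` skips are only parsed when a later query touches them.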

### Describe alternatives you've considered

The approach in the `advanced_parquet_index.rs` example:

https://github.com/apache/datafusion/blob/3b93cc952b889cec2364ad2490ae18ecddb3ca49/datafusion-examples/examples/advanced_parquet_index.rs

However, it requires using the low-level API and is not available through the SQL or Python interfaces.

### Additional context

_No response_
