Avoid reading entire stream to determine schema of arrow file #6368

jonmmease · 2023-05-16T22:59:17Z

Follow on to #6337.

Currently, when reading an Arrow file from a stream, the entire stream is parsed as a file in order to determine the schema.

This results in the stream being parsed twice: once to determine the schema, and again later to actually build RecordBatches from it.

Can we be more efficient here by only looking as far into the stream as necessary to read the schema?
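
For illustration, here is a rough sketch of what "only looking as far as necessary" could mean, using nothing but `std`. It is not DataFusion's code, and the helper name `read_schema_message_bytes` is hypothetical. It assumes the Arrow IPC file layout: the 6-byte `ARROW1` magic padded to 8 bytes, followed by the schema as the first encapsulated message (an optional `0xFFFFFFFF` continuation marker, a little-endian `i32` metadata length, then the flatbuffer metadata). Decoding the flatbuffer into a schema is left out.

```rust
use std::io::{self, Read};

/// Magic bytes that open an Arrow IPC file.
const ARROW_MAGIC: [u8; 6] = *b"ARROW1";
/// Marker that precedes an encapsulated message in the current IPC format.
const CONTINUATION_MARKER: [u8; 4] = [0xff; 4];

/// Hypothetical helper: consumes only the file preamble and the first
/// (schema) message's metadata, leaving the rest of the reader untouched.
/// Returns the raw flatbuffer bytes of that message.
pub fn read_schema_message_bytes<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    // 6 magic bytes plus 2 padding bytes.
    let mut preamble = [0u8; 8];
    reader.read_exact(&mut preamble)?;
    if !preamble.starts_with(&ARROW_MAGIC) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing ARROW1 magic",
        ));
    }

    // Next 4 bytes are either the continuation marker or, in files from
    // older writers (an assumption here), the metadata length itself.
    let mut word = [0u8; 4];
    reader.read_exact(&mut word)?;
    if word == CONTINUATION_MARKER {
        reader.read_exact(&mut word)?;
    }

    let len = i32::from_le_bytes(word);
    if len <= 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "invalid schema message length",
        ));
    }

    // Read exactly the schema message metadata and stop.
    let mut metadata = vec![0u8; len as usize];
    reader.read_exact(&mut metadata)?;
    Ok(metadata)
}
```

With a layout like this, only the first 16 bytes plus the metadata length need to be pulled from the stream before the schema can be decoded, regardless of how many record batches follow.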

jonmmease added the enhancement (New feature or request) label May 16, 2023

jonmmease mentioned this issue May 16, 2023

Add support for reading Arrow files #6337

Merged

alamb added the performance (Make DataFusion faster) label May 17, 2023

Jefffrey mentioned this issue Oct 28, 2023

Read only enough bytes to infer Arrow IPC file schema via stream #7962

Merged

alamb closed this as completed in #7962 Nov 2, 2023
