Wasteful to pull all primary key columns in query of `append` mode #1302

Rachelint · 2023-11-08T07:21:38Z

Describe This Problem

We currently pull the projected columns plus the primary key (pk) columns when querying in both append and overwrite mode.
However, pulling the pk columns is unnecessary in append mode.

Proposal

Pull only the projected columns when querying in append mode.

Reading a record batch will be divided into the following three steps:

1. Pull the ArrowRecordBatch.
2. Convert it to a FetchingRecordBatch. Work such as filling a column that does not exist with a null/default value is done here. FetchingRecordBatch is used in ChainIterator/MergeIterator, and it may include not only the projected columns but also the primary key columns, for deduplication or other purposes.
3. Prune it to a RecordBatch. As noted above, FetchingRecordBatch can include more than the projected columns, so the non-projected columns are pruned here.
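
The three steps can be sketched roughly as below. This is only an illustration under simplifying assumptions, not the actual implementation: `Columns`, `UpdateMode`, `columns_to_fetch`, `to_fetching_batch` and `prune` are hypothetical names, a plain `HashMap` stands in for an Arrow record batch, and missing columns are filled with nulls only (no default values).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a columnar batch: column name -> values (None = null).
/// The real code operates on Arrow record batches instead.
type Columns = HashMap<String, Vec<Option<i64>>>;

/// Hypothetical stand-in for the table's update mode.
#[derive(Clone, Copy, PartialEq)]
enum UpdateMode {
    Append,
    Overwrite,
}

/// Decide which columns to pull: only the projected ones in append mode,
/// projected + primary key columns in overwrite mode (needed for dedup).
fn columns_to_fetch(projected: &[&str], primary_keys: &[&str], mode: UpdateMode) -> Vec<String> {
    let mut cols: Vec<String> = projected.iter().map(|c| c.to_string()).collect();
    if mode == UpdateMode::Overwrite {
        for pk in primary_keys {
            if !cols.iter().any(|c| c == pk) {
                cols.push(pk.to_string());
            }
        }
    }
    cols
}

/// Step 2: build the "fetching" batch from what was pulled in step 1,
/// filling any column that does not exist in the source with nulls.
fn to_fetching_batch(pulled: &Columns, fetch_cols: &[String], num_rows: usize) -> Columns {
    fetch_cols
        .iter()
        .map(|name| {
            let values = pulled
                .get(name)
                .cloned()
                .unwrap_or_else(|| vec![None; num_rows]);
            (name.clone(), values)
        })
        .collect()
}

/// Step 3: prune the fetching batch down to the projected columns only.
fn prune(fetching: &Columns, projected: &[&str]) -> Columns {
    projected
        .iter()
        .filter_map(|name| fetching.get(*name).map(|v| (name.to_string(), v.clone())))
        .collect()
}
```

In this sketch, an append-mode query that projects only `value` never asks storage for the primary key columns, while an overwrite-mode query still fetches them and drops them in the final prune step.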

Main changes:

Don't pass the ProjectedSchema, which carries too much information, to the places that build the inner RecordBatchStream in ChainIterator/MergeIterator. Instead, refactor RowProjector to include only the needed information and pass it there (mainly via ScanRequest and SstReadOptions).
Refactor RecordBatchWithKey into FetchingRecordBatch. The main difference is that FetchingRecordBatch may or may not include primary_keys_indexes. Ideally FetchingRecordBatch should not include primary_keys_indexes at all, but removing it completely is hard, so that can be deferred to later PRs.
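
As a rough illustration of what "just the needed information" could look like, here is a minimal sketch. The struct `RowProjectorSketch`, its fields, and `build_projector` are hypothetical and do not reflect the real RowProjector API; the sketch only assumes that a reader needs the source index of each fetched column and, optionally, where the primary key columns sit among the fetched columns.

```rust
/// Hypothetical slimmed-down projector; not the real RowProjector.
#[derive(Debug, PartialEq)]
struct RowProjectorSketch {
    /// For each fetched column, its index in the source schema,
    /// or None if the column does not exist there (to be filled later).
    source_indexes: Vec<Option<usize>>,
    /// Positions of the primary key columns within the fetched columns.
    /// None when the keys are not fetched at all (e.g. append mode).
    primary_key_indexes: Option<Vec<usize>>,
}

/// Build the projector from column names only, with no full schema attached.
fn build_projector(
    source_schema: &[&str],
    fetched: &[&str],
    primary_keys: Option<&[&str]>,
) -> RowProjectorSketch {
    let source_indexes = fetched
        .iter()
        .map(|c| source_schema.iter().position(|s| s == c))
        .collect();
    let primary_key_indexes = primary_keys.map(|pks| {
        pks.iter()
            .filter_map(|pk| fetched.iter().position(|c| c == pk))
            .collect()
    });
    RowProjectorSketch {
        source_indexes,
        primary_key_indexes,
    }
}
```

Making `primary_key_indexes` optional mirrors the point above that the fetching batch may or may not carry primary key indexes.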

Additional Context

No response

Rachelint added the feature New feature or request label Nov 8, 2023

Rachelint changed the title ~~Useless to pull all primary key columns in append mode~~ Useless to pull all primary key columns in query of append mode Nov 8, 2023

Rachelint changed the title ~~Useless to pull all primary key columns in query of append mode~~ Wasteful to pull all primary key columns in query of append mode Nov 16, 2023

Rachelint mentioned this issue Nov 17, 2023

feat: avoid pulling unnecessary columns when querying append mode table #1307

Merged

jiacai2050 closed this as completed in 4abc764 Jan 11, 2024
