Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate JSON Reader options and DecoderOptions #1539

Merged
merged 1 commit into from
Apr 13, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Apr 11, 2022

Which issue does this PR close?

Draft as it builds on #1451 from @sum12 .

Closes #1538

Rationale for this change

@sum12 started refactoring options into a common struct in #1451 and I wanted to finish that work to minimize API churn when we release the next arrow

What changes are included in this PR?

  1. Move batch_size to DecoderOptions as well.
  2. Consolidate field handling in ReaderBuilder

Are there any user-facing changes?

Yes -- if they use the low level Json decoder, the options are now encoded in a struct rather than as parameters

@alamb alamb added the api-change Changes to the arrow API label Apr 11, 2022
@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 11, 2022
@alamb alamb force-pushed the alamb/batch_size_too branch from 6e5ea8f to 23c5e4d Compare April 11, 2022 21:47
Self {
schema: None,
max_records: None,
batch_size: 1024,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this default is moved into DecoderOptions

@alamb alamb force-pushed the alamb/batch_size_too branch from 23c5e4d to ac9c647 Compare April 12, 2022 13:40
@alamb alamb requested review from houqp and nevi-me April 12, 2022 13:40
@alamb
Copy link
Contributor Author

alamb commented Apr 12, 2022

I would like to merge this prior to creating the arrow 12 release candidate later this week (so that we minimize the API churn) -- @nevi-me and @houqp do you perhaps have a moment to review this?

cc @sum12

@alamb alamb marked this pull request as ready for review April 12, 2022 13:41
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to work out why there is a separation between the Decoder and a Reader and therefore we end up with nested builders, but I think this makes sense. We want to allow people to construct a Decoder with arbitrary streams of serde_json::Value and convert them to RecordBatch, whereas Reader only supports files 👍

@alamb alamb merged commit 228ee36 into apache:master Apr 13, 2022
@alamb alamb deleted the alamb/batch_size_too branch April 13, 2022 12:51
@alamb
Copy link
Contributor Author

alamb commented Apr 13, 2022

It took me a while to work out why there is a separation between the Decoder and a Reader and therefore we end up with nested builders, but I think this makes sense. We want to allow people to construct a Decoder with arbitrary streams of serde_json::Value and convert them to RecordBatch, whereas Reader only supports files 👍

I tried to encode this insight into the docs here: #1559

@alamb alamb changed the title Consolidate JSON reader options Consolidate JSON Reader options and DecoderOptions Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate JSON reader options
2 participants