Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RecordBatch::concat #537

Merged
merged 2 commits into from
Jul 14, 2021

Conversation

silathdiir
Copy link
Contributor

Which issue does this PR close?

Closes #461 .

Rationale for this change

As described in the issue, tries to implement RecordBatch::concat according to https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/coalesce_batches.rs#L232 .

What changes are included in this PR?

Adds a new function concat to struct RecordBatch, and test cases.

Are there any user-facing changes?

With this fix, a new RecordBatch could be created by concatenating multiple RecordBatches.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 11, 2021
@codecov-commenter
Copy link

codecov-commenter commented Jul 11, 2021

Codecov Report

Attention: Patch coverage is 93.10345% with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.62%. Comparing base (f1fb2b1) to head (3a14809).
Report is 2894 commits behind head on master.

Files with missing lines Patch % Lines
arrow/src/record_batch.rs 93.10% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #537      +/-   ##
==========================================
+ Coverage   82.60%   82.62%   +0.01%     
==========================================
  Files         167      167              
  Lines       45984    46042      +58     
==========================================
+ Hits        37984    38041      +57     
- Misses       8000     8001       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@silathdiir silathdiir force-pushed the implement-record-batch-concat branch from eec5c90 to 716e23c Compare July 11, 2021 05:07
@silathdiir silathdiir force-pushed the implement-record-batch-concat branch from 716e23c to f97b9ec Compare July 11, 2021 05:18
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @silathdiir !

I think the error handling should be more strict when the schemas don't match, but otherwise this looks great

It looks like @jorgecarleitao #461 (comment) has a suggestion to move this function to arrow::compute::concat::concat_batches rather than in record_batch.rs. What do you think?

@alamb
Copy link
Contributor

alamb commented Jul 11, 2021

Although @nevi-me seems to like RecordBatch::concat -- #461 (comment)

@silathdiir
Copy link
Contributor Author

silathdiir commented Jul 12, 2021

Hi @alamb, refer to @jorgecarleitao latest comment #461 (comment), it seems that arrow::compute::concat::concat_batches may be better (even seems not good enough for thread model).
I will follow the discussion and try to update this PR.

@silathdiir silathdiir force-pushed the implement-record-batch-concat branch from 9bf379b to 3a14809 Compare July 12, 2021 09:07
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR looks good to merge from my perspective, but I am not sure if we should move the concat function to arrow::compute . @jorgecarleitao nd @nevi-me what do you think we should do here?

@alamb
Copy link
Contributor

alamb commented Jul 12, 2021

thanks @silathdiir !

@alamb
Copy link
Contributor

alamb commented Jul 13, 2021

Unless I hear different, I plan to merge this PR tomorrow (and include it in 5.0.0)

@alamb alamb merged commit b05edf4 into apache:master Jul 14, 2021
@alamb
Copy link
Contributor

alamb commented Jul 14, 2021

Thanks again @silathdiir !

@silathdiir silathdiir deleted the implement-record-batch-concat branch July 14, 2021 11:56
@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement RecordBatch::concat
3 participants