Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-1781: col encoding mapping & stats accumulator #2159

Merged
merged 2 commits into from
Oct 11, 2021

Conversation

dantengsky
Copy link
Member

@dantengsky dantengsky commented Oct 9, 2021

I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

Summary

fixed issues:

NOTE

  • transient modification of data_blcok.rs has been reverted
  • "no side effects"
    crates / components other than dal, query/datasource/table/fuse are not touched, except that lz4 in enabled for parquet2

Changelog

  • Improvement
  • Not for changelog (changelog entry is not required)

Related Issues

Fixes #1781
Fixes #2193
Fixes #2194

Test Plan

Unit Tests

Stateless Tests

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@codecov-commenter
Copy link

codecov-commenter commented Oct 9, 2021

Codecov Report

Merging #2159 (93e72b4) into master (fbc7ae7) will increase coverage by 0%.
The diff coverage is 51%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #2159    +/-   ##
=======================================
  Coverage      67%     68%            
=======================================
  Files         637     643     +6     
  Lines       35849   35967   +118     
=======================================
+ Hits        24361   24599   +238     
+ Misses      11488   11368   -120     
Impacted Files Coverage Δ
common/dal/src/data_accessor.rs 0% <ø> (ø)
common/dal/src/impls/aws_s3/s3.rs 0% <ø> (ø)
...uery/src/datasources/table/fuse/io/block_reader.rs 0% <0%> (ø)
query/src/datasources/table/fuse/io/mod.rs 0% <ø> (ø)
query/src/datasources/table/fuse/table.rs 0% <0%> (ø)
...uery/src/datasources/table/fuse/table_do_append.rs 0% <0%> (ø)
query/src/datasources/table/fuse/table_do_read.rs 0% <0%> (ø)
...y/src/datasources/table/fuse/table_do_read_plan.rs 0% <0%> (ø)
...ry/src/datasources/table/fuse/table_do_truncate.rs 0% <0%> (ø)
...y/src/datasources/table/fuse/util/index_helpers.rs 0% <ø> (ø)
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fbc7ae7...93e72b4. Read the comment docs.

@@ -153,6 +153,20 @@ impl TryFrom<DataBlock> for RecordBatch {
}
}

impl TryFrom<&DataBlock> for RecordBatch {
Copy link
Member

@BohuTANG BohuTANG Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For datablock, we convert it to RecordBatch, 'TryInto' is better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worrying about the cost of conversion here?

seems the cost is the same whether converted from DataBlock or a reference of DataBlock. but extra pointers may still reference the underlying data though.

currently, some code needs to access DataBlock again after the conversion, which is being refactored.

later, this modification will be reverted.

impl TryFrom<&DataBlock> for RecordBatch {
type Error = ErrorCode;

fn try_from(v: &DataBlock) -> Result<RecordBatch> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can use TryFrom::try_from(&v) for impl TryFrom<DataBlock> for RecordBatch to reduce to code size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea.

I am changing the caller side code so that Datablock will no longer need to be accessed after RecordBatch::try_from.

this is just a tmp modification of data_block.rs, to testify some statistic component, will not be submit in the final PR

@dantengsky dantengsky force-pushed the fix-1781 branch 2 times, most recently from 63a7b9a to bef13d4 Compare October 11, 2021 04:57
@dantengsky dantengsky changed the title ISSUE-1781: col encoding ISSUE-1781: col encoding mapping & stats accumulator Oct 11, 2021
@dantengsky dantengsky marked this pull request as ready for review October 11, 2021 08:13
@dantengsky dantengsky requested a review from BohuTANG October 11, 2021 08:37
@BohuTANG
Copy link
Member

/lgtm

Great~

@databend-bot
Copy link
Member

Approved! Thank you for the PR @dantengsky

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot databend-bot merged commit 27be03e into databendlabs:master Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants