Sort file names in a directory #2730 #2735

Merged
merged 3 commits into from
Jun 16, 2022

Conversation

yourenawo
Contributor

@yourenawo yourenawo commented Jun 15, 2022

Sort file names in a directory.
Fixes #2730.

@yourenawo yourenawo changed the title fix #2730 Sort file names in a directory #2730 Jun 15, 2022
Contributor Author

@yourenawo yourenawo left a comment

fix: #2730

Member

@andygrove andygrove left a comment

Partitions are processed in parallel on multiple threads so DataFusion cannot provide any guarantee of ordering of results unless the query contains an ORDER BY clause.

However, sorting files by filename seems reasonable to me from a UX point of view so I am fine with this change.
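To illustrate the idea under discussion, here is a minimal sketch of listing a local directory and sorting the result by path. It is not the code from this PR: the helper name `list_files_sorted` is hypothetical, and the sketch uses only the standard library rather than DataFusion's object store types. `std::fs::read_dir` returns entries in a platform- and filesystem-dependent order, so an explicit sort is what makes the listing deterministic.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of DataFusion): list the regular files
/// directly under `dir`, sorted by path.
fn list_files_sorted(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Skip subdirectories; this sketch does not recurse.
        if path.is_file() {
            files.push(path);
        }
    }
    // `read_dir` makes no ordering guarantee, so sort explicitly.
    files.sort();
    Ok(files)
}
```

A sorted listing only fixes the order in which files are discovered. As noted above, the order of query results is still not guaranteed without an ORDER BY clause.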

@codecov-commenter

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #2735 (13991fc) into master (ef9df29) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2735      +/-   ##
==========================================
+ Coverage   84.89%   84.92%   +0.03%     
==========================================
  Files         270      270              
  Lines       47817    47915      +98     
==========================================
+ Hits        40593    40693     +100     
+ Misses       7224     7222       -2     
| Impacted Files | Coverage Δ |
|---|---|
| datafusion/data-access/src/object_store/local.rs | 89.03% <100.00%> (+1.91%) ⬆️ |
| datafusion/core/src/physical_plan/metrics/value.rs | 86.93% <0.00%> (-0.51%) ⬇️ |
| datafusion/expr/src/logical_plan/plan.rs | 73.91% <0.00%> (ø) |
| datafusion/common/src/scalar.rs | 74.94% <0.00%> (+0.11%) ⬆️ |
| datafusion/optimizer/src/filter_push_down.rs | 98.32% <0.00%> (+0.16%) ⬆️ |
| datafusion/expr/src/expr_fn.rs | 89.41% <0.00%> (+1.17%) ⬆️ |
| datafusion/optimizer/src/utils.rs | 34.21% <0.00%> (+1.41%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ef9df29...13991fc.

Contributor

@alamb alamb left a comment

A test might be good to ensure this behavior doesn't revert in the future.

@yourenawo there appears to be a CI failure (https://github.com/apache/arrow-datafusion/runs/6902662999?check_suite_focus=true) because cargo fmt was not run.
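A regression test of the kind suggested here could look roughly like the sketch below. It is an assumption-laden illustration, not the test added in this PR: the `is_sorted` helper, the test name, and the scratch directory name are all hypothetical, and the collect-then-sort listing inside the test is a stand-in for whatever listing code is actually under test.

```rust
use std::path::PathBuf;

/// Hypothetical helper: true when every path is <= the path that follows it.
fn is_sorted(paths: &[PathBuf]) -> bool {
    paths.windows(2).all(|pair| pair[0] <= pair[1])
}

#[cfg(test)]
mod tests {
    use super::is_sorted;
    use std::fs;
    use std::path::PathBuf;

    #[test]
    fn listing_is_sorted_by_file_name() -> std::io::Result<()> {
        // Scratch directory under the system temp dir (name is arbitrary).
        let dir = std::env::temp_dir()
            .join(format!("sorted_listing_test_{}", std::process::id()));
        let _ = fs::remove_dir_all(&dir);
        fs::create_dir_all(&dir)?;

        // Create the files in non-alphabetical order on purpose.
        for name in ["c.csv", "a.csv", "b.csv"] {
            fs::write(dir.join(name), b"")?;
        }

        // Stand-in for the listing code under test: collect, then sort.
        let mut files: Vec<PathBuf> = fs::read_dir(&dir)?
            .map(|entry| entry.map(|e| e.path()))
            .collect::<Result<_, _>>()?;
        files.sort();

        assert_eq!(files.len(), 3);
        assert!(is_sorted(&files));

        fs::remove_dir_all(&dir)?;
        Ok(())
    }
}
```

Creating the files out of order matters: on some filesystems a directory listing happens to come back alphabetically, which would let an unsorted implementation pass by accident.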

@yourenawo yourenawo requested a review from alamb June 16, 2022 01:16
Added a test case for sorting directories.
Contributor

@alamb alamb left a comment

Thanks @yourenawo! Looks great.

@alamb alamb merged commit 0e416f0 into apache:master Jun 16, 2022
waynexia pushed a commit to waynexia/arrow-datafusion that referenced this pull request Jun 20, 2022
* Update local.rs

* Update local.rs

* Update local.rs

Added a test case for sorting directories.