-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partitioning fixes #9207
Partitioning fixes #9207
Conversation
@@ -234,7 +239,7 @@ impl ListingTableUrl { | |||
} | |||
|
|||
/// List all files identified by this [`ListingTableUrl`] for the provided `file_extension` | |||
pub(crate) async fn list_all_files<'a>( | |||
pub async fn list_all_files<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it related to the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to separate this out if preferred. I made this change because it allows easier debugging for future users who are unsure which files the ListingTableUrl
will actually list. This means one can see whether the issue is with the url itself, or another issue
Sorry I didn't make it to this PR today -- I plan to review it tomorrow if no one beats me to it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution @esheppa -- this change makes sense to me. I think it is well documented and well tested. Nice work.
I had one small comment suggestion but I don't think it is required and we can merge this PR without it.
Thanks again
}; | ||
|
||
// remove any segments that contain `=` as they are allowed even | ||
// when ignore subdirectories is `true` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// when ignore subdirectories is `true` | |
// when ignore subdirectories is `true`. | |
// For example a directory like `"year=2024"` is a hive | |
// partition containing data for the year 2024. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, easy fix
} | ||
} | ||
|
||
/// Returns `true` if `path` refers to a collection of objects | ||
pub fn is_collection(&self) -> bool { | ||
self.url.as_str().ends_with('/') | ||
self.url.as_str().ends_with(DELIMITER) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated this to use the DELIMITER
as elsewhere
.next() | ||
.map_or(false, |file_name| glob.matches(file_name)) | ||
} else { | ||
let stripped = segments.join(DELIMITER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated this to use the DELIMITER
as elsewhere
let has_subdirectory = segments.count() > 1; | ||
!has_subdirectory | ||
} else { | ||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to change this to something like:
None => !ignore_subdirectory || segments.count() <= 1,
however perhaps that is making things less clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you have added the unit test coverage I think it would be fine to change. You could also do as a follow on PR if you prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent, I've added one more commit
Thanks again @esheppa |
Which issue does this PR close?
Closes #9206.
Rationale for this change
What changes are included in this PR?
ListingTableUrl::contains
method to skip hive partitions where they appear as a segment in the urllist_all_files
method public to allow users to more easily see what files are being included with theListingTableUrl
Are these changes tested?
Yes a new test is added to ensure that the changes continue to behave as expected
Are there any user-facing changes?
ListingTableUrl
ListingTableUrl
which previously were not included. This could cause query results to change