[SPARK-17613] S3A base paths with no '/' at the end return empty DataFrames #15169
What changes were proposed in this pull request?
Consider a bucket `s3a://some-bucket` with files directly under its root, such as `s3a://some-bucket/file1.parquet`.

Getting the parent path of `s3a://some-bucket/file1.parquet` yields `s3a://some-bucket/`, and the `ListingFileCatalog` uses this as the key in the hash map. When `catalog.allFiles` is called, we look up the list of files using `s3a://some-bucket` (no slash at the end), so the lookup misses and we're left with an empty list!

This PR fixes this by appending a `/` to the end of the `URI` if and only if the given `Path` has no parent, i.e. is the root. If the path already ends with `/`, this is a no-op, which is handled by the Hadoop `Path` merging semantics.

How was this patch tested?
Unit test in `FileCatalogSuite`.
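The key mismatch and the trailing-slash fix described above can be modeled in a few lines. This is a minimal sketch under stated assumptions, not Spark's `ListingFileCatalog` code: `java.net.URI` stands in for `org.apache.hadoop.fs.Path`, "is the root" is approximated as an empty or `/` path component, an explicit `endsWith("/")` check replaces Hadoop's `Path` merging, and the class and method names (`RootPathKeys`, `parentKey`, `normalizeBasePath`, `filesFor`) are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of SPARK-17613. java.net.URI stands in for
// org.apache.hadoop.fs.Path; none of these names exist in Spark.
public class RootPathKeys {

    // Parent directory of a file URI as a string key. For a file directly
    // under the bucket, the parent path component is "/", so the key ends
    // with a slash, e.g. "s3a://some-bucket/".
    static String parentKey(String fileUri) {
        URI uri = URI.create(fileUri);
        String path = uri.getPath();                 // e.g. "/file1.parquet"
        int lastSlash = path.lastIndexOf('/');
        String parentPath = lastSlash <= 0 ? "/" : path.substring(0, lastSlash);
        return uri.getScheme() + "://" + uri.getAuthority() + parentPath;
    }

    // Group files by parent key, like a "directory -> files" hash map.
    static Map<String, List<String>> groupByParent(List<String> files) {
        Map<String, List<String>> byParent = new HashMap<>();
        for (String f : files) {
            byParent.computeIfAbsent(parentKey(f), k -> new ArrayList<>()).add(f);
        }
        return byParent;
    }

    // Modeled fix: append "/" only when the base path is the root (empty or
    // "/" path component). Non-root paths and paths already ending in "/"
    // are returned unchanged.
    static String normalizeBasePath(String baseUri) {
        String path = URI.create(baseUri).getPath();
        boolean isRoot = path == null || path.isEmpty() || path.equals("/");
        return (isRoot && !baseUri.endsWith("/")) ? baseUri + "/" : baseUri;
    }

    // Look up the files for a base path, with or without the fix applied.
    static List<String> filesFor(Map<String, List<String>> byParent,
                                 String basePath, boolean applyFix) {
        String key = applyFix ? normalizeBasePath(basePath) : basePath;
        return byParent.getOrDefault(key, List.of());
    }

    public static void main(String[] args) {
        Map<String, List<String>> byParent =
            groupByParent(List.of("s3a://some-bucket/file1.parquet"));
        // Without the fix, "s3a://some-bucket" misses the "s3a://some-bucket/" key.
        System.out.println(filesFor(byParent, "s3a://some-bucket", false)); // prints []
        // With the fix, the root base path is normalized and the file is found.
        System.out.println(filesFor(byParent, "s3a://some-bucket", true));  // prints [s3a://some-bucket/file1.parquet]
    }
}
```

Normalizing only the root keeps non-root base paths such as `s3a://some-bucket/dir` untouched, which matches their parent keys (`s3a://some-bucket/dir`) because only the root's path component carries the trailing `/`.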