Skip to content

Conversation

@brkyvz
Copy link
Contributor

@brkyvz brkyvz commented Sep 20, 2016

What changes were proposed in this pull request?

Consider you have a bucket as s3a://some-bucket
and under it you have files:

s3a://some-bucket/file1.parquet
s3a://some-bucket/file2.parquet

Getting the parent path of s3a://some-bucket/file1.parquet yields
s3a://some-bucket/ and the ListingFileCatalog uses this as the key in the hash map.

When catalog.allFiles is called, we use s3a://some-bucket (no slash at the end) to get the list of files, and we're left with an empty list!

This PR fixes this by adding a / at the end of the URI iff the given Path doesn't have a parent, i.e. is the root. This is a no-op if the path already had a / at the end, and is handled through the Hadoop Path, path merging semantics.

How was this patch tested?

Unit test in FileCatalogSuite.

@brkyvz
Copy link
Contributor Author

brkyvz commented Sep 20, 2016

cc @tdas This is the fix for the issue you were facing.

@SparkQA
Copy link

SparkQA commented Sep 20, 2016

Test build #65688 has finished for PR 15169 at commit 97a85d9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 20, 2016

Test build #65683 has finished for PR 15169 at commit e168a63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 21, 2016

Test build #65689 has finished for PR 15169 at commit f5a9c12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 21, 2016

Test build #65731 has finished for PR 15169 at commit ef781ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

LGTM. I walked through the code in IntelliJ and have verified that the added unit test fails without this change. It's weird that s3a would return a non-absolute path when producing the qualified path for the bucket root, but this fix seems correct to me.

@asfgit asfgit closed this in 85d609c Sep 22, 2016
asfgit pushed a commit that referenced this pull request Sep 22, 2016
…Frames

Consider you have a bucket as `s3a://some-bucket`
and under it you have files:
```
s3a://some-bucket/file1.parquet
s3a://some-bucket/file2.parquet
```
Getting the parent path of `s3a://some-bucket/file1.parquet` yields
`s3a://some-bucket/` and the ListingFileCatalog uses this as the key in the hash map.

When catalog.allFiles is called, we use `s3a://some-bucket` (no slash at the end) to get the list of files, and we're left with an empty list!

This PR fixes this by adding a `/` at the end of the `URI` iff the given `Path` doesn't have a parent, i.e. is the root. This is a no-op if the path already had a `/` at the end, and is handled through the Hadoop Path, path merging semantics.

Unit test in `FileCatalogSuite`.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #15169 from brkyvz/SPARK-17613.

(cherry picked from commit 85d609c)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@brkyvz brkyvz deleted the SPARK-17613 branch February 3, 2019 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants