ARROW-10998: [C++] Detect URIs where a filesystem path is expected #11997
An occasional misunderstanding is to pass a URI to filesystem methods, where a regular path is expected. Make these situations easier to diagnose by raising a specific error.
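To illustrate the kind of check described above, here is a minimal sketch of a "looks like a URI" heuristic. It assumes the RFC 3986 scheme grammar (a letter followed by letters, digits, `+`, `-` or `.`, then a colon). The name `LooksLikeUri` is hypothetical, and this is not necessarily how the `internal::IsLikelyUri` helper in this PR is implemented.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not Arrow's internal::IsLikelyUri): returns true when
// `s` starts with something shaped like an RFC 3986 scheme followed by ':'.
bool LooksLikeUri(const std::string& s) {
  const auto colon = s.find(':');
  // No colon at all, or fewer than two characters before it (for example a
  // Windows drive letter such as "C:"), is treated as a plain path.
  if (colon == std::string::npos || colon < 2) return false;
  // A scheme must start with a letter...
  if (!std::isalpha(static_cast<unsigned char>(s[0]))) return false;
  // ...and continue with letters, digits, '+', '-' or '.' up to the colon.
  for (std::size_t i = 1; i < colon; ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') return false;
  }
  return true;
}
```

Under this sketch, `gs://bucket/key` and `gcs:bucket/key` are flagged, while `bucket/key` and `bucket/key:with:colons` are not, because a `/` appears before the first colon. A colon inside the first path segment (`bucket:name/key`) is indistinguishable from a scheme and is flagged as well.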
@coryan Would you like to take a look at the GCS changes here?
LGTM, with a minor nit in the tests.
@@ -797,6 +838,11 @@ TEST_F(GcsIntegrationTest, OpenInputStreamInfoInvalid) {
  ASSERT_RAISES(IOError, fs->OpenInputStream(info));
}

TEST_F(GcsIntegrationTest, OpenInputStreamUri) {
  auto fs = internal::MakeGcsFileSystemForTest(TestGcsOptions());
  ASSERT_RAISES(Invalid, fs->OpenInputStream("gcs:" + PreexistingObjectPath()));
I know this is just for testing purposes, but the conventional prefix for GCS is gs://, at least that is what gsutil uses:
I see, thank you.
By the way, it seems other packages have adopted the gcs:// convention: https://gcsfs.readthedocs.io/en/latest/#integration
Sigh... My guess (and let me stress that this would be just a guess) is that gs:// is more familiar to GCS users. I can live with gcs:// too.
Ah, it looks like they accept gs:// too: https://github.com/fsspec/gcsfs/blob/main/gcsfs/core.py#L1171
Seems like a reasonable change to me. Maybe drop a note that colons are not supported in filenames here (https://github.com/apache/arrow/blob/master/docs/source/cpp/io.rst#filesystems) where we mention that . and .. are not supported?
cpp/src/arrow/filesystem/s3fs.cc
Outdated
@@ -402,6 +402,10 @@ struct S3Path {
  std::vector<std::string> key_parts;

  static Result<S3Path> FromString(const std::string& s) {
    if (internal::IsLikelyUri(s)) {
      return Status::Invalid(
          "Expected a S3 object path of the form 'bucket/key...', got a URI: '", s, "'");
-          "Expected a S3 object path of the form 'bucket/key...', got a URI: '", s, "'");
+          "Expected an S3 object path of the form 'bucket/key...', got a URI: '", s, "'");
Hmm, they should be more or less supported except in the first segment of a path.
Benchmark runs are scheduled for baseline = cfcce5a and contender = 238b363. 238b363 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
An occasional misunderstanding is to pass a URI to filesystem methods, where a regular path is expected. Make these situations easier to diagnose by raising a specific error.

Closes apache#11997 from pitrou/ARROW-10998-uri-path-mismatch

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Cool, this is a nice usability improvement!