-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: enforce writing POSIX compatible paths for data location #6772
Conversation
@@ -33,4 +34,8 @@ public static String stripTrailingSlash(String path) { | |||
} | |||
return result; | |||
} | |||
|
|||
public static String posixNormalize(String path) { | |||
return Paths.get(path).normalize().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is not suitable for Iceberg paths, since Java Path
is for paths, but Iceberg paths are actually URIs. It mangles URIs and doesn't understand the scheme vs path distinction:
jshell> Paths.get("s3://foo/bar").toString()
$2 ==> "s3:/foo/bar"
jshell> Paths.get("s3://foo/bar/../..").normalize().toString()
$5 ==> "s3:"
jshell> Paths.get("s3://foo/bar/../../..").normalize().toString()
$6 ==> ""
URI.normalize()
seems better, though it has interesting behavior where it preserves leading ..
path segments at the start of the path that would go past the root:
jshell> URI.create("s3://foo/abc/../../..").normalize()
$11 ==> s3://foo/../..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like using URI is still not the right way, because URI would fail with characters like space, which are supposed to be escaped with URL encoding like %20
. However, it is a valid posix path.
I think what should be done is that, for path with scheme, everything after that should use Paths.get(path).normalize().toString()
.
We have different places for creating file path names,
So I will restrict this PR to just data location, and do other ones later to avoid having too many changes in different places. |
public void testPosixNormalizePathsWithSchemeAndWithoutAuthority() { | ||
Assert.assertEquals( | ||
"Must work with the root directory representation", | ||
"file:///", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a place where Hadoop Path behavior differs, it resolves to file:/
, but that seems like an invalid resolution.
Also I am treating /
as a special case, both here and in a path without a scheme. For paths with a scheme and authority, we can use scheme://authority
as the root path without ambiguity. But file://
(just scheme) and
(empty path) do not mean root location, only file:///
and /
mean that.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
part of fixes for #6758