Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: enforce writing POSIX compatible paths for data location #6772

Closed
wants to merge 6 commits into from

Conversation

jackye1995
Copy link
Contributor

@jackye1995 jackye1995 commented Feb 8, 2023

part of fixes for #6758

@github-actions github-actions bot added the core label Feb 8, 2023
@jackye1995
Copy link
Contributor Author

@@ -33,4 +34,8 @@ public static String stripTrailingSlash(String path) {
}
return result;
}

public static String posixNormalize(String path) {
return Paths.get(path).normalize().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not suitable for Iceberg paths, since Java Path is for paths, but Iceberg paths are actually URIs. It mangles URIs and doesn't understand the scheme vs path distinction:

jshell> Paths.get("s3://foo/bar").toString()
$2 ==> "s3:/foo/bar"

jshell> Paths.get("s3://foo/bar/../..").normalize().toString()
$5 ==> "s3:"

jshell> Paths.get("s3://foo/bar/../../..").normalize().toString()
$6 ==> ""

URI.normalize() seems better, though it has interesting behavior where it preserves leading .. path segments at the start of the path that would go past the root:

jshell> URI.create("s3://foo/abc/../../..").normalize()
$11 ==> s3://foo/../..

Copy link
Contributor Author

@jackye1995 jackye1995 Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like using URI is still not the right way, because URI would fail with characters like space, which are supposed to be escaped with URL encoding like %20. However, it is a valid posix path.

I think what should be done is that, for path with scheme, everything after that should use Paths.get(path).normalize().toString().

@jackye1995 jackye1995 changed the title Core: enforce writing POSIX compatible paths Core: enforce writing POSIX compatible paths for data location Feb 11, 2023
@jackye1995 jackye1995 requested review from electrum and RussellSpitzer and removed request for electrum February 11, 2023 20:01
@jackye1995
Copy link
Contributor Author

@jackye1995
Copy link
Contributor Author

We have different places for creating file path names,

  • LocationProvider for data files
  • TableOperations.metadataFileLocation for manifests and manifest lists
  • catalog specific impl for metadata file location (e.g. BaseMetastoreTableOperations.newTableMetadataFilePath)

So I will restrict this PR to just data location, and do other ones later to avoid having too many changes in different places.

public void testPosixNormalizePathsWithSchemeAndWithoutAuthority() {
Assert.assertEquals(
"Must work with the root directory representation",
"file:///",
Copy link
Contributor Author

@jackye1995 jackye1995 Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a place where Hadoop Path behavior differs, it resolves to file:/, but that seems like an invalid resolution.

Also I am treating / as a special case, both here and in a path without a scheme. For paths with a scheme and authority, we can use scheme://authority as the root path without ambiguity. But file:// (just scheme) and (empty path) do not mean root location, only file:/// and / mean that.

Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 25, 2024
Copy link

github-actions bot commented Sep 1, 2024

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants