-
Notifications
You must be signed in to change notification settings - Fork 332
[PAUSED - DO NOT REVIEW] Allow for quotes in path for blob storage #1586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,12 +20,15 @@ | |
|
|
||
| import jakarta.annotation.Nonnull; | ||
| import java.net.URI; | ||
| import java.util.Locale; | ||
| import java.util.regex.Pattern; | ||
| import org.apache.polaris.core.storage.azure.AzureLocation; | ||
|
|
||
| /** An abstraction over a storage location */ | ||
| public class StorageLocation { | ||
| private final String location; | ||
| public static final String LOCAL_PATH_PREFIX = "file:///"; | ||
| public static final Pattern SLASHES = Pattern.compile("/+"); | ||
|
|
||
| /** Create a StorageLocation from a String path */ | ||
| public static StorageLocation of(String location) { | ||
|
|
@@ -40,12 +43,21 @@ public static StorageLocation of(String location) { | |
| protected StorageLocation(@Nonnull String location) { | ||
| if (location == null) { | ||
| this.location = null; | ||
| } else if (location.startsWith("file:/") && !location.startsWith(LOCAL_PATH_PREFIX)) { | ||
| this.location = URI.create(location.replaceFirst("file:/+", LOCAL_PATH_PREFIX)).toString(); | ||
| } else if (location.startsWith("/")) { | ||
| this.location = URI.create(location.replaceFirst("/+", LOCAL_PATH_PREFIX)).toString(); | ||
| return; | ||
| } | ||
| String scheme = scheme(location); | ||
| String path; | ||
| if (scheme == null) { | ||
| path = location; | ||
| scheme = "file"; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't have an enum or something for this?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately not, we are parsing them from the user-provided input. I tried hard-typing these and rejecting any schemes that are not explicitly supported (but including file formats that don't explicitly state a scheme) and then a barrage of test cases started failing due to weird input we've put in - some of them intentionally, others not. So I'm not really in favor of making such a massive shift, it will dramatically increase the scope of this change. |
||
| } else { | ||
| path = location.substring(scheme.length() + 1); | ||
| } | ||
| path = normalizePath(path); | ||
| if (scheme.equals("file")) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto, let's at least put this in a const
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Roger that, that's reasonable. Added in next revision. |
||
| this.location = URI.create(LOCAL_PATH_PREFIX + path).toString(); | ||
| } else { | ||
| this.location = URI.create(location).toString(); | ||
| this.location = scheme + "://" + path; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't this duplicate the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No - the scheme will be without the "://", as we parse right up to the ":" for it. The path is normalized in the |
||
| } | ||
| } | ||
|
|
||
|
|
@@ -99,4 +111,23 @@ public boolean isChildOf(StorageLocation potentialParent) { | |
| return slashTerminatedLocation.startsWith(slashTerminatedParentLocation); | ||
| } | ||
| } | ||
|
|
||
| private static String scheme(String location) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: method name
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed to
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did this get missed?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry - seems like things didn't push :( Will push with the next set of comments. |
||
| int schemePos = location.indexOf(':'); | ||
| if (schemePos > 0) { | ||
| return location.substring(0, schemePos).toLowerCase(Locale.ROOT); | ||
eric-maynard marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| } | ||
| return null; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we throw here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No - this is passed early in the making of the |
||
| } | ||
|
|
||
| protected String normalizePath(String path) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This isn't really normalization, maybe it's canonicalization?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like both terms are synonyms - at least I'm not seeing a difference between these two terms. Can you please explain the difference or provide a link to something that does? For reference, this is what Google is pulling up on this.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Canonicalization usually means adding all the missing parts, such as the scheme and authority, whereas normalization / resolution usually just means making a path absolute. |
||
| path = SLASHES.matcher(path).replaceAll("/"); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is always correct. IIRC on Azure
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like this is just a special case for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahh yes, great catch. Can't believe I forgot about this but you're correct, S3 is the same as well. I will make changes to make sure this is only happening for local files then. |
||
| if (path.endsWith("/") || path.startsWith("/")) { | ||
| path = | ||
| path.substring( | ||
| path.startsWith("/") && path.length() > 1 ? 1 : 0, | ||
| path.endsWith("/") ? path.length() - 1 : path.length()); | ||
| } | ||
|
Comment on lines
+125
to
+130
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| return path; | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,17 +19,41 @@ | |
| package org.apache.polaris.service.storage; | ||
|
|
||
| import org.apache.polaris.core.storage.StorageLocation; | ||
| import org.assertj.core.api.Assertions; | ||
| import org.junit.jupiter.api.Assertions; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Earlier, I was testing some assertionThrows and JUnit was much easier to work with for those imo. But now that those are gone, I agree for these cases - I'll change them back. |
||
| import org.junit.jupiter.api.Test; | ||
|
|
||
| public class StorageLocationTest { | ||
|
|
||
| @Test | ||
| public void testOfDifferentPrefixes() { | ||
| StorageLocation StandardLocation = StorageLocation.of("file:///path/to/file"); | ||
| public void testAwsLocations() { | ||
| StorageLocation baseStorageLocation = StorageLocation.of("s3://path/to/file"); | ||
| StorageLocation testStorageLocation = StorageLocation.of("s3://path/to/////file"); | ||
| Assertions.assertEquals(testStorageLocation, baseStorageLocation); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test is good, but I believe more test cases are needed to validate the handling of upper / lower case letters and leading / trailing slashes and the special cases of missing scheme.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call out - let me add them. Missing scheme is in the function below with this variable: |
||
| testStorageLocation = StorageLocation.of("s3://////path/////to///file"); | ||
| Assertions.assertEquals(testStorageLocation, baseStorageLocation); | ||
|
|
||
| testStorageLocation = StorageLocation.of("s3://////path'/////*to===///\"file"); | ||
| Assertions.assertEquals("s3://path'/*to===/\"file", testStorageLocation.toString()); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: in this URI |
||
| } | ||
|
|
||
| @Test | ||
| public void testGcpLocations() { | ||
| StorageLocation baseStorageLocation = StorageLocation.of("gcs://path/to/file"); | ||
| StorageLocation testStorageLocation = StorageLocation.of("gcs://path/to/////file"); | ||
| Assertions.assertEquals(testStorageLocation, baseStorageLocation); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this test materially different from
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In spirit, no. Let me merge this case and the AWS cases. |
||
| testStorageLocation = StorageLocation.of("gcs://////path/////to///file"); | ||
| Assertions.assertEquals(testStorageLocation, baseStorageLocation); | ||
|
|
||
| testStorageLocation = StorageLocation.of("gcs://////path'/////*to===///\"file"); | ||
| Assertions.assertEquals("gcs://path'/*to===/\"file", testStorageLocation.toString()); | ||
|
Comment on lines
+47
to
+48
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See my comment above -- is this actually correct?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does it work right now in Polaris?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not correct. Actually when looking through Java URI code - we are not currently applying any normalization to URLs... And I'm not sure that normalizing URLs is entirely necessary either. We've worked through the comments above that double slashes shouldn't be modified and similarly, "." and ".." also shouldn't be normalized for blob storage. As a result, I think this might be trying to do too much. I'm going to stop work on this and push a new branch that simply just takes the URI create off for the Blob Storage locations and adds additional tests. Will link that PR here when I push it. |
||
| } | ||
|
|
||
| @Test | ||
| public void testLocalFileLocations() { | ||
| StorageLocation storageLocation = StorageLocation.of("file:///path/to/file"); | ||
| StorageLocation slashLeadingLocation = StorageLocation.of("/path/to/file"); | ||
| StorageLocation fileSingleSlashLocation = StorageLocation.of("file:/path/to/file"); | ||
| Assertions.assertThat(slashLeadingLocation.equals(StandardLocation)).isTrue(); | ||
| Assertions.assertThat(fileSingleSlashLocation.equals(StandardLocation)).isTrue(); | ||
| Assertions.assertEquals(slashLeadingLocation, storageLocation); | ||
| Assertions.assertEquals(fileSingleSlashLocation, storageLocation); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note, we need tests for the new case being supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good call out. Included in the latest commit!