Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3FileIO Can Create Non-Posix Paths #6758

Closed
RussellSpitzer opened this issue Feb 6, 2023 · 27 comments
Closed

S3FileIO Can Create Non-Posix Paths #6758

RussellSpitzer opened this issue Feb 6, 2023 · 27 comments

Comments

@RussellSpitzer
Copy link
Member

Apache Iceberg version

1.1.0 (latest release)

Query engine

None

Please describe the bug 🐞

An interesting thing we ran into:

Our FileIo API contains this method

/** Get a {@link OutputFile} instance to write bytes to the file at the given path. */
OutputFile newOutputFile(String path);

Which uses a "String" as the Path for creating a new file reference. Now in general this is not an issue but there are some edge cases here. For example, S3FileIO doesn't enforce posix rules when creating paths or directories (since neither of those really exist in S3.) This means we the two following locations are actually different:

  1. foo//bar/file
  2. foo/bar/file

But within posix systems these two should refer to the exact same thing. See https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_266

A pathname may optionally contain one or more trailing slashes. Multiple successive slashes are considered to be the same as one slash.

Which we can see holds true in Java's Path class.

scala> import java.nio.file.Paths
import java.nio.file.Paths

scala> Paths.get("/foo//bar/file").equals(Paths.get("/foo/bar/file"))
res29: Boolean = true

Or more importantly in Hadoop's Path Class

scala> val p = new Path("foo/bar//bazz")
p: org.apache.hadoop.fs.Path = foo/bar/bazz

scala> p.toUri
res38: java.net.URI = foo/bar/bazz

This leads to an issue when a table is written to by S3FileIO but then read with HadoopFileIO. HadoopFileIO cannot read from the special foo//bar/file path because this isn't a valid posix path. This means if for some reason we end up generating double slashes in our path's metadata_location (or other paths) when using S3FileIO those files will be inaccessible if S3FileIO is swapped with HadoopFileIO.

I think in this case we should probably add to the spec that all files (and paths) must comply with posix standards.

@RussellSpitzer
Copy link
Member Author

@amogh-jahagirdar + @jackye1995 Do you have any thoughts on this?

@jackye1995
Copy link
Contributor

jackye1995 commented Feb 6, 2023

So internally we have a fix, where we extend Hadoop Path to work with double slash instead of the posix standard, and use that Path object for Iceberg for the specific FileSystem implementation. For existing tables written with //, it seems like a case that has to continue to be supported in this way.

But I agree it would be good to enforce this across all FileIOs, and just make sure removing trailing slash is done whenever needed. I am not sure if it has to be in the spec, given it has been like this for a pretty long time.

@RussellSpitzer
Copy link
Member Author

return String.format("%s/%s/%s", metadata.location(), METADATA_FOLDER_NAME, filename);
I think it makes sense to do the remove trailing slash here. I'm not sure why we only do it for custom metadata locations

@RussellSpitzer
Copy link
Member Author

For the Spec I was just considering something like "path" a path is a POSIX normalized path

@amogh-jahagirdar
Copy link
Contributor

amogh-jahagirdar commented Feb 6, 2023

Mostly agree with @jackye1995 here. I think at minimum, we for sure should remove trailing slash when writing any file that's part of the Iceberg table.

As for the spec change, I think it's good to define at the spec level. Even though our systems will need to enforce backwards compatibility for the files written with double slash, at least for future Iceberg releases we can establish the guarantee.

Standardizing file paths reduces this complexity of having to understand nuances of each FileIO implementation. Iceberg users would be confident just from the spec that all paths written will be a normalized posix path.

@RussellSpitzer
Copy link
Member Author

This is also an issue for data files here

String dataLocation = properties.get(TableProperties.WRITE_DATA_LOCATION);
if (dataLocation == null) {
dataLocation = properties.get(TableProperties.OBJECT_STORE_PATH);
if (dataLocation == null) {
dataLocation = properties.get(TableProperties.WRITE_FOLDER_STORAGE_LOCATION);
if (dataLocation == null) {
dataLocation = String.format("%s/data", tableLocation);
}
}
}

@RussellSpitzer
Copy link
Member Author

@findepi Is this an issue with Trino's file IO rules? Or does it always require all paths be Posix compliant? I know you have special S3 IO code as well

@findepi
Copy link
Member

findepi commented Feb 7, 2023

@RussellSpitzer thanks for the ping, but honestly I do not know.
I guess by non-POSIX you mean handling of double-slashes. For that @electrum @ebyhr are experts and I think we differentiate between paths containing repeated slashes.

@RussellSpitzer what else I should look at?

@RussellSpitzer
Copy link
Member Author

That's basically the only issue, I think you may be safe if you never use a Hadoop File System implementation but it can potentially make an Iceberg table with paths that another framework like Spark or Flink could not access. So our big thought here is should we force all Iceberg "paths" to be posix normalized (no double slashes, no relative path elements, etc ...)

Another example would be setting the table location like foo/bar/../baz
In our HadoopFileIo this would be equivelent to foo/baz
But S3FileIO treats it like a string an would probably make foo/bar/../baz with a .. directory

@jackye1995
Copy link
Contributor

If we add this enforcement and still want backwards compatibility, it seems like we will need to do that as a feature flag.

And this cannot just be an engine level feature flag, because such enforcement will lead to some tables having hybrid file paths, and some are stored in posix style and some are stored literally.

So it seems necessary to:

  1. introduce this flag in all the places that stores a file path (catalog, table metadata, manifest list, manifest file), so the reader understands how to parse the file paths differently.
  2. add an engine level config to turn this feature on for subsequent writes.
  3. add some migration procedure to fix tables with broken file paths.

Would you agree with the actions above?

@RussellSpitzer
Copy link
Member Author

I think (2) we can actually do immediately without losing backwards compatibility. All readers can read paths written in the posix compliant matter, so implementing 2 even without a feature flag should be valid. The only issue is that new files will be in a slightly different location in S3 than they would have been previously. So I don't think this even needs a flag.

IE If we say from this point on, all files that are created will have posix normalized paths, should have no effect on compatibility. So I don't think we need (1). We let everyone read paths as they are in the metadata, we just make sure they are all written in the metadata in a way that is universally compatible.

Then I do think we may need a (3) but this is just for tables which have been written with non-posix compatible FileIO's that need to be used with a posix compatible file IO. This is a bit more intensive.

@jackye1995
Copy link
Contributor

I see, I thought you were saying the file paths will still be stored in the original way like s3://foo/bar//baz, but will store data at s3://foo/bar/baz. Reader also sees s3://foo/bar//baz in the manifest, but will read the file at s3://foo/bar/baz.

So instead, you actually mean we will never even store non-posix paths in Iceberg metadata going forward?

@RussellSpitzer
Copy link
Member Author

Yeah my plan would be, regardless of what string ends up at the FileIO. The FileIO is responsible for only writing a posix compatible path. So just because I have a table location "foo/bar/" doesn't mean I get files at "foo/bar//data/". I'm open to other suggestions but this seems like a way we can resolve the issue.

For our users I'm trying to deal with the fact that they have many many tables and don't want to go through them modifying the table location in all of them. Ideally I think we should be able to go back and forth between FileIO's without causing a failure. This is a slightly unique situation because I'm not sure if there are any other FileIO's which allow access to the same underlying filesystem.

@jackye1995
Copy link
Contributor

This is a slightly unique situation because I'm not sure if there are any other FileIO's which allow access to the same underlying filesystem.

I think this is probably a problem for all the S3-compatible storage systems, we already have a few implemented in this repo like the one from Dell ECS, so agree we should solve compatibility of it.

Yeah my plan would be, regardless of what string ends up at the FileIO. The FileIO is responsible for only writing a posix compatible path.

Yes I agree this is a simpler plan than my original thought. Are you currently already working on this? If not I can take it.

I think (3) as a procedure I think is probably still needed for tables already in that situation, and once we have that we can probably enforce it as a requirement and users for tables with issue can have a clear path forward.

@RussellSpitzer
Copy link
Member Author

I have not started, so you are free to take it if you like. I do think we need confirmation from the Trino folks that this is something they can also follow. I know they have a completely different pathway for opening and creating files.

@RussellSpitzer
Copy link
Member Author

Oh then this is actually much needed for trino too, since by default they can't read a non-posix file either?

@jackye1995
Copy link
Contributor

Correct, that's why I said there is an internal patch in Athena to support non-posix files.

@electrum
Copy link
Contributor

electrum commented Feb 7, 2023

We have a hack in Trino to allow reading non-standard paths. That's the HadoopPaths code referenced above, along with this code in TrinoS3FileSystem.

All of our writing code currently goes through Hadoop, so paths in S3 will be normalized, but we have a project to decouple Trino from Hadoop and Hive codebases, so we'll want our upcoming non-Hadoop TrinoFileSystem implementations to handle this correctly.

Handling . and .. is tricky, though. If a user does this:

CREATE TABLE ... WITH (location = 's3://foo/bar/../baz')

What should the resulting object name in S3 be? What gets written to the Iceberg manifest? Where should the normalization happen? Do we need to normalize all "user input" locations?

While I like the idea of just using strings and not dealing with directories (which we copied from Iceberg FileIO), it certainly has annoyances and problems when interacting with POSIX-like systems.

@RussellSpitzer
Copy link
Member Author

RussellSpitzer commented Feb 7, 2023

My call would be that everything new file we create with FileIO (in iceberg) hits this transformation

import java.nio.file.Paths
scala> def posixNormalize(s: String) = Paths.get(s).normalize.toString
posixNormalize: (s: String)String
scala> posixNormalize("s3://foo/bar/../baz")
res41: String = s3:/foo/baz

So I would add that as an protected method in FileIO and then any calls to
OutputFile newOutputFile(String path);

call posixNormalize before doing anything else. Ideally I wish I could force this on all calls to newOutputFile but I don't think we can manage that without breaking the API. Maybe we can do that in Iceberg 2.0 and have

final OutputFile newOutputFile(String path) {
  return newOutputFileImpl(posixNormalize(path))
}

OutputFile newOutputFileImpl(String path)

For now I would just add that to S3FileIO

@jackye1995
Copy link
Contributor

Handling . and .. is tricky, though. If a user does this:

CREATE TABLE ... WITH (location = 's3://foo/bar/../baz')

What should the resulting object name in S3 be? What gets written to the Iceberg manifest? Where should the normalization happen? Do we need to normalize all "user input" locations?

Yes this case is interesting, I can see it go both ways. Sounds like we should still have a feature flag controlling this behavior then?

@RussellSpitzer
Copy link
Member Author

RussellSpitzer commented Feb 7, 2023

I still think we just set this up as Iceberg only makes posix paths for any new files it creates. Anything going through FileIO.create is normalized. This means everything in the metadata is normalized unless a user explicitly and manually adds a data file that isn't.

@electrum
Copy link
Contributor

electrum commented Feb 7, 2023

I just realized this has implications for the specification. If a manifest file contains a location containing . or .. path segments, how are readers supposed to interpret that? The current implementations that use Hadoop code will normalize them. Is this correct behavior? Should we formalize this behavior in the specification?

@RussellSpitzer
Copy link
Member Author

@electrum thats part of my suggestion, no path entries can contain any unnormailzed posix characters. No .., ., //, maybe also specify no symbolic links although I'm not sure how we enforce that.

@findepi
Copy link
Member

findepi commented Feb 8, 2023

I am concerned about security implications of handling .. in paths. At the same time I don't see any practical implications of that. If we gonna do any form of normalization, we should forbid such paths.
I think we should forbid . too.

@jackye1995
Copy link
Contributor

I put up a draft for discussion, there are different ways to achieve that, the idea in the PR is to normalizes all the places that we write a location, which includes table location, metadata file location and data location.

I am concerned about security implications of handling .. in paths

How do Hadoop file systems currently handle such use case? I believe it will also support .. and . by default and derive the posix path of it right?

@RussellSpitzer
Copy link
Member Author

Consensus at Community Sync was that for now we will just add a strip trailing slash to

To remove any chance of accidentally adding the double slash.

For all the other oddities Jack's PR will allow for manually setting a flag to forbid/only write posix compatible files. This would cover ".., ." and some other things but still not help us in the case of symbolic links which ends up being a weirder proposition based on the FileIO

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants