Base64 UTF-8 for image-layout ref names? #174
Comments
I think naming should be expressed in some way - how do you expect people to talk about a given image? "HEY I'm using I'm not saying I'm against changing the set of allowed chars in |
On Thu, Jul 21, 2016 at 11:35:51AM -0700, Antonio Murdaca wrote:
I expect people to say “Hey, I'm using ‘Ζεύς’” or “I'm using
Support for Unicode refs in image-layout does not mean that all |
I'm all for this if it solves this char issue and there aren't any downsides in doing this |
On Thu, Jul 21, 2016 at 11:45:48AM -0700, Antonio Murdaca wrote:
I'm pretty sure this solves the “I want to use {char}” issue.
This is harder to get a handle on ;). There is clearly some increased For services / tooling that does not want to support Unicode refs, I can't think of any other issues, but that doesn't mean they don't |
base64 for filenames isn't great. It would work if the name were in a file, but then mapping them is a function of indexing. hrm |
On Thu, Jul 21, 2016 at 12:34:35PM -0700, Vincent Batts wrote:
Can you unpack that a bit more for me? RFC 3548 specifically calls |
Packing naming into refs is a really bad idea. I'm going to comment more on #173, but these act more like tags, in the way that git works. Indeed, making these base64 is probably not such a good idea, either. The character set has been chosen for wide artifact and filesystem compatibility. This conversation really needs to be held in the context of naming. |
On Thu, Jul 21, 2016 at 04:20:32PM -0700, Stephen Day wrote:
Agreed. This is just “what do we gain by restricting the ref
“Base 64 Encoding with URL and Filename Safe Alphabet” 1 seems like For other artifact stores, I expect they'll handle this in their own
I think naming as in “Trevor signed image sha256:5b… as debian:7.0” is |
It is not human-readable and it makes debugging harder. Simple as that.
This is an apt analogy. |
On Thu, Jul 21, 2016 at 04:41:02PM -0700, Stephen Day wrote:
These are not compatibility concerns. I'm generally in favor of How frequently do you expect to want to debug an image-layout instance $ oci-image-tool refs list image.tar Or are you concerned that that image-spec tooling will be so difficult |
@wking I'll say it again, packing names into |
On Fri, Jul 22, 2016 at 01:17:53PM -0700, Stephen Day wrote:
This proposal is not about packing names into refs. It's about |
I believe the restriction on characters can stay the way it is now w/o over-complicating it - since this came up as part of allowing colons to express "name:ref" |
On Fri, Jul 22, 2016 at 01:32:37PM -0700, Antonio Murdaca wrote:
This would also help simplify portability with ref names that include If there is nobody actively pushing for a particular new character, |
that's still tied to solving the naming problem.
Having a name restriction is something we can just document, as said in [3] from your comment. Btw, if you guys really feel this should be extended I'm all for it, but this is a non-issue to me right now. |
Here is an example of a recent breakage: http://latkin.org/blog/2016/07/20/git-for-windows-accidentally-creates-ntfs-alternate-data-streams/. I am very familiar with the "I want another character problem". The solution to this is usually not to add more characters, but remove the contention over character usage by making provisions to store arbitrary image names outside of an area that has more strict requirements. For example, only put names in fully escaped contexts (not a url path). There is also the consideration that a simplified naming scheme is usually more secure. When you start introducing unicode, there are many code points that look the same but aren't, making copypasta a risky business. There is also the consideration that these names may want to be mapped to DNS, which creates further restrictions. |
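To illustrate the lookalike concern, here is a small Python 3 sketch (the ref names are made-up examples, not taken from any real image). Unicode normalization does not merge visually confusable code points from different scripts, so two refs that render the same can still be distinct names:

```python
import unicodedata

# Two made-up ref names that look alike in many fonts.
latin = 'debian'
mixed = 'd\u0435bian'  # second letter is U+0435, not ASCII 'e'


def nfc(s):
    """Return the NFC-normalized form of s."""
    return unicodedata.normalize('NFC', s)


# Normalization only unifies canonically equivalent sequences,
# not cross-script lookalikes.
same_after_nfc = nfc(latin) == nfc(mixed)
second_letter = unicodedata.name(mixed[1])
print(same_after_nfc, second_letter)
```

Catching this kind of collision would need a separate confusables check on top of normalization; whether that belongs in the spec or in higher-level tooling is the open question here.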
On Fri, Jul 22, 2016 at 01:44:29PM -0700, Antonio Murdaca wrote:
For your use-case ;). But who knows what other people will want to If we want to wait until someone gives us a meaningful use case and I'd rather have the image-layout layer say “use whatever ref names you |
This isn't really a problem. All platforms have some concept of a file system separator that can be mapped 1:1.
Human readability and debugging are important here. If you remove the incentive to complicate the on-disk filename mapping, this problem goes away at the cost of obfuscation. Restricting the character set is one solution, while encoding is another. One must also take into account the security ramifications. You can read about the considerations made in mapping names to paths and urls in docker/distribution for more info. |
On Fri, Jul 22, 2016 at 01:47:51PM -0700, Stephen Day wrote:
Like a filename-safe-base-64-encoded filename ;).
Sure, and this is a concern for domain names too. But the IETF
I think this is #175. |
On Fri, Jul 22, 2016 at 02:03:45PM -0700, Stephen Day wrote:
I'm not saying you couldn't work around any missing character, I'm
Can you post a link? I can't turn up something with those keywords docker/distribution$ git describe |
Hopefully, this will get you started: https://github.com/docker/distribution/blob/master/registry/storage/paths.go This is mostly an intractable problem and a balance needs to be struck. |
On Fri, Jul 22, 2016 at 03:29:20PM -0700, Stephen Day wrote:
Thanks for hunting those down for me :).
This is a good overview of how docker/distribution organizes refs, but This leads to docs about path traversal attacks 1, which we avoid This references a proposal to base 64 (or 32) encode names in This looks like “please add support for an additional character”,
It's only intractable if you make human readability a hard requirement
|
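As a rough illustration of the path-traversal point, assuming the filename-safe Base 64 mapping proposed in this issue (the `ref_to_filename` helper and the hostile ref name are hypothetical): the encoded name only draws on `A-Z`, `a-z`, `0-9`, `-`, `_`, and `=`, so it cannot contain a path separator or a `..` component.

```python
import base64
import string
import unicodedata

# Alphabet of URL- and filename-safe Base 64, plus the padding character.
SAFE = set(string.ascii_letters + string.digits + '-_=')


def ref_to_filename(ref):
    """Hypothetical helper: NFC-normalize, UTF-8 encode, then Base 64 encode."""
    nfc = unicodedata.normalize('NFC', ref)
    return base64.urlsafe_b64encode(nfc.encode('UTF-8')).decode('ASCII')


hostile = '../../etc/passwd'  # made-up ref name that tries to escape refs/
encoded = ref_to_filename(hostile)

# Every character of the encoded name comes from the safe alphabet,
# so the name stays a single path component.
stays_in_one_component = set(encoded) <= SAFE
print(encoded, stays_in_one_component)
```

This only covers the ref-name-to-filename step; it says nothing about filename length limits or case-insensitive filesystems.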
@wking We have tests for path traversal. Most of this is thematic, but there is a long history of discussion in this area.
Humans use this system. I actually think the |
On Fri, Jul 22, 2016 at 05:22:50PM -0700, Stephen Day wrote:
Again 1, not encoding isn't impossible, it's just that always
But not without tools. Use the tools to see the decoded refs. They
I'll read up on ELF sections, but ELF is certainly not a format I can |
@wking The comparison with ELF means that OCI controls the names of refs. That doesn't give license to obscure everything unnecessarily. |
On Mon, Jul 25, 2016 at 02:15:29PM -0700, Stephen Day wrote:
I think you mean “we are specifying the ref name ↔ path mapping”, and
I'm not suggesting we use Base 64 to obscure ref names. I'm |
I still believe restricting the charset is useful. Right now this wouldn't have happened if there weren't a restriction, and that's good because we can advance step by step here. With your base64 refs people could have just done what I was proposing (multiple images in the image-layout), but that could have been wrong, and opening the charset to anything also opens the spec to something not wanted (or better, not specified) |
That said, I still think this might be useful. But it doesn't open the doors just to my use case. And it can probably make other decisions and design harder here. |
On Mon, Jul 25, 2016 at 02:41:35PM -0700, Antonio Murdaca wrote:
That's “wrong” / “not wanted” according to this judgement call 1. I |
Wrong/not wanted == probably mis-used |
On Mon, Jul 25, 2016 at 03:02:57PM -0700, Antonio Murdaca wrote:
This is still based on pushing a higher-level abstraction (verified |
I will address this when I close out #173. The idea is to only have refs be things that are classically "tags". So, v1.0.0, not tons of crazy characters. |
On Wed, Aug 24, 2016 at 07:18:22AM -0700, Brandon Philips wrote:
This will make unrelated images in the same ref store more difficult And I still think we'll make our future lives easier by making refs I still don't see why image-spec needs to have an opinion on ref |
well, given that we could adopt the git-style of |
Conclusion on this one based on the discussion on the call today. Essentially everyone feels we have selected the correct subset of characters to support most filesystems. And we have not attached any semantic meaning to these refs in this directory. So, if someone wants fancy UTF-8 refs they can create a higher-level tooling expectation that says, |
On Wed, Sep 14, 2016 at 02:56:30PM -0700, Brandon Philips wrote:
I would still rather have gone with this approach, allowing safe, |
Spun off from #173.
We currently have a charset restriction on ref names, but folks may want to use colons, etc. I don't think folks should care about human-readability of the refs filenames, so I'd like to change the mapping from refs to filenames from “they're the same thing” to “the filename is the NFC Unicode that is UTF-8 encoded and then filesystem-safe Base 64 encoded”. Then folks can put whatever they want in the ref name. As an example in Python 3:
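A minimal sketch of that mapping, assuming only Python 3's standard `unicodedata` and `base64` modules (the helper names `ref_to_filename` and `filename_to_ref` are hypothetical, chosen for illustration):

```python
import base64
import unicodedata


def ref_to_filename(ref):
    """Map a ref name to a filesystem-safe filename (hypothetical helper)."""
    # Normalize to NFC so canonically equivalent spellings share one filename.
    nfc = unicodedata.normalize('NFC', ref)
    # URL- and filename-safe Base 64 uses '-' and '_' instead of '+' and '/'.
    return base64.urlsafe_b64encode(nfc.encode('UTF-8')).decode('ASCII')


def filename_to_ref(filename):
    """Recover the (NFC-normalized) ref name from its filename."""
    return base64.urlsafe_b64decode(filename.encode('ASCII')).decode('UTF-8')


# The escapes below spell the Greek ref name 'Ζεύς' in NFC.
print(ref_to_filename('\u0396\u03b5\u03cd\u03c2'))
```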
So the Ζεύς ref would be stored in refs/zpbOtc-Nz4I=.
There's previous Base 64 discussion (in the context of container IDs) in opencontainers/runc#675 and opencontainers/runc#676.