Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize output paths #2

Merged
merged 2 commits into from
May 9, 2018
Merged

Conversation

djdv
Copy link
Contributor

@djdv djdv commented Mar 21, 2018

This allows paths inside of an archive to be extracted even if their target-path would otherwise be illegal.

TODO:
^ is allowed on Windows/NTFS but not Windows/FAT32, accounting for this will require detecting the target FS and switching censor-sets. This isn't true, I can create files with ^ in the filename on FAT, FAT32, and NTFS on Windows.

I'm waiting for something upstream in order to handle NUL directory output, it's either going to be fixed upstream or handled specifically here.

@djdv
Copy link
Contributor Author

djdv commented Apr 6, 2018

This is still a work in progress but I'd like some feedback on my approach here, specifically the api additions, not the rest of the implementation yet.

An issue related to symlinks has come up for gx and go-ipfs
https://discuss.ipfs.io/t/issus-build-from-source/2445/8
ipfs/kubo#4906
Looking at the history of various projects, I'm assuming that the prebuilt versions of gx may have been using an old version of tar-utils before symlink support was added to it.
Previously, tar-utils would create empty files instead of symlinks.
Now, we try to create them properly, and fail on Windows due to access restrictions.

Coming up with a good solution to this problem has proved difficult. What I've opted for is adding a feature where we can chose to mimic the original behaviour conditionally, by calling Extractor.StubLinks(true).

While this is not ideal, it gives Windows users the option to extract everything except symlinks and still succeed, without requiring special privledges.
If the user has the right privileges, they can simply not enable this feature, and create links as normal.
Given the restrictions around symlinks on Windows, I think this is an acceptable compromise, but would like to hear opinions and/or alternatives.

Below is an example of what I expect the get fixes to look like inside of go-ipfs if this change makes it in.

...
if windows {
    extractor.SanitizePaths(true)
    if !linkCreationRights {
        log.Warn("Symlink creation privileges not held, see: http://placeholder.com/how-to-enable-symlink-creation-on-various-versions-of-Windows.html")
        extractor.StubLinks(true)
    }
}
extractor.Extract(reader)
...

I'm also considered adding a .Modified or .Sanitized bool to the extractor struct, this would allow people to see if the output was modified by the sanitize functions or not. The only usecase I can think of for this is in go-ipfs where we could print a warning to the user after ipfs get if the output is different from the source.
i.e. this would be printed after extraction

warning: get was successful but paths had to be modified for your OS (hashes won't match if added again)

Looking for feedback on that, good idea or bad?

@djdv
Copy link
Contributor Author

djdv commented Apr 6, 2018

Other approaches to consider:

  • Instead of creating stubs for symlinks, we could just skip them entirely, which may be a better option.
  • We could create a copy of the link target instead of creating a link. Problems with this are that the target must exist first, which is not guaranteed, nor is it guaranteed to be safe/sane.

@djdv djdv force-pushed the feat/fs-sanitize branch from ff28315 to 47f4ec1 Compare April 13, 2018 17:16
@djdv
Copy link
Contributor Author

djdv commented Apr 18, 2018

I've changed it so that users can assign their own SanitizePathFunc, SanitizePathCallback, and LinkFunc, on the Extractor. If Extractor.Sanitize(true) is called, we provide a default implementation.

All of these are relative to the extraction target, currently a user may not sanitize to a path ouside of the extraction root. This could be changed if desired.

I'm starting to incorporate this into go-ipfs which should show some usage of this.
(only changes to core/commands/get.go are relevant, everything else is in progress Windows handling and can be disregarded)
ipfs/kubo@1399169
https://github.com/djdv/go-ipfs/tree/feat/fs-sanitize

ipfs/kubo#4956

Improvements here may be possible but so far this is the best approach I've come up with to deal with this.

@whyrusleeping
Copy link
Owner

cc @Stebalien @Kubuxu @magik6k

@djdv
Copy link
Contributor Author

djdv commented Apr 24, 2018

All of these are relative to the extraction target, currently a user may not sanitize to a path ouside of the extraction root. This could be changed if desired.

I want to clarify that when I say, "this could be changed" I mean the patch could be changed to allow this. As it is right now, we essentially have a hard limit where you may only extract files to the extraction-root, nowhere else. We could allow people to rewrite the entire path (extraction-root included), if they want to.

@djdv
Copy link
Contributor Author

djdv commented Apr 24, 2018

@warpfork
Copy link

Just a random snippet of anecdata, since I see some mentions of NTFS and interesting path names in the topmost description:

What's supported by NTFS the-format-on-disk and what's actually allowed by Windows at various API levels can be divergent. E.g., you can't put colons in filenames in Windows Explorer. But you certainly can when mounting an NTFS drive in linux! And later on the Windows NTFS drivers will continue to deal with this surprisingly calmly. So that can add a whole layer of complexity to the logic or feature detection of what can and cannot fly on a filesystem.

@Kubuxu
Copy link
Contributor

Kubuxu commented Apr 27, 2018

Does it handle forbidden file name with an extension? Names like aux.go or com1.doc are also forbidden..

@djdv
Copy link
Contributor Author

djdv commented Apr 27, 2018

Still debating on if we should allow sanitize functions to modify the entire input path or just the extraction base. This would mean a refactor where

SanitizePathFunc func(pathComponents []string) (joinedComponents string)

is changed to

SanitizePathFunc func(path string) (sanitizedPath string, userDefined error)

This would eliminate the need for a callback and allow for chaining functions. Users who wish to use the built in sanitizer, monitor changes, and/or use their own sanitizer on top, could do something like this:

// edited 2018-05-08
extractor.Sanitize(true)
builtinSanitizer := extractor.SanitizePathFunc
extractor.SanitizePathFunc = func(path string) string {
    builtinSanitizedPath, _ := builtinSanitizer(path)
    if path != builtinSanitizedPath {
        fmt.Printf("Builtin sanitizer changed path %s -> %s\n", path, mySanitizedPath)
    }
    mySanitizedPath, err := someLocalFunction(builtinSanitizedPath)
    if err != nil {
        return err // user defined error from local function that aborts extraction
    }
    if builtinSanitizedPath != mySanitizedPath {
        fmt.Printf("Local sanitizer changed path %s -> %s\n", builtinSanitizedPath, mySanitizedPath)
    }
}
extractor.Extract()

Which seems better. However library users will have access to the entire path. This means they could redirect output to any path, not just children of the extraction root. Whether this is a benefit or detriment is what needs to be decided upon. I don't think there's any harm in allowing full path rewrites here, but I'm unsure about it.

@warpfork
This is interesting

And later on the Windows NTFS drivers will continue to deal with this surprisingly calmly.

The current goal for the Windows implementation is to stay within the restrictions laid out here: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

Even if we can circumvent these limitations for creation+IO, it may not be desirable, as other applications may have unexpected behaviour when supplied with these file paths.
I may want to experiment with this though. In the event Win32 applications have no problems, we may want to allow it.

For non-Windows however, the interface here should allow people to implement/impose a less strict subset for NTFS on their platform if desired.
The complexity of that seems large though, if we consider the different restrictions of each file system+platform. It would require checking which filesystem the target will reside on.
So the pipeline for say NTFS on Linux may look like:
take input path -> apply platforms filter(s) -> check destination FS -> apply fs filter(s) -> return path -> extract to dest successfully

Related anecdote, MacOS seems to also have restrictions around : but the translation of it seems to be handled by the OS itself.

@Kubuxu
It currently does. I've been using these as test cases https://regexr.com/3n3ut

>ipfs ls QmU1VJwixLhxd45YXVB4bm5Nf5Rjg7DpnwBvdWF15TLKGG
zCT5htkeCRq9SGwFYHpV3Hx3hocqt9dAb3pQjr8ESWL8aCqqiWgA 109669 aux.go
zCT5htkeCRq9SGwFYHpV3Hx3hocqt9dAb3pQjr8ESWL8aCqqiWgA 109669 com1.doc

>ipfs get QmU1VJwixLhxd45YXVB4bm5Nf5Rjg7DpnwBvdWF15TLKGG
Saving file(s) to QmU1VJwixLhxd45YXVB4bm5Nf5Rjg7DpnwBvdWF15TLKGG
 0 B / 214.31 KB [--------------------------------------------------------------------------------------------]   0.00%
"aux.go" sanitized to "_aux.go"
"com1.doc" sanitized to "_com1.doc"
WARNING: Extraction output had to be modified in order to be stored successfully
 214.31 KB / 214.31 KB [===================================================================================] 100.00% 0s

>dir /b QmU1VJwixLhxd45YXVB4bm5Nf5Rjg7DpnwBvdWF15TLKGG
_aux.go
_com1.doc

@djdv djdv force-pushed the feat/fs-sanitize branch 2 times, most recently from c34d2ce to f27e878 Compare May 8, 2018 16:18
extractor.go Outdated
elems = elems[1:] // remove original root
// Sanitize toggles output sanitation, using built in sanitation functions
// (Modify paths to be platform legal, symlinks may not escape extraction root)
func (te *Extractor) Sanitize(toggle bool) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this input arg seems to be ignored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it should actually check it and unset the function pointers on false.

Copy link
Owner

@whyrusleeping whyrusleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small question, then LGTM. Why is windows so complicated?

@djdv djdv force-pushed the feat/fs-sanitize branch 2 times, most recently from 53d07d8 to ddacde9 Compare May 9, 2018 12:56
@djdv
Copy link
Contributor Author

djdv commented May 9, 2018

@whyrusleeping
Fixed Sanitize() and removed the darwin build constraint that was accidentally left in.
We should be good to go now. 👍

One small question, then LGTM. Why is windows so complicated?

We're going to need another 30 days for that one. 8-)

Edit:
also fixed the documentation for the Link struct

@djdv djdv force-pushed the feat/fs-sanitize branch from ddacde9 to 17c747b Compare May 9, 2018 13:09
@whyrusleeping
Copy link
Owner

@djdv thanks :)

I made you a collaborator, so if you go accept that invite, you should be able to merge this yourself (trying to put merge power in other non-me peoples hands as much as possible)

@djdv djdv changed the title [WIP] Sanitize output paths Sanitize output paths May 9, 2018
@djdv djdv merged commit 8c6c8ba into whyrusleeping:master May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants