Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: allow file: URL strings as paths for fs.promises #20950

Closed
wants to merge 1 commit into from

Conversation

guybedford
Copy link
Contributor

@guybedford guybedford commented May 24, 2018

As a follow-on to the security issue raised in #20944, this allows support for file:// URL strings in the fs.promises API only.

In #17658 it was noted that files or directories can exist with the name file: leading to worries over ambiguity here. For this reason, this implementation only supports the exact prefix "file://" in File URLs so that a rare file or directory called 'file:/', can still be reliable and uniquely distinguished from file URLs.

The URL standard actually suggests using strings over the object where possible in https://url.spec.whatwg.org/#url-apis-elsewhere, so it would be good to be able to enable such a best-practice.

This will also enable use cases like fs.readFile(import.meta.url) for modules, where the URL is provided as a string.

Performance is largely unaffected as there is a fast opt-out of this code path through the string prefix check.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added dont-land-on-v4.x fs Issues and PRs related to the fs subsystem / file system. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels May 24, 2018
@guybedford guybedford requested a review from jasnell May 24, 2018 21:22
return path;
path = new URL(path);
} else if (path == null || !path[searchParams] ||
!path[searchParams][searchParams]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs re-aligning

!path[searchParams][searchParams]) {
return path;
}
if (path.protocol !== 'file:')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can avoid this check for the first case where we already know it starts with 'file://'?

@@ -1392,6 +1392,20 @@ function getPathFromURLPosix(url) {
return decodeURIComponent(pathname);
}

function getPathFromURLString(path) {
if (typeof path === 'string') {
if (!path.startsWith('file://'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible micro-optimization here would be to make this path.length < 7 || !path.startsWith('file://').

Also, is it intentional to be enforcing a lowercase protocol like that?

@@ -3381,6 +3381,9 @@ The `fs.promises` API provides an alternative set of asynchronous file system
methods that return `Promise` objects rather than using callbacks. The
API is accessible via `require('fs').promises`.

In addition to returning a promise, `file:///` URL strings are also permitted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One extra / on this line.

@jdalton
Copy link
Member

jdalton commented May 25, 2018

Is the idea that this will eventually make its way into the non-promise based API too?

@guybedford
Copy link
Contributor Author

@jdalton that could be nice but I'm not sure what the process is here in terms of managing the small security risk. From my perspective that could well make sense for a major release that is able to relax such a constraint.

@jdalton
Copy link
Member

jdalton commented May 25, 2018

From my perspective that could well make sense for a major release that is able to relax such a constraint.

Yap. Make the potential breaking change in the non-promises variants the next major.

@jasnell
Copy link
Member

jasnell commented May 25, 2018

Generally +1 on this but the docs should point out the fact that proper validation of file URL paths can occur only after the URL is parsed... which means userland code that wants to perform it's own validation will still need to create the URL object, perform validation on the path, then pass the parsed URL into the methods. It also needs to be recognized that there is a definitely security risk in passing unverified/unvalidated file URLs to fs methods given that those are always absolutely paths.

@jdalton
Copy link
Member

jdalton commented May 25, 2018

unverified/unvalidated file URLs to fs methods given that those are always absolutely paths.

Should we add a doc warning for absolute paths in general or fs use in general?
I mean it could delete the drive (I've done it once).

@jasnell
Copy link
Member

jasnell commented May 25, 2018

Heh, tbh it's not a bad idea

@jasnell
Copy link
Member

jasnell commented May 25, 2018

hmmm... just a quick data point... I posted about the possibility of us doing this on Twitter and the immediate reaction from a few folks was literally "Please don't."

@iarna
Copy link
Member

iarna commented May 25, 2018

I'm on the Please Don't front, but I'm happy to expand on it:

Changing the validation requirements of arguments between the callback and promise based APIs feels very risky to me. It encourages folks to introduce substantial risk when porting their code from one to the other. As a user, I would expect the promise version to just be what util.promisify would make of it, with the modest changes it provides.

Because of the expectation that promise and callback APIs are different interfaces to the same thing, I am, as a rule, -1 on any change to the arguments passed to a promise-based API that are not also made to a callback-based API.

To back up a little, do I understand correctly that there are two justifications for this change?

  1. Add DWIM sugar to fs.readFile so you can do things like fs.readFile(import.meta.url) instead of fs.readFile(new URL(import.meta.url))
  2. Follow a recommendation in the URL standard.

I don't find either of these very convincing. The sugar added to #1 does not, to me, carry its weight, as the variant without it is not particularly cumbersome or awkward. As for #2, I would argue that what this really points to is that fs functions probably shouldn't take URL objects at all. (I am curious though! Does anyone have a pointer at what the justification for those was?)

@isaacs
Copy link
Contributor

isaacs commented May 25, 2018

Please don't do this. (In fact, can we roll back passing URL objects to fs methods?)

I maintain a lot of fs utils that get quite a bit of usage. (Among them: rimraf, glob, tar, which, graceful-fs, isexe, and quite a few others.) That's where I've done most of my OSS work.

One of the largest sources of bugs working with file systems is the fact that Windows and Posix have very different ways of expressing the same things.

I understand that it's nice to be able to do things like fs.readFile(new URL(import.meta.url)). Is it that much worse to do fs.readFile(new URL(import.meta.url).pathname)?

Let's let the long awful history of new Buffer(x) be a lesson to us that accepting varying argument types is almost always a bad idea. At least, it's a code smell that should be viewed with extreme scrutiny and handled with the utmost of care.

And I'd also argue that we should let the even longer and even more awful history of Windows file system paths be a lesson to us that having multiple different ways to encode and express the same concept is almost always a terrible idea. If we do this, then all of the logic that I've got kicking around that detects whether a path is absolute has to change.

These discrepancies around file system handling are not mere inconveniences (though they are really annoying inconveniences). They also create security problems by increasing the attack surface for vulnerabilities. When string handling logic is more complicated than necessary, bugs creep in, and those bugs allow attackers to pwn machines. (Seriously, look through the NSP advisories and observe how many of them boil down to "parsed a string wrong".)

We don't need file:// urls in the fs. It doesn't make Node.js more aligned with any browser standard (since there is no browser standard for file system access).

Do not multiply entities unnecessarily.

@jasnell
Copy link
Member

jasnell commented May 26, 2018

Not seeing any reason to revert the support for the URL object as that has been there for a while now without any issues.

But, I think the push back here is enough to say that we should hold back on enabling the URL strings.

@guybedford
Copy link
Contributor Author

The security concerns do sound like they outweigh the usability here unfortunately. Strings are hard!

As these universal modules workflows are explored it is important to be able to clearly understand what constraints we have to provide the best user experience for modules so a negative result on this one is still useful information.

I also had a look at a common isAbsolute userland implemantation at https://github.com/sindresorhus/path-is-absolute/blob/master/index.js and this clearly suffers from the security issue too - far from being a rare case this case is common, so would still spill into expectations on the fs promises APIs I guess.

That said, the universal usability problems still apply -

fs.readFile(new URL(import.meta.url).pathname) is not a good alternative as percent-encodings will not be respected (therefore no support for non latin characters for example).

So we really don't want users doing fs.readFile(new URL(import.meta.url).pathname) but actually fs.readFile(percentDecodeAllExceptPercentSlashInjections(new URL(import.meta.url).pathname))... which is what is offered by the URL object form of the fs APIs so nicely.

This all seems to indicate that import.meta.filename and import.meta.dirname will be needed and useful for NodeJS module workflows. Which seems a useful outcome of these explorations.

Thanks all for feedback.

@guybedford guybedford closed this May 26, 2018
@guybedford
Copy link
Contributor Author

guybedford commented May 26, 2018

Also worth noting the full expanded conversion is actually closer to:

fs.readFile(decode(new URL(import.meta.url).pathname.substr(isWindows ? 1 : 0)))

@isaacs
Copy link
Contributor

isaacs commented May 27, 2018

Not seeing any reason to revert the support for the URL object as that has been there for a while now without any issues.

I suspect that could be because it's not getting much use? I haven't seen any issues or PRs looking for support for this feature in the userland modules I maintain. It's marked as experimental, and hasn't gotten a ton of advocacy or discussion, from what I've seen.

What about adding a URL.filepath option that does the decoding? It's a "nonstandard" addition, but it would allow people to pull out the fs path portion of a file:// URL object in a safe way, without requiring support for varying argument types.

This all seems to indicate that import.meta.filename and import.meta.dirname will be needed and useful for NodeJS module workflows. Which seems a useful outcome of these explorations.

I'm not really seeing that? I'm making an assumption that that's the impetus for this, simply because I don't know where else file:// URLs are going to show up in day to day node usage. Most users never touch the module system internals, they just use require() or import to pull in code.

@isaacs
Copy link
Contributor

isaacs commented May 27, 2018

Aside from my fantasies: yes, it would be so nice if either we'd just insisted from the get-go that Windows users get comfortable with UNC paths and never added support for c:\blah style paths, or (perhaps better yet) had something like fs.open(new Path(string)) and never passed bare strings to these functions.

But the time to have the first discussion was 2010, and the second was 2009, so we're a bit late. I don't see how we get there without rewriting a lot of userland code, and/or going through a very long and brutal period of usability and security issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants