Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: File URLs in Node.js #22502

Closed
guybedford opened this issue Aug 24, 2018 · 4 comments
Closed

Discussion: File URLs in Node.js #22502

guybedford opened this issue Aug 24, 2018 · 4 comments
Labels
feature request Issues that request new features to be added to Node.js. fs Issues and PRs related to the fs subsystem / file system. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@guybedford
Copy link
Contributor

guybedford commented Aug 24, 2018

I wanted to open up a discussion on this topic, as previous work around this has perhaps lacked wider context (#20950)

Basically when we provide file:/// URLs to users (through import.meta.url, the Loader API for modules which relies heavily on file URLs, or any other means), we are expecting users to understand a lot of intricacies of how file URLs work in Node, which it already seems 90% of people will miss.

For example, see - wasm-tool/node-loader@e4f6b7d.

There are two major problems most people will walk into blindly when converting from file URLs to paths in Node.js:

  1. fs.readFile(url.pathname) works in unix systems, but will break on Windows. This means Windows support will naturally hit a wide and reliably propagating point of friction as these workflows integrate into the npm ecosystem. This issue will keep coming up across many projects as they work with file URLs.

  2. Non-latin characters need to be percent decoded. import './你好.mjs' will be resolved into file:///.../%E4%BD%A0%E5%A5%BD.mjs, so that in order to support loading the native characters from the file system, a percent decode operation needs to be performed on the path, with some special cases (eg not percent decoding path separators).

(1) is the immediate issue that will show as one of the standard Windows compatibility issues (alongside path.replace(/\\/g, '/')), and (2) seems like a deeper less seen Anglocentric preference that will continue to propagate here as well.

Since I've personally not been able to make any progress on this problem through #20950 I'd be interested to hear what we might be able to do about this.

What I would like to suggest here is two new native functions:

fileUrlToPath(string | URL) -> Node.path path
pathToFileUrl(path) -> URL

Let me know if that sounds like a good idea here, and I can see if we can get something into path or url... (suggestions on which is best are welcome too).

guybedford referenced this issue in wasm-tool/node-loader Aug 24, 2018
comply to new validation requirements of node 10 for custom loaders
@jasnell jasnell added fs Issues and PRs related to the fs subsystem / file system. feature request Issues that request new features to be added to Node.js. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Aug 24, 2018
@jasnell
Copy link
Member

jasnell commented Aug 24, 2018

I think utility methods like these would make sense to add along with some documentation around how and why they're useful.

@devsnek
Copy link
Member

devsnek commented Aug 24, 2018

those two functions already exist in internal/url. we would just need to expose them

@bmeck
Copy link
Member

bmeck commented Aug 24, 2018

I think some examples would get more clear explanation on why this is a great idea:

When converting path to URL there are multiple ways to mess up:

new URL(__filename) // errors (needs scheme)

__filename = './foo#1';
// '/foo' instead of the correct '/foo%231'
new URL(__filename, 'file:///').pathname;

__filename = './foo?2';
// '/foo' instead of the correct '/foo%3F1'
new URL(__filename, 'file:///').pathname;

__filename =  '//nas/foo.txt';
// '/nas/foo.txt' instead of the correct '/foo.txt'
new URL(`file://${__filename}`).pathname;

When converting from URL to path similar errors can occur (not just limited to other languages):

url = new URL('file://nas/foo.txt');
// foo.txt, but that is missing the remote host 😱
fs.readFile(url.pathname, () => {});

url = new URL('file:///你好.txt');
// reads '/%E4%BD%A0%E5%A5%BD.txt' instead of '/你好.txt'
fs.readFile(url.pathname, () => {});

url = new URL('file:///hello world.txt');
// reads '/hello%20world.txt' instead of '/hello world.txt'
fs.readFile(url.pathname, () => {});

Another concern that hasn't been mentioned here is that lots of windows tooling uses \ as the delimiter for CLI arguments. Having this properly convert to the native form of \ instead of people trying to manually manipulate / would be great and solve the following buggy code:

url = new URL('file:///c:/foo/data.csv');
// passes 'c:/foo/data.csv' instead of 'c:\\foo\\data.csv'
spawn('script.bat', [url.pathname], () => {});

@guybedford
Copy link
Contributor Author

This was resolved in #22506.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Issues that request new features to be added to Node.js. fs Issues and PRs related to the fs subsystem / file system. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

No branches or pull requests

4 participants