Feature request: packaged applications (WARC archives?) #36812

Closed
arcanis opened this issue Jan 6, 2021 · 23 comments
Labels: feature request, stale

Comments

@arcanis
Contributor

arcanis commented Jan 6, 2021

Is your feature request related to a problem? Please describe.

We're distributing Yarn as a single-file script. While it's great for accessibility, it has an impact on the boot time since Node needs to parse the whole file before even starting to execute it. Additionally, the file is larger than it needs to be because various binary payloads have to be encoded as base64.
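
To put a rough number on the base64 overhead mentioned above: base64 encodes every 3 bytes as 4 ASCII characters, so an embedded binary payload grows by about a third. A minimal sketch, where the 3000-byte buffer is an arbitrary stand-in and not Yarn's actual payload:

```javascript
// Base64 maps every 3 input bytes to 4 output characters.
const payload = Buffer.alloc(3000, 0xab); // hypothetical binary payload
const encoded = payload.toString('base64');

// Ratio of encoded size to raw size (about 4/3).
const overhead = encoded.length / payload.length;
console.log(payload.length, encoded.length, overhead.toFixed(2));
```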

Describe the solution you'd like

We'd like to eventually distribute Yarn as a packaged application. Imagine an archive containing the source code, which we would invoke like any other script. For Node, the archive would be treated as a directory: node ./yarn.warc/index.js. Given that WARC is on its path to standardization, it seems like the choice most likely to find consensus.

Prior work

Yarn already provides in-zip filesystem access for the packages it installs. If Node is interested in using Zip instead of WARC, we could provide our implementation, which closely follows Node's APIs.

Describe alternatives you've considered

Supporting the __halt_compiler directive would be another way to delegate this responsibility to userland. It would likely be much easier to implement and would allow for greater flexibility, and tbh I'd very much prefer this approach. Unfortunately, it may require work at the parser level, which I think would push it into V8, and possibly TC39, territory. Given that the use case is almost exclusively relevant to Node, I'm worried it wouldn't go anywhere.

@aduh95 aduh95 added the feature request label Jan 6, 2021
@aduh95
Contributor

aduh95 commented Jan 6, 2021

> Supporting the __halt_compiler directive would be another way to delegate this responsibility to userland

I wonder if that's already available using WASM (because the wasm file contains binary data), but I haven't looked it up.

@arcanis
Contributor Author

arcanis commented Jan 6, 2021

> I wonder if that's already available using WASM (because the wasm file contains binary data), but I haven't looked it up.

That's an interesting idea - I think it could be possible, but only if there was a way for the wasm module to itself execute arbitrary JavaScript code (so that our zip-capable boot logic could then spawn the source it extracted from the embedded archive). Since in this scenario we would only distribute the yarn.wasm file, we couldn't provide any bindings other than the default ones provided by --experimental-wasm-modules.

@targos
Member

targos commented Jan 6, 2021

> For Node, the archive would be treated as a directory

Do you mean for module resolution only, or in general? What should happen if someone does fs.readFileSync('yarn.warc/index.js') or fs.writeFileSync('yarn.warc/something.txt', 'some text') ?

Edit: I'm asking this because currently the module system relies heavily on fs operations to do its work.

@arcanis
Contributor Author

arcanis commented Jan 6, 2021

> Do you mean for module resolution only, or in general? What should happen if someone does fs.readFileSync('yarn.warc/index.js') or fs.writeFileSync('yarn.warc/something.txt', 'some text')?

With Yarn we only "mount" zip archives as read-only, so we throw EROFS errors on mutations. Read queries (typically readFile, stat, or readdir) all work as you would expect from a regular directory.
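
The read-only behaviour described above can be sketched roughly as follows. This is not Yarn's implementation: the `entries` map, the `erofs` helper, and the `archiveFs` object are all hypothetical names, and an in-memory map stands in for a real mounted archive. Reads are served from the archive contents, while any mutation throws an error carrying the `EROFS` code.

```javascript
// Hypothetical in-memory stand-in for the entries of a mounted archive.
const entries = new Map([
  ['index.js', Buffer.from('module.exports = 42;\n')],
  ['data.csv', Buffer.from('a,b\n1,2\n')],
]);

// Build an error shaped like the ones Node's fs module throws.
function erofs(syscall, p) {
  const err = new Error(`EROFS: read-only file system, ${syscall} '${p}'`);
  err.code = 'EROFS';
  err.syscall = syscall;
  err.path = p;
  return err;
}

const archiveFs = {
  // Reads are answered from the archive contents.
  readFileSync(p) {
    const data = entries.get(p);
    if (data === undefined) {
      const err = new Error(`ENOENT: no such file or directory, open '${p}'`);
      err.code = 'ENOENT';
      throw err;
    }
    return data;
  },
  // Lists the top-level entries of the archive.
  readdirSync() {
    return [...entries.keys()].sort();
  },
  // Any mutation is rejected, since the archive is mounted read-only.
  writeFileSync(p) {
    throw erofs('open', p);
  },
};
```

Callers can then branch on `err.code === 'EROFS'` exactly as they would for a real read-only file system.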

@devsnek
Member

devsnek commented Jan 6, 2021

fs is filesystem operations, not WARC operations. at the least I'd expect a separate api for this.

@arcanis
Contributor Author

arcanis commented Jan 6, 2021

I don't have any opinion on how it should be presented as a Node API. My "treat archives as a directory" comment only referred to a candidate command line usage:

node ./yarn.warc/index.js

Honestly this is also a fairly inconsequential part of the request, I understand there are other interfaces that could work too. I'd prefer if discussions were around the idea of packaged applications, whether they are a good thing or a bad thing, and what would be the high-level option (WARC, a classic archive, wasm, something else). How it's interfaced can be a followup discussion.

@aduh95 aduh95 changed the title from "Support WARC archives" to "Feature request: packaged applications (WARC archives?)" Jan 6, 2021
@devsnek
Member

devsnek commented Jan 6, 2021

Once interface types are in V8, it should be possible to write a wasm module which directly imports runInThisContext from vm and invokes it. (It is already possible for the wasm to import that function, it is just unable to directly pass a string without interface types)

@devsnek
Member

devsnek commented Jan 6, 2021

You could also just drop the data into a comment at the end of your file using some high density encoding which can't contain the sequence */.
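
A minimal sketch of the trailing-comment idea, under stated assumptions: the `/*PAYLOAD:` marker and the `embed`/`extract` helpers are hypothetical, and base64 is used only for simplicity (it is not a high-density encoding, but its alphabet contains no `*`, so the sequence `*/` cannot occur inside the comment).

```javascript
const MARKER = '/*PAYLOAD:';

// Append a binary payload to a script as a trailing block comment.
// Base64 never produces '*', so '*/' cannot appear inside the payload.
function embed(source, payload) {
  return `${source}\n${MARKER}${payload.toString('base64')}*/\n`;
}

// Recover the payload from the last marker comment in the script text.
function extract(scriptText) {
  const start = scriptText.lastIndexOf(MARKER);
  if (start === -1) return null;
  const end = scriptText.indexOf('*/', start);
  if (end === -1) return null;
  return Buffer.from(scriptText.slice(start + MARKER.length, end), 'base64');
}

const script = embed('console.log("boot");', Buffer.from([0, 1, 2, 250, 255]));
const recovered = extract(script);
console.log(recovered);
```

At runtime, a script built this way could presumably read its own text with `fs.readFileSync(__filename, 'utf8')` and pass it to `extract`.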

@andreialecu

andreialecu commented Jan 26, 2021

There are some merits to using the ZIP format for packaging applications. It has been used successfully in:

  • Java: .jar are ZIP files
  • Android: .apk files are JAR files (thus ZIP files)
  • iOS: .ipa are ZIP files
  • macOS: .xip as signed ZIP files

Other advantages are:

  • The format can be manipulated in pure JS with help from the built-in zlib module in nodejs. It only consists of a bunch of headers + DEFLATE compression. An example of a pure JS implementation of the zip format: https://github.com/cthackers/adm-zip
  • The format is stable, established, well documented and efficient
  • Random file access is fast. Zip files keep a "Central Directory" which stores an index of all the files within, with their attributes and permissions (thus behaving like a file system)
  • Compression is optional. Zip can be used to simply "store" the files, for performance purposes. Or various levels of compression can be configured, when size is more important.
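
To illustrate why random access is cheap, here is a rough sketch of locating the End of Central Directory record, which sits at the tail of a zip file and points at the index of entries. The function name and returned field names are hypothetical; the byte offsets follow the classic (non-ZIP64) record layout, and ZIP64 archives are not handled.

```javascript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as little-endian
const EOCD_MIN_SIZE = 22; // record size when the archive comment is empty

// Scan backwards for the End of Central Directory record and decode it.
// Returns null when no record is found.
function readEndOfCentralDirectory(buf) {
  for (let i = buf.length - EOCD_MIN_SIZE; i >= 0; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIGNATURE) {
      return {
        totalEntries: buf.readUInt16LE(i + 10),
        centralDirectorySize: buf.readUInt32LE(i + 12),
        centralDirectoryOffset: buf.readUInt32LE(i + 16),
        commentLength: buf.readUInt16LE(i + 20),
      };
    }
  }
  return null;
}

// An empty archive consists of nothing but the 22-byte record.
const emptyZip = Buffer.alloc(EOCD_MIN_SIZE);
emptyZip.writeUInt32LE(EOCD_SIGNATURE, 0);
console.log(readEndOfCentralDirectory(emptyZip));
```

A reader only needs this record and the central directory it points to in order to find any file, without scanning the rest of the archive.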

Treating archives as directories and allowing read-only access to them via the native fs module seems like a solid way to implement this.

There is some prior art in the node ecosystem here as well, @arcanis already mentioned Yarn v2, which successfully monkey patches zip support over nodejs' fs module, and thus allows running JavaScript from within zip files directly without further modifications.

The main downside of WARC is that the format is relatively verbose and is generally meant for archiving, as opposed to random access. It wouldn't perform as well as ZIP.

Additionally, editors like vim and emacs can edit files within zip files directly. VSCode can also work with zip files with the help of an extension.

@andreialecu

Another note on this. The nodejs source repo already includes a full featured native zip implementation (via v8):

https://github.com/nodejs/node/blob/master/deps/v8/third_party/zlib/google/zip.cc

@mhdawson
Member

mhdawson commented Feb 8, 2021

@arcanis what do you have in mind for the contents of the packaged application? Just modules to be loaded, or arbitrary files? If the latter, what determines whether to look in the packaged application or the local file system for a file?

@arcanis
Contributor Author

arcanis commented Feb 8, 2021

> what do you have in mind for the contents of the packaged application? Just modules to be loaded, or arbitrary files? If the latter, what determines whether to look in the packaged application or the local file system for a file?

I'm not entirely sure I understood properly, so feel free to ask for more details, but the idea would be to run a Node application that would be bundled into a single file. Modules would be included, of course, but other files might be part of it too (for instance, consider an app that would contain a csv data file, or a project generator template). It would have an entry point, similar to how packages have their index.js.

@mhdawson
Member

Thanks. You answered the first question. What I was trying to understand in the second was the case where the application accesses non-module files from both the package and outside the package. For example, if it needs to read a csv which is part of the package but then writes out a file to the local file system. When I use the fs APIs and reference a file called "foo", how will Node.js know whether to look for that file in the package or on the local file system?

@arcanis
Contributor Author

arcanis commented Feb 12, 2021

> then writes out a file to the local file system

In the case of Yarn we treat zip packages as read-only, so packages can never write anything into themselves (at least not without tricking the system). Additionally, still in our case, we let them access all files through the regular Node APIs. So finding whether to look in the package or the local file system is as simple as finding out whether the paths passed to the fs methods are "contained" within a zip path or not.
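
The containment check described above might look roughly like this. It is a simplified sketch rather than Yarn's actual logic: `splitArchivePath` is a hypothetical helper that treats the first path segment ending in `.zip` as an archive, which would misidentify a regular directory that happens to be named that way. It uses `path.posix` so the behaviour is the same on every platform.

```javascript
const path = require('path').posix;

// Split a path into the archive that contains it and the path inside it.
// Returns null when no segment looks like a zip archive, meaning the
// path should be served by the regular file system.
function splitArchivePath(p) {
  const segments = path.normalize(p).split('/');
  const index = segments.findIndex((segment) => segment.endsWith('.zip'));
  if (index === -1) return null;
  return {
    archivePath: segments.slice(0, index + 1).join('/'),
    subPath: segments.slice(index + 1).join('/') || '.',
  };
}

console.log(splitArchivePath('app/vendors/pkg.zip/data.csv'));
console.log(splitArchivePath('app/data.csv'));
```

An fs wrapper could call such a helper on every incoming path and route the request to either the archive reader or the real file system.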

@mhdawson
Member

Ok, so just to make sure I understand, the application has to know the file is in the packaged application and the paths passed to fs reflect that.

@arcanis
Contributor Author

arcanis commented Feb 19, 2021

> Ok, so just to make sure I understand, the application has to know the file is in the packaged application and the paths passed to fs reflect that.

Perhaps; let me show you some code, that might be less ambiguous. Imagine the following is an index.js located within the archive app/vendors/pkg.zip. This archive also contains a data.csv file. This works with our current implementation:

const fs = require(`fs`);
const path = require(`path`);

const csvPath = path.join(__dirname, `data.csv`);
const csvData = fs.readFileSync(csvPath);

// However, mutations would throw with
// Error: EROFS Read-Only File System
fs.writeFileSync(csvPath, `...`);

@mhdawson
Member

So would csvPath be app/vendors/pkg.zip/data.csv?

@arcanis
Contributor Author

arcanis commented Feb 20, 2021

Yep. Again, that's just how we already do it. There are pros and cons, so I'm not firmly attached to this particular approach (although I think it has proved to be a reasonable one in practice).

@targos
Member

targos commented Feb 20, 2021

Does it work only if the call comes from a file inside the archive? Or is the idea that the entire Node.js runtime would be able to transparently read inside zip files?

@arcanis
Contributor Author

arcanis commented Feb 20, 2021

> Does it work only if the call comes from a file inside the archive? Or is the idea that the entire Node.js runtime would be able to transparently read inside zip files?

The entire one, since paths may be passed from one package to the other transparently.

@andreialecu

@targos I made a minimal proof of concept that uses Yarn2's .zip fs monkey patches:

https://github.com/andreialecu/node-packaged-app-poc

You can clone it and run node -r ./.pnp.js -e "require('./app.zip/index.js')".

app.zip has two files:

index.js

const fs = require('fs');
const path = require('path');

console.log(fs.readFileSync(path.join(__dirname, "hello.txt")).toString())

hello.txt

Hello Packaged App!

Running node -r ./.pnp.js -e "require('./app.zip/index.js')" yields:

➜ node -r ./.pnp.js -e "require('./app.zip/index.js')"
Hello Packaged App!

.pnp.js contains a WASM .zip implementation and the fs patch (and is generated by yarn2).

@github-actions
Contributor

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@targos targos moved this to Pending Triage in Node.js feature requests Mar 22, 2022
@targos targos moved this from Pending Triage to Stale in Node.js feature requests Mar 22, 2022
@github-actions
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.


6 participants