-
Notifications
You must be signed in to change notification settings - Fork 335
Support arbitrary javascript bundling tools #1568
Conversation
literally does nothing except break things :blushing:
Not sure *exactly* what this does but i basically just copied the parts i thought were relevant from the site code. We have a src dir, which is what needs to be watched, and an output dir, which is what needs to be analyzed, and a build command, which is what needs to be run when we wanna run the build.
@caass this is awesome – thanks so much! None of us have had time for the detailed response that your comments merit, but your analysis is great. We'll try to get back to you tomorrow – but I didn't want to leave you hanging and thinking this was unappreciated 🎉 |
I think...at least...that that's what this commit does. Def starting to get into "oh god i hope i don't break anything" territory.
Also, use the nice AsRef<Path> idiom that works so well
https://github.com/cloudflare/wrangler/blob/ce41671a36db1f81275685362e9fd8f2f92b6f53/wranglerjs/index.js#L153 we use an intermediate representation. this isn't specific to webpack, but would currently require an indirection to support other bundler.
sounds correct.
as you mentioned, wranglerjs should be a step after bundling that lints/transforms the output of any bundlers.
this target causes many issues for workers since it doesn't map entirely to worker API and many nodejs libraries won't be supported. I believe you described a better solution, using babel we can transform the code to polyfill missing API.
only functionally, provided by wranglerjs, is fixing the webpack internal runtime to support loading wasm module in a worker and wrangler must generate the correct metadata to bind the wasm module to a worker. |
@xtuc thanks so much for your thoughts!
I was actually looking into this for a different reason -- one of the drawbacks of the method of running wrangler after the build process is that there is another round of disk reads/writes that makes the whole process slower. This line clears things up for me and introduces another problem that is worth looking at. Currently, the process looks like (and again, please correct me if I'm wrong)
The naive approach I've been looking at so far for this process would be something like
I hadn't thought about intercepting filesystem writes from the bundler -- I was looking into more like, "how can I intercept arbitrary writes to the filesystem and redirect them into memory," to which the answer seemed to be "you can't." The first solution I thought of to avoid the extra overhead would be to require that the bundler output to I wonder if there's some way to create a monkey-patched version of
I hadn't even thought of polyfilling missing APIs -- I was leaning towards using swc_ecma_parser to run lints from Rust to 1) leverage the performance benefits and 2) keep more of the logic in one place (Rust), instead of passing back and forth too much, but I think if the end goal is to automatically polyfill missing APIs then it's a no-brainer to leverage babel instead.
While I've tended to stay away from WASM since it's never really been necessary for my workers use-cases, it's definitely a must-have for this PR to land. With that in mind (plus the LOC you linked earlier), I think the last missing piece of knowledge for me to build out an MVP is an understanding of exactly how Thanks again for your help ❤️ |
It does already and i don't think slowness is a concern, as long as it's not considerably slower.
Stores on disk. see ./dist (webpack's output) and the
webpack can do in-memory only but all bundlers won't be able to do that. I wanted to do it earlier but i think we need some specification of the work that needs to be done, before changing the code. |
I think it makes sense to actually create a `Bundler` struct because like...eventually in my head it makes sense to have one `Bundler` instance that you could like pass around in `wrangler dev` or whatever. I think most of the codebase is more functional-y but i'm sure someone will tell me if this is not the preferred style to do this in. How many more commits can I make before I have to actually parse any javascript? Let's find out, shall we?
This restructuring is because I think it makes more sense to have each FileType run its own checks. This also cleanly bridges between the functions in `js.rs` and `wasm.rs` and the BundlerOutput struct in `mod.rs`.
Looks like we will have to wait until the next commit to parse any AST
I keep making tons of changes without committing but I think that it seems bigger than it actually is -- mostly what I've done here is define a few traits Lintable: a struct is able to lint itself with a given argument to check against Parseable: a struct is able to create Self from a given input Validate: a struct satisfies both Lintable and Parseable The JS boilerplate is mostly done -- since we actually want to lint expressions, and not statements, the bulk of the code at the moment is just about getting to the expressions in each statement. Essentially a glorifed switch case with a little recursion mixed in. The webassembly is all todo!(), i just wanted to get typechecking working. Basically, this is just yet another commit before we actually parse any AST. But I did write some documentation, which is cool.
I guess...now I just have to...implement Lintable for all these different types of expressions...yippee...
Hey @caass, this is super cool. It looks like the current build is failing- if you can revert to a stable commit on this branch, I'd be happy to check it out |
@nataliescottdavidson sure! i'm pretty sure the build is failing because the linter is catching "unused variable" stuff for unimplemented methods -- i had started prefixing my variables with underscores to pass lint checks but then i wound up forgetting what still needed to be done 😬 Hopefully tonight i'll have the javascript linter closer to completion, which has been the bulk of the effort on this PR, although if you'd like I can make a quick commit to pass CI before you poke around :) |
Looking through the code for the durable objects beta chat example, I think I'd like to restructure this to accommodate the new Workers syntax:
I've been busy recently trying to stabilize |
But we're going for the restructure. again. round four. this time, with macros. because i'm sick of defining Lintable for all of these structs
Everything compiles now, which is nice. We just need to actually implement Lintable for JavaScript and we should have version one ready to go.
Hello again everyone! I've done some more work and I feel like we're nearing completion -- or at least, nearing "ready for testing". I've been doing most of the development here, basically breaking out the code effected by this PR and just running that locally instead of needing to compile and run the entire wrangler tool. It's not linked via git in any way, just me copying and pasting code back and forth, so things might be slightly out of sync. It mostly just makes things easier for me to debug by having a If anyone on the cloudflare side is interested in poking around, here's the basic idea:
There's still work to be done, and some things I'm not sure of, and the whole thing is basically untested. That being said, progress is being made! I would love some feedback on this if anyone has some spare time ❤️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @caass, really excited where this is going! I left some high-level feedback and questions. If you're not already in the beta to test this, you can send an email with your account id to, ashcon at cloudflare, and I'll get you access.
match n { | ||
ModuleDecl::Import(import) => { | ||
if import.type_only { | ||
self.error(import, "Typescript not allowed!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a path, with the current approach, to supporting Typescript in the future? (That would be awesome)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! swc
has full Typescript support, and has actually recently landed in Deno if that's a confidence booster on the library's reliability.
/// have access to the durable objects beta | ||
/// but it seems as though the format is basically: | ||
/// * one `.mjs` file that serves as the entrypoint to the worker, | ||
/// * any number other arbitrary files that can be imported into the worker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, but there is a subtle nuance: what is the content-type of each file?
You might want to import an HTML file as an ArrayBuffer
, thus using application/octet-stream
in the form data upload; or text/plain
which would import it as a String
.
From what I gather, there are 3 potential options:
- Always upload non-JS and non-WASM files as
application/octet-stream
and worry about this nuance later - Have users define the content-type of their imports somewhere in the
wrangler.toml
- Adopt the tc39 proposal for import-assertions, which
swc
now supports
Moreover, if we decide to adopt the proposal, there are 2 additional considerations:
- Since V8 does not support it, each assertion would need to be "removed" prior to upload?
- What will each assertion type be called? "arraybuffer"? "text"?
Coincidentally, our very own @xtuc is championing this proposal, so I think he would have some great ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hmm interesting!
I was able to find the tracking issue for V8, and it seems like they're working on it at the moment. I don't know what the typical timeline is with V8 from "started working" to "shipped", but I'd assume it's...a couple months? Is that overly optimistic? I'd love to hear some more input and hopefully by the time this PR lands it'll become a non-issue.
- Always upload non-JS and non-WASM files as
application/octet-stream
and worry about this nuance later
I choose the option that involves the least work!
- Have users define the content-type of their imports somewhere in the
wrangler.toml
Speaking from personal experience, the less research and configuring I have to do, the better. Part of the appeal of wrangler (and Workers) is that it "just works", so I'm kind of opposed to this idea but not strongly. Just kind of wary of adding more configuration that for most folks will likely be unnecessary.
- Adopt the tc39 proposal for import-assertions, which
swc
now supports
I'm hesitant to start parsing things that don't map directly to V8 because then we might start to get into more opinionated parsing, like you said, which could lead to unexpected behavior somewhere down the line.
In my opinion the optimal approach for getting this done before V8 adopts import-assertions would be to traverse the AST and see what kinds of methods are called on each import, and whether those are ArrayBuffer
type things or String
type things, but that seems...overkill?
I think probably the easiest thing to do is to use application/octet-stream
and document that that's what we've selected, and that if you want to manipulate your import as a String
you should do some conversion first. There's a million StackOverflow questions about that exact thing so I figure that won't be too much of a hassle.
- Check total bundle size, not individual files - Replace hyphens with underscores in wrangler.toml - Rename MAX_FILE_SIZE to MAX_BUNDLE_SIZE - Duplicate comment about swc_common::SourceMap vs sourcemap::SourceMap - rename entry & entry_file to module & module_path
Hey @caass, thank you for your work on this PR, and our apologies for not responding to you sooner! We're really excited about this PR, and we're going to use it as the base for a feature branch that adds support for durable objects. However, we've decided to try and keep as much validation for worker bundles in the API as possible, as to not duplicate logic between wrangler and the API. The validator work here is very neat though, and if you find it useful I'd suggest looking into packaging it as a standalone linter binary, where it can run as part of your custom build and return a failed status to wrangler if validation fails. |
Ok! Glad to hear that the code from this PR will go to good use :) |
Inspired by (and will hopefully one day close) #1558
Motivation
After searching for why I wasn't able to target ESNext with typescript and webpack, I found out it was because I was using optional chaining. Even though (if I understand correctly) the workers runtime uses V8, which supports optional chaining, the included version of webpack doesn't. I found someone had already brought this up in #1558, and since I religiously read the cloudflare blog posts and have been messing around with rust for the past few months, I thought I might be able to contribute this feature, since Workers is my go-to platform.
The idea behind this PR is that wrangler should be able to transparently sit on top of any arbitrary javascript bundling system, such as rollup or parcel or even swc. While integration with webpack has been awesome and has enabled me to use all sorts of neat things like typescript and whatnot, ultimately it does feel like kind of an arbitrary choice -- and hardcoding in a bunch of webpack-4 specific stuff feels...unfriendly to maintenance?
What I mean to say is that the cracks are starting to show (e.g. with nullish coalescing and optional chaining), and I think now would be a great time to implement this functionality to allow wrangler to decouple itself from any specific javascript bundling tool as Workers continues to become a true competitor to offerings like Lambda and GCF.
I would love to hear feedback on whether or not this is something I should be working on (i.e. something similar is in the works internally), as well as any design decisions to take into consideration.
Problem
Currently, wrangler is coupled with webpack in only a couple ways:
If I'm understanding correctly, wranglerjs basically works by:
a. ...the target is webworker (here)
b. ...the output filename is "worker.js" (here)
a. ...remove the dependency on a filesystem, which workers doesn't support (here in wranglerjs vs here in webpack)
b. ...disable import mangling and streaming (here)
Because of the way that wranglerjs kind of "wraps around" webpack, it's impossible to use other tools (such as webpack 5).
Solution
Before I start getting into rampant speculation, I would once again like to interject -- y'all are the experts on WASM/JS/etc., and I am just a dev who wants to use optional chaining in her workers. Please, please tell me where I have missed/misunderstood stuff. This whole PR idea might be completely impossible.
My proposed solution to this problem would be to change the functionality of wranglerjs so that it runs after the javascript bundling process, and makes the necessary modifications (sanity checks, wasm imports) then.
Sanity checks
Checking for the "webworker" target is basically just trying to make sure we (the lowly devs) aren't using any APIs not available in the workers runtime. We could use a tool like Esprisma to check to make sure that the generated script doesn't use any APIs other than the ones available in the Workers runtime (e.g.
Window
).Output filenames is trivial, wrangler can just rename and warn.
A new check needs to be introduced, which ensures that there's just a single JS file (and an optional single WASM file) since we no longer have control over the build process, which may produce multiple output files depending on its chunk-splitting strategy. I think this should just result in an outright build failure, since the alternative is to bundle the output ourselves, which kind of defeats the whole point of this PR.
Webassembly functionality
I'm actually pretty lost here. I have never successfully used WebAssembly with Workers (which is a me problem, not a Workers problem) so I'm not sure exactly how it's handled. I suspect that the .wasm files themselves are "dumb", meaning they don't need to be modified, and that all of the importing/streaming/mangling stuff occurs on the JS side.
If (fingers crossed) this is the case, this too could be accomplished by modifying the bundled code and doing some AST trickery to change any usages of streaming or filesystem stuff to use whatever logic Workers requires instead.
That's all! I would love to hear any and all thoughts regarding my proposal. Thank you for your time ❤️