Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First-class support for bin crates #1630

Closed
Pauan opened this issue Jun 27, 2019 · 15 comments · Fixed by #1843
Closed

First-class support for bin crates #1630

Pauan opened this issue Jun 27, 2019 · 15 comments · Fixed by #1843

Comments

@Pauan
Copy link
Contributor

Pauan commented Jun 27, 2019

Motivation

Right now, all crates must have crate-type = ["cdylib"]. This causes some issues:

  • Even though you're creating an application, you need to use src/lib.rs (which goes against Rust idioms).

  • You need to use this...

    #[wasm_bindgen(start)]
    pub fn main_js() {}

    ...rather than this:

    fn main() {}
  • You can't (easily) have multiple binaries, since you can only have one lib per crate.

    So that means you need to create multiple crates, link them together with a workspace, then set up a build script to call wasm-pack multiple times (since it can only compile one crate at a time).

    This is a pretty big deal: you might want to develop both the client and server in the same crate, but right now that's quite clunky.

    Let's compare that with the standard Rust approach, which is to simply make multiple src/bin/ subfolders.

Proposed Solution

Both wasm-bindgen and wasm-pack should have native support for binaries. You should be able to create a src/main.rs file, slap a fn main() {} into it, and have it Just Work(tm).

Similarly, you should be able to create multiple src/bin/ subfolders, and they should also Just Work(tm).

In other words, the user experience should be exactly the same as creating a non-wasm Rust binary.

That means wasm-bindgen and wasm-pack will need to compile multiple binaries at the same time and output them all together into the output folder.

Additional Context

There might be a bit of size bloat caused by using bin crates rather than lib crates. We should investigate what we can do to reduce that.

@alexcrichton
Copy link
Contributor

I personally see one of the bigger drawbacks here that fn main() {}, the ergonomic and more favorable way to do this, generates bigger binaries. Until we get that under control I don't think we can recommand fn main() {} because it gives off a falsely-bad impression.

The bloat comes from the standard library, specifically this function. This basically the initialization routine of the standard library, and while most of this is stubbed out for wasm (and as a result optimized away) there's two main sources of bloat remaining:

  • Initialization of thread information. The first thing this does is to_string() which brings in a whole allocator, and then there's a mess of synchronization involved as well. The practical effect of this on wasm is that panics are named thread <main> instead of thread <unknown>. We could presumably update something to be a bit more agnostic and not require allocations on the main thread to reduce the size of wasm binaries.

  • De-initialization of the standard library. There's two reasons that this brings in bloat. The first is that a Once is used to synchronize calls. This synchronizes on other platforms with calls to std::process::exit. The second reason is that it calls at_exit_imp::process() where we don't really have enough information statically to prove that it's a noop on wasm.

Fixing these two are the major sources of bloat, and there's still small pieces here and there but we're talking on the order of kilobytes for above and the remaining would be on the order of bytes.

Now that's sort of the main drawback I see for this today, but even if we were to solve that I'm not actually sure how we would detect this in wasm-bindgen. The perhaps easiest solution is to simply see that main exists as an export and automatically hook it up to start or otherwise arrange for it to be executed. That feels somewhat bad though since it may not always be intended...

@Pauan
Copy link
Contributor Author

Pauan commented Jul 15, 2019

That feels somewhat bad though since it may not always be intended...

Could you elaborate more on that?

If somebody has a bin crate with a main.rs file, and they have an fn main() { ... }, and they compile it to wasm32-unknown-unknown, and then they process it with wasm-bindgen... that seems like pretty clear intent to me.

@alexcrichton
Copy link
Contributor

Oh sorry sure, so for src/main.rs we unconditionally want fn main to "just run", 100% of the time. My thinking though is we can't distinguish:

// src/main.rs
fn main() {}

vs

// src/lib.rs
#[wasm_bindgen] 
pub fn main() {
    // ...
}

in the latter case we don't automatically run it today, but we should in the former (even though we don't today)

@Pauan
Copy link
Contributor Author

Pauan commented Jul 15, 2019

Ah, I see, yes that's tricky. Ideally rustc itself would insert a bit of extra metadata (perhaps in a wasm custom section) so that we can distinguish between bin and lib crates.

@alexcrichton
Copy link
Contributor

True yeah! If rustc did something like that then we could definitely handle this pretty easily. To be clear I'm not sure my example is a blocker at all, just something to point out. The main blocker to me is the binary bloat problem.

@boringcactus
Copy link
Contributor

does the bloat really need to be a blocker on this issue? it seems to me that "bin crates Just Work(tm) but the output binary is larger than it needs to be" is still better than the status quo, and that binary bloat could be fixed after the fact.

@Pauan
Copy link
Contributor Author

Pauan commented Oct 27, 2019

@boringcactus I tend to agree with that perspective, but... we've had a lot of people complain about the size of the .wasm files (and rightfully so). The file size really does turn a lot of people away.

So if the default experience creates huge file sizes, that's really not good. Most people will try out the default way of doing things, see the huge file sizes, then give up completely.

If it was just a small increase it wouldn't matter so much, but it's actually a quite large increase. Large enough to make even me worried (and I care less about file size than other people).

@boringcactus
Copy link
Contributor

@Pauan if it's so bad it scares people off, i guess that is not actually better than the status quo.

the thread naming issue could probably be solved by just not setting the thread name on wasm, although i'm not sure if that's really the best approach, or how best to implement that. no clue where to even start on deinitialization, though.

@alexcrichton
Copy link
Contributor

FWIW I think a PR to support this would be fine today (to send to wasm-bindgen), it's orthogonal that we could look to improve libstd's support for wasm

@boringcactus
Copy link
Contributor

unfortunately, inferring the correct export behavior for fn main() for bin crates is conceptually difficult - not only can i not figure out how to do it, i can't even tell where to do it.

however, if we make users toss

#[cfg(target_arch = "wasm32")]
#[wasm_bindgen(start)]
pub fn wasm_main() {
    main();
}

alongside fn main(), then changing wasm-pack to be accept bin crates is fairly straightforward (although i suspect crawling for extra binaries will be non-trivial; i haven't implemented that yet).

things i tried that don't work, and cause entry symbol 'main' defined multiple times:

  • wrapping main with #[cfg_attr(target_arch = "wasm32", wasm_bindgen(start))]
  • wrapping main with that and also #[cfg_attr(target_arch = "wasm32", export_name = "run_main")]

"you need to toss this arbitrary chunk of code at the end of your file" is not great. a short-term solution in lieu of something more optimal would be a new proc macro that generates that automatically, like so:

#[wasm_bindgen_main]
fn main() {
    ...
}

does anybody know if "automatically attach things to the entry point of a compiled binary" is even possible without making changes to rustc itself? is "everything will Just Work(tm) but first decorate your main with this proc macro" an acceptable compromise?

@Pauan
Copy link
Contributor Author

Pauan commented Oct 29, 2019

@boringcactus wasm-pack isn't really the problem (we can easily change it to accept bin crates), the tricky bits involve changing wasm-bindgen.

We know that bin crates can be supported as-is by wasm, because stdweb fully supports bin crates, no need to add in extra attributes or anything like that.

@boringcactus
Copy link
Contributor

boringcactus commented Oct 29, 2019

@Pauan i was interpreting

// This is sort of tricky, but the gist of it is that if there's a start
// function we want to defer execution of the start function until after
// all our module's exports are bound. That way we'll execute it as soon
// as we're ready, but the module's imports and such will be able to
// work as everything is wired up.
//
// This ends up helping out in situations such as:
//
// * The start function calls an imported function
// * That imported function in turn tries to access the wasm module
//
// If we don't do this then the second step won't work because the start
// function is automatically executed before the promise of
// instantiation resolves, meaning that we won't actually have anything
// bound for it to access.
//
// If we remove the start function here (via `unstart`) then we'll
// reexport it as `__wasm2es6js_start` so be manually executed here.
and
// Initialization is just flat out tricky and not something we
// understand super well. To try to handle various issues that have come
// up we always remove the `start` function if one is present. The JS
// bindings glue then manually calls the start function (if it was
// previously present).
as reasons that just using main() as-is doesn't work, but evidently stdweb resolves that somehow.

edit: that's probably mostly unrelated, stdweb resolves it by automatically treating any exported function called main as the entry point and then calling it from the generated JS: https://github.com/koute/cargo-web/blob/2ddd9bbb567b8d4893e923cba8ad03806d383d80/src/wasm_export_main.rs#L9-L27

@Pauan
Copy link
Contributor Author

Pauan commented Oct 29, 2019

@boringcactus Yes, the stdweb solution is reasonable: just treat an exported main function as if it had #[wasm_bindgen(start)] attached to it (even if it doesn't have it in the source code).

I'm not worried about the possible confusion with main in lib crates. I think it's quite rare to export a main function from a lib crate, and in any case they can workaround it with js_name if needed.

@boringcactus
Copy link
Contributor

i think matching based on name and also type signature, and only checking in the first place if no #[wasm_bindgen(start)] was already declared, is specific enough that incidental library coverage will be minimal. it wasn't actually all that much code that needed to be added, but figuring out what code to add and where was the hard part. #1843 is now ready for discussion.

@huangjj27
Copy link
Member

#[cfg(target_arch = "wasm32")]
#[wasm_bindgen(start)]
pub fn wasm_main() {
    main();
}

@boringcactus Yes, I want it. I'm currenly implementing web feature for env_logger, but I found out that I couldn't add wasm_bindgen as a target featured dependency of the binary, otherwise no functions would work. But if I used wasm_bindgen as a universal dependency, everything works well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants