Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide clang flag to opt out of wasm-opt when linking wasm? #55781

Closed
TerrorJack opened this issue May 30, 2022 · 46 comments · Fixed by #95208
Closed

Provide clang flag to opt out of wasm-opt when linking wasm? #55781

TerrorJack opened this issue May 30, 2022 · 46 comments · Fixed by #95208
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' lld:wasm

Comments

@TerrorJack
Copy link

Currently, when linking wasm, clang attempts to detect wasm-opt in the environment, and invokes it with certain flags based on current optimization level. source

This behavior is undesirable when user wants to customize when or how wasm-opt is run. It would be nice if a clang command line flag can be provided to opt-out of this behavior.

@EugeneZelenko EugeneZelenko added clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' and removed new issue labels May 30, 2022
@llvmbot
Copy link
Member

llvmbot commented May 30, 2022

@llvm/issue-subscribers-clang-driver

@abrown
Copy link
Contributor

abrown commented Jan 10, 2023

cc: @sunfishcode, @sbc100, @tlively or anyone else who may have worked on this: what do you all think about retaining the name custom section by default in wasm-opt and only removing it when the optimization level is explicitly dialed up? This section is quite useful for populating backtraces, e.g., and I was confused when the name section magically disappeared in a build.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 10, 2023

I think the simplest thing would be to preserve it in all cases. If folks want to strip it they can use -Wl,--strip-debug which already exists today.

I think it might be as simple as adding -g unconditionally to wasm-opt (although that might end up add a name section to the binary where non existed previously... I think we should fix that on the wasm-opt side if that is the case).

@sbc100
Copy link
Collaborator

sbc100 commented Jan 10, 2023

In other words, let wasm-ld decide what debug into to include, and ask wasm-opt to do its best to preserve it.

@abrown
Copy link
Contributor

abrown commented Jan 11, 2023

Should I submit a change request to add -g unconditionally to the driver?

@aheejin
Copy link
Member

aheejin commented Jan 12, 2023

Adding -g to the driver invocation in the clang driver is one way.

I think another way might be to just, change binaryen's behavior so that even without -g given, we preserve the debug info. I checked with clang and llc and they preserve the existing debug info without -g given.

For example, if I create test.ll using

clang -g test.c -c -emit-llvm -o test.ll

And finish compiling test.ll to test.o using clang -g or llc,

llc -filetype-obj test.ll -o test.o
clang -c test.ll -o test.o

Even though the second llc and clang don't have -g, they preserve the debug info. But this changes binaryen's current behavior, so I can be unaware of potential consequences. Thoughts? @kripken

@kripken
Copy link
Member

kripken commented Jan 12, 2023

@aheejin Hmm, I think changing binaryen may be a little risky. It's a change to a long-existing user interface and the result could be people shipping name sections unnecessarily without noticing it (if they don't measure code size).

Also, there is a reason binaryen differs from llc - llc's inputs all know if they have debug info or not, so it's natural to preserve debug info as it was in the input. But wasm-opt can also receive as input a wat file which doesn't say if it has debug info or not, so we need user input to tell us whether to emit a names section. It made more sense in the past when binaryen had a second input format without clear debug info (asm2wasm / asm.js), so I guess it's debatable. And also clang has mixed behavior, requiring -g for source files but preserving debug info on object files automatically IIUC.

Still, for the reasons in the first paragraph I think it's better not to change binaryen here. Instead, could clang pass -g to wasm-opt when it received -g itself?

@abrown
Copy link
Contributor

abrown commented Jan 12, 2023

@kripken, I also was thinking along the lines of "conditionally pass -g along" but I couldn't figure out how this might work when wasm-ld is used directly. I don't see that it has a -g flag to pass along — were you thinking we should add one?

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2023

Its in the clang driver linked above:

// When optimizing, if wasm-opt is available, run it.

wasm-ld never calls wasm-opt itself.. its something that run after wasm-ld.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2023

Also, I think its has to be unconditional since wasm-ld will produce debug info based on input object files, and not based on the -g flag passed to clang at link time. In fact, I think clang ignores -g at link time.

Assuming wasm-opt -g means don't strip debug info, but also don't generate any then I think its fine to make it unconditional.

@kripken
Copy link
Member

kripken commented Jan 12, 2023

Assuming wasm-opt -g means don't strip debug info, but also don't generate any then I think its fine to make it unconditional.

Unfortunately that's not true. Binaryen IR doesn't mention the presence or lack of debug info. That is, having or not having a names section is something the wasm binary format has, but not the text format, and not Binaryen IR which abstracts over both the text and binary format. Atm wasm-opt -g will emit a name section using the names used in the IR (which, if there wasn't a names section, will be things like $0).

But if there isn't a better solution, then adding that might be ok. We already have hasFeaturesSection, and could add hasNamesSection + handling for it.

@kripken
Copy link
Member

kripken commented Jan 12, 2023

(But if we did that, we'd need to add a new flag for it, likely, because of the breaking change issue from before - we don't want wasm-opt to start or stop emitting the names section in new ways compared to what it did before.)

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2023

I new option sounds reasonable. Something like --preserver-debug-info... on the other hand we could make the default and add --strip-debug?

@kripken
Copy link
Member

kripken commented Jan 12, 2023

Looks like wasm-opt already has --strip-debug actually (which also strips the SourceMapURL and DWARF sections).

--preserve-debug-info sounds reasonable. (Or -g? 😉 )

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2023

The problem with -g is that it could mean "generate debug sections" even when the input file doesn't contain them... do we ever want wasm-opt to do that? Perhaps that should be the rare/opt-in thing?

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2023

OK, how about this: Why don't we do what wasm-ld does and preserve debug sections unless --strip-debug is passed? But only add new names section if -g is passed? No new option needed, no clang change needed.

@kripken
Copy link
Member

kripken commented Jan 12, 2023

The problem with -g is that it could mean "generate debug sections" even when the input file doesn't contain them... do we ever want wasm-opt to do that?

It's probably rare, but an example use case of generating debug info when there was none before is: wasm-opt --name-types -g which will give wasm GC types useful stable names. Without stable names, they can get reordered during opts and the indexes mean nothing, and debugging is a total mess.

Why don't we do what wasm-ld does and preserve debug sections unless --strip-debug is passed?

My main concern there is that it's a breaking change for users. They had an existing workflow that does not emit a names section, and now it does. This is worrying because names sections are silent bloat and the user might not realize they are starting to ship larger wasm files for no reason.

That idea might be simpler overall, but it's also riskier. in contrast I think --preserve-debug has few downsides.

@dschuff
Copy link
Member

dschuff commented Jan 12, 2023

The problem with -g is that it could mean "generate debug sections" even when the input file doesn't contain them... do we ever want wasm-opt to do that?

I can't think of any reason we want wasm-opt to do that.

It's probably rare, but an example use case of generating debug info when there was none before is: wasm-opt --name-types -g which will give wasm GC types useful stable names. Without stable names, they can get reordered during opts and the indexes mean nothing, and debugging is a total mess.

This involves the user passing an explicit flag though, so if --name-types means "generate new names and create a name section for them" then I think it's fine if -g without that flag always preserves but never generates new names. Probably in order for this to work we'd need to track which IR names came from the user and which were autogenerated by the parser, but that doesn't sound too terrible to do.

Why don't we do what wasm-ld does and preserve debug sections unless --strip-debug is passed?

My main concern there is that it's a breaking change for users. They had an existing workflow that does not emit a names section, and now it does. This is worrying because names sections are silent bloat and the user might not realize they are starting to ship larger wasm files for no reason.

That idea might be simpler overall, but it's also riskier. in contrast I think --preserve-debug has few downsides.

I agree that this isn't great. IMO it's slightly mitigated by the fact that wasm-opt users are expected to be toolchain developers who are driving wasm-opt with some kind of frontend (as opposed to "end" users developing apps). This means there are fewer direct users, and they will be updating their wasm-opt version explicitly, and (it seems to me) more likely to notice behavior changes than an end user (and maybe it's easier for us to warn them). But there's still some risk.

@kripken
Copy link
Member

kripken commented Jan 12, 2023

Fair point, yeah, it's mostly toolchain people that are relevant here. I agree that reduces the risk.

I don't feel super-strongly if everyone else disagrees with me. But I would prefer not to emit a name section by default in any mode, just because of that risk of silent bloat. I'd rather people need to opt-in to get something that has such a risk, using -g or a new --preserve-debug.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2023

Either of those sounds reasonable to me, as long as (one their own) they don't introduce a name section where there was none before.

ereslibre added a commit to ereslibre/webassembly-language-runtimes that referenced this issue Jan 17, 2023
By exporting the WASMLABS_SKIP_WASM_OPT envvar, the wasm-opt wrapper
present in the wasm-base container image will make the wasm-opt call a
no-op.

This is useful for cases when we don't desire wasm-opt to be called,
for example when invoked directly by
clang (llvm/llvm-project#55781).
ereslibre added a commit to ereslibre/webassembly-language-runtimes that referenced this issue Jan 17, 2023
By exporting the WASMLABS_SKIP_WASM_OPT envvar, the wasm-opt wrapper
present in the wasm-base container image will make the wasm-opt call a
no-op.

This is useful for cases when we don't desire wasm-opt to be called,
for example when invoked directly by
clang (llvm/llvm-project#55781).
ereslibre added a commit to ereslibre/webassembly-language-runtimes that referenced this issue Jan 17, 2023
By exporting the WASMLABS_SKIP_WASM_OPT envvar, the wasm-opt wrapper
present in the wasm-base container image will make the wasm-opt call a
no-op.

This is useful for cases when we don't desire wasm-opt to be called,
for example when invoked directly by
clang (llvm/llvm-project#55781).
ereslibre added a commit to ereslibre/webassembly-language-runtimes that referenced this issue Jan 17, 2023
By exporting the WASMLABS_SKIP_WASM_OPT envvar, the wasm-opt wrapper
present in the wasm-base container image will make the wasm-opt call a
no-op.

This is useful for cases when we don't desire wasm-opt to be called,
for example when invoked directly by
clang (llvm/llvm-project#55781).
@yamt
Copy link
Contributor

yamt commented Jan 27, 2023

IMO, it's simpler just to remove the automatic wasm-opt invocation.
as a user, my honest impression is that it's a rather surprising than useful. (it took a few hours for me to figure out that binaryen installation has silently affected my build.)
if a user wants to run wasm-opt, he can always do that by himself.

@pelletier
Copy link

This confused me today as well. I was inspecting the line number extraction of inlined calls in our profiler. The dwarf ranges didn't make sense, but they worked in our CI. Eventually, I realized wasm-opt was being run when on the path, which was the case on my machine, but not on our build container.

In my opinion, wasm-opt should never be invoked by the driver. If folks want to use it, they can set it up as part of their build system. Not being able to produce correct wasm modules with -O3 -g if wasm-opt happens to be on the PATH of the build machine makes things needlessly difficult.

@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2023

@llvm/issue-subscribers-lld-wasm

Author: Cheng Shao (TerrorJack)

Currently, when linking wasm, `clang` attempts to detect `wasm-opt` in the environment, and invokes it with certain flags based on current optimization level. [source](https://github.com/llvm/llvm-project/blob/940e290860894d274c986c88bea2dcc17f1e93c3/clang/lib/Driver/ToolChains/WebAssembly.cpp#L133)

This behavior is undesirable when user wants to customize when or how wasm-opt is run. It would be nice if a clang command line flag can be provided to opt-out of this behavior.

@TerrorJack
Copy link
Author

Sorry but it doesn't seem to me 89d5635 closes this ticket at all. --keep-section is not a way to opt-out the invocation of wasm-opt when -O is enabled!

@tlively tlively reopened this Nov 6, 2023
@tlively
Copy link
Collaborator

tlively commented Nov 6, 2023

@sbc100, did you mean to close this?

@RReverser
Copy link

This is also a problem with 3rd-party toolchains that don't properly embed target_features list but use clang for linking (looking at you, .NET).

Arguably they should fix this issue on their side but meanwhile it means that linking will fail as soon as optimisation is enabled with lots of errors like

[wasm-validator error in function 5096] unexpected false: Bulk memory operations require bulk memory [--enable-bulk-memory], on 
  (memory.copy
   (local.get $0)
   (local.get $1)
   (local.get $2)
  )
  [wasm-validator error in function 5097] unexpected false: Bulk memory operations require bulk memory [--enable-bulk-memory], on
  (memory.copy
   (local.get $0)
   (local.get $1)
   (local.get $2)
  )
...

Maybe wasm-opt could be at least executed with -all when doing so implicitly?

@tlively
Copy link
Collaborator

tlively commented Nov 15, 2023

-all wouldn't be quite right because it would allow wasm-opt to introduce new uses of features that were not intended to be allowed. If we want wasm-opt to be able to do the right thing without specifically enabling features on the command line and without a target features section, we'll need to add a new flag to wasm-opt that lets it infer the allowed features from the input module.

@RReverser
Copy link

RReverser commented Nov 15, 2023

-all wouldn't be quite right because it would allow wasm-opt to introduce new uses of features that were not intended to be allowed.

I thought it only affected validation? I didn't realise it also allows it to introduce new features.

If not, maybe -n (aka --no-validation) at least? Normally code produced by the linker should be valid anyway, so presumably validation isn't necessary, but it would allow cases like this one to pass.

@tlively
Copy link
Collaborator

tlively commented Nov 15, 2023

It's not just validation. The features affect what optimizations are run and what the optimizations are allowed to do as well.

Skipping validation might help, but without e.g. the sign-ext feature being set, wasm-opt will not perform optimizations that introduce sign extension instructions. I wouldn't be surprised if some optimizations hit assertion failures if the features don't match the module, too.

@mh4ck-Thales
Copy link
Contributor

I stumbled on this bug too. Having the behavior (and even the good working) of clang with frequently used flags such as -g or -O depend on the presence of wasm-opt on the machine, which is a binary unrelated to the LLVM project and not distributed with LLVM or LLVM-based toolchain such as wasi-sdk AFAIK, is a big problem.

I believe that there are two problems here, that may be separated in two issues if you find it relevant. The first one is that clang compiling to Wasm and calling wasm-opt breaks some flags / features (-g, -O, and others). This issue can be tracked in #64909. The second one is that clang can use or not wasm-opt without any warnings and produce potentially different outputs, or that one may want to use wasm-opt with specific flags on the "clang without wasm-opt" output, and that can't be achieved right now.

Regarding these two problems, I think that the easiest solution would be to remove the wasm-opt call for now, or at least put it behind an opt-in flag. But I don't know if that can be considered as a breaking change or a regression.

If removing the default call for wasm-opt is not possible, then add an opt-out flag so that the people having trouble with wasi-opt can just opt out without having to tweat their PATH or whatever. Also, add a warning / info message that tells the clang user if wasm-opt was used or not, and the exact path if so.

Once correct support for passing flags from clang to wasm-opt if implemented, and a way to distribute wasm-opt alongside LLVM (be it in wasi-sdk or somewhere else) is introduced, it can be added back as the default.

Finally, I'll put some questions here in order to better understand the situation around wasm-opt and maybe help taking future decisions:

  • Why add wasm-opt in the clang driver in the first place? Shouldn't this work be done by llvm-opt directly?
  • Is wasm-opt related to LLVM / a LLVM project in any way, or is it a completely external tool that was added in here for its interesting features?
  • Can we implement wasm-opt interesting features directly into LLVM?

@kripken
Copy link
Member

kripken commented Jan 25, 2024

@mh4ck-Thales

Is wasm-opt related to LLVM / a LLVM project in any way, or is it a completely external tool that was added in here for its interesting features?

wasm-opt is part of Binaryen and entirely separate from LLVM. It is an optimizer that can make wasm files smaller and faster (on LLVM output it typically helps by 15% or so).

Can we implement wasm-opt interesting features directly into LLVM?

In theory yes, but it would be a large amount of both initial and ongoing effort, so I doubt it. That effort can't be deduplicated because from the wasm side it makes sense to add new optimizations in wasm-opt which is a standalone, modular tool that any toolchain can use (not only LLVM), which benefits the entire wasm ecosystem.

(Personally I agree with you on the main matter here that the current clang behavior is surprising, but I defer to clang people on how to fix that.)

@mh4ck-Thales
Copy link
Contributor

@kripken thanks for the clarifications. I understand that technically, wasm-opt is a great addition to the clang workflow and greatly improves the final Wasm binary. I also get that most of the work addressing Wasm-specific optimizations are directly towards wasm-opt instead of the internal LLVM optimizer (which makes sense). This builds towards the argument of keeping wasm-opt and finding a way to distribute it alongside LLVM, as a third-party but necessary dependency.

I'm curious though to know if there is any kind of overlap between LLVM optimization features and the ones of wasm-opt, and if that's the case is there an interest to try and avoid the overlap somehow to save time on big binaries (and if the benefit is worth the trouble).

I hope that we can all agree on a solution soon, and accept a patch accordingly.

qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024
This flag causes wasm-ld preserve a section even in the face of
`--strip-all`.  This is useful, for example, to preserve the
target_features section in the ase of clang (which can run wasm-opt
after linking), and emcc (which performs a bunch of post-link work).

Fixes: llvm/llvm-project#60613
Fixes: llvm/llvm-project#55781

Differential Revision: https://reviews.llvm.org/D149917
@whitequark
Copy link
Collaborator

I've hit this issue a number of times recently. The two problematic outcomes for me were:

  1. wasm-opt corrupts DWARF debug information, therefore making the backtraces useless.
  2. Even on a non-LTO build, wasm-opt can take 10-20 minutes to complete on a recent laptop building for example Clang, which makes incremental workflows impossible.

@sunfishcode
Copy link
Member

I'm also in favor of having a flag to disable wasm-opt. If someone could submit a patch, I'd review it.

mh4ck-Thales added a commit to ThalesGroup/llvm-project that referenced this issue Jun 11, 2024
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this issue Jul 9, 2024
This PR fixes llvm#55781 by adding the `--no-wasm-opt` and `--wasm-opt`
flags in clang to disable/enable the `wasm-opt` optimizations. The
default is to enable `wasm-opt` as before in order to not break existing
workflows.

I think that adding a warning when no flag or the `--wasm-opt` flag is
given but `wasm-opt` wasn't found in the path may be relevant here. It
allows people using `wasm-opt` to be aware of if it have been used on
their produced binary or not. The only downside I see to this is that
people already using the toolchain with the `-O` and `-Werror` flags but
without `wasm-opt` in the path will see their toolchain break (with an
easy fix: either adding `--no-wasm-opt` or add `wasm-opt` to the path).
I haven't implemented this here because I haven't figured out how to add
such a warning, and I don't know if this warning should be added here or
in another PR.

CC @sunfishcode that proposed in the associated issue to review this
patch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' lld:wasm
Projects
None yet
Development

Successfully merging a pull request may close this issue.