Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwrap emcc wrapper script to its underlying clang invocation #154

Merged
merged 11 commits into from
Jan 4, 2024

Conversation

kon72
Copy link
Contributor

@kon72 kon72 commented Dec 31, 2023

This change unwraps emcc.sh to its underlying clang invocation.
This is done by running emcc.sh with an environment variable EM_COMPILER_WRAPPER set to a fake wrapper script that just prints the command line arguments passed to it and exits.

Diff of compile_commands.json before and after this change

Before this change:

{
  "file": "a.cc",
  "arguments": [
    "external/emsdk/emscripten_toolchain/emcc.bat",
    "-xc++",
    "-isysrootexternal/emscripten_bin_win/emscripten/cache/sysroot",
    "-fdiagnostics-color",
    "-fno-exceptions",
    "-fno-strict-aliasing",
    "-funsigned-char",
    "-no-canonical-prefixes",
    "-std=gnu++17",
    "-nostdinc",
    "-nostdinc++",
    "-fomit-frame-pointer",
    "-O0",
    "-Wall",
    "-DBAZEL_CURRENT_REPOSITORY=\"\"",
    "-iquote",
    ".",
    "-iquote",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin",
    "-iquote",
    "external/emsdk",
    "-iquote",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/external/emsdk",
    "-iwithsysroot/include/c++/v1",
    "-iwithsysroot/include/compat",
    "-iwithsysroot/include",
    "-isystem",
    "external/emscripten_bin_win/lib/clang/18/include",
    "-MD",
    "-MF",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.d",
    "-c",
    "a.cc",
    "-o",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.o",
    "-Wno-builtin-macro-redefined",
    "-D__DATE__=\"redacted\"",
    "-D__TIMESTAMP__=\"redacted\"",
    "-D__TIME__=\"redacted\"",
    "-Werror"
  ],
  "directory": "C:/users/kinse/proj/wasm_bazel"
}

After this change:

{
  "file": "a.cc",
  "arguments": [
    "external/emscripten_bin_win/bin/clang++.exe",
    "-xc++",
    "-target",
    "wasm32-unknown-emscripten",
    "-fignore-exceptions",
    "-fvisibility=default",
    "-mllvm",
    "-combiner-global-alias-analysis=false",
    "-mllvm",
    "-enable-emscripten-sjlj",
    "-mllvm",
    "-disable-lsr",
    "-DEMSCRIPTEN",
    "-isysrootexternal/emscripten_bin_win/emscripten/cache/sysroot",
    "-fdiagnostics-color",
    "-fno-exceptions",
    "-fno-strict-aliasing",
    "-funsigned-char",
    "-no-canonical-prefixes",
    "-std=gnu++17",
    "-nostdinc",
    "-nostdinc++",
    "-fomit-frame-pointer",
    "-O0",
    "-Wall",
    "-DBAZEL_CURRENT_REPOSITORY=\"\"",
    "-iquote",
    ".",
    "-iquote",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin",
    "-iquote",
    "external/emsdk",
    "-iquote",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/external/emsdk",
    "-iwithsysroot/include/c++/v1",
    "-iwithsysroot/include/compat",
    "-iwithsysroot/include",
    "-isystem",
    "external/emscripten_bin_win/lib/clang/18/include",
    "-MD",
    "-MF",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.d",
    "-c",
    "-Wno-builtin-macro-redefined",
    "-D__DATE__=\"redacted\"",
    "-D__TIMESTAMP__=\"redacted\"",
    "-D__TIME__=\"redacted\"",
    "-Werror",
    "a.cc",
    "-o",
    "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.o"
  ],
  "directory": "C:/users/kinse/proj/wasm_bazel"
}

in case they change extension or add suffixes in the future
@kon72, if I've missed something here, please just revert
I was thinking that in the future _all_platform_patch might modify things revealed by them
@cpsauer
Copy link
Contributor

cpsauer commented Dec 31, 2023

Hi, Kon! Thanks for a exceptionally thoughtful contribution here and for being the type of person who leaves things better than you found them :)

This is very clever and I'm definitely down to merge. Some quick thoughts first:

  • What do you think about printing out the args with a Bazel-built cc_binary, being passed in as data instead of the shell scripts wrapping python?
    • I'm thinking about this, trying to avoid direct system python invocations, because we may soon move to Bazel hermetic python based on user requests.
    • Some other options would be (1) just echoing from platform-specific shell scripts (2) similar to cc_binary but with py_binary.
    • If you've already thought about this and it doesn't work or is worse for some reason, just let me know.
  • Thanks for taking the time to clean up unused code, style things nicely, and otherwise keeping the quality bar super high. I really appreciate it.
  • EMCC Environment construction: If you haven't already, could I ask you to check that Bazel still isn't providing these environment variables in the aquery output?
    • That is, run something like bazel aquery "mnemonic('CppCompile',deps(<your target>))" and make sure that the "Environment:" line doesn't include the things you want?
    • If it does, let's use those. If not, could I ask you to lend your support over at SDKROOT, DEVELOPER_DIR, INCLUDE, LIB missing from Environment Variables in Aquery and Extra Actions bazelbuild/bazel#12852? Would make sense to ask them to make sure this gets fixed as part of starlarkification of the cc rules.
    • I think it should include PATH, at least -> I'd love it if you'd get the environment from the aquery output and use that in (all) the subprocess.run()s. Otherwise we'll be inheriting the user's shell environment rather than the bazel run environment, which can meaningfully differ. (I'm realizing reading through that I'd messed this up previously, and that you're copying the same mistake I'd made; still hoping I can ask you to fix while you're down in there.)
  • I pushed up a some quick small things, often fixing pre-existing, but related issues I noticed while looking through. I'd love it if you'd take a peek--and if I've broken anything (sorry) feel free to just fix if so. Thanks again for making our emscripten support so much better.
  • For my continued learning: Could I ask you what user needs this fixes? What errors were you seeing in clangd without? (I can imagine...but it's be good to hear explicitly.)

Thanks again for an outstanding contribution :) Can't wait to get you on the contributors list!

Chris

refresh.template.py Outdated Show resolved Hide resolved
assert sysroot, f'Emscripten sysroot not detected in CMD: {compile_args}'

def get_workspace_root(path_from_execroot: pathlib.PurePath):
assert path_from_execroot.parts[0] == 'external'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor, but I suppose this might not be external if invoked within the emsdk...would we want to return . rather than assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I fixed it.

@kon72
Copy link
Contributor Author

kon72 commented Jan 1, 2024

Hi Chris, thanks for the quick reply!

What do you think about printing out the args with a Bazel-built cc_binary, being passed in as data instead of the shell scripts wrapping python?

I think cc_binary is a good approach here.
I originally tried to implement it with a pure shell script, but I couldn't find a reliable way to print out the command line arguments on a Windows batch file when they contain quotes. (As I'm not that familiar with .bat, I would be happy to hear if anyone knows how to do it.)
Then I decided to use shell script plus python as emsdk and emscripten does, but as you pointed out, it prohibits users from using the hermetic python.
For a pure py_binary approach, I'm not sure how to make print_args.py get called from emcc process. I'd like to ask you if it's possible to do it when EM_COMPILER_WRAPPER can only take one argument?
Edit: I was able to make it work by simply setting _print_args_executable to py_binary rule. That said, cc_binary would be better in terms of speed.

EMCC Environment construction: If you haven't already, could I ask you to check that Bazel still isn't providing these environment variables in the aquery output?

Wow, bazel aquery --output=jsonproto "mnemonic('CppCompile',deps(<your target>))" gave me environment variables necessary to run emcc! (omitting --output won't)
However, it seems this would not work prior to Bazel 6.1.0 (see bazelbuild/bazel#17274, I tested on 6.0.0 and 6.1.0), so I'd want to implement in a way that: first try to find env vars from aquery output, and if not found, falls back to the current way of constructing those variables using get_workspace_root.
Or perhaps can we drop support of that versions? I'd like to kindly ask your thoughts on it.

I think it should include PATH, at least

Unfortunately, aquery output doesn't have PATH, it only has PWD and EM_* things:

"environmentVariables": [{
  "key": "PWD",
  "value": "/proc/self/cwd"
}, {
  "key": "EM_BIN_PATH",
  "value": "external/emscripten_bin_mac_arm64"
}, {
  "key": "EM_CONFIG_PATH",
  "value": "external/emsdk/emscripten_toolchain/emscripten_config"
}, {
  "key": "EMCC_WASM_BACKEND",
  "value": "1"
}],

Could I ask you what user needs this fixes? What errors were you seeing in clangd without? (I can imagine...but it's be good to hear explicitly.)

emcc invokes clang with a bunch of extra flags such as -target wasm32-unknown-emscripten and -D__EMSCRIPTEN_SHARED_MEMORY__=1. Specifically, when clang sees -target wasm32-unknown-emscripten, it internally adds the definition of -D__EMSCRIPTEN__ which is used in lots of source code.
Those flags are necessary for clangd or other toolings to analyze the source code with the same commands used for building artifacts, without manual clangd configuration of adding missing compile flags.
This fixes also correctly handles the emcc specific flags like -sUSE_PTHREADS, fixing
the following error messages while generating compile_commands.json:
clang: error: no such file or directory: 'USE_PTHREADS=1'

@cpsauer
Copy link
Contributor

cpsauer commented Jan 2, 2024

Great work! Thank you again.

Was about to comment back that py_binary auto-wrapped in a script (based on the alert email) but you beat me to it. cc_binary seems great though. Love it.

Minor and nitpicky, but would you be down do use .cpp? (I'm partial only because .hpp is worth it, to distinguish from C, and then .cpp matches.)

Sorry about requiring the revert; and thanks for the note so folks don't edit it out in the future. Really appreciate your testing on Windows, too, and being so thoughtful about the quotes. (Curious: Do you have a VM or a separate machine? I switch to some old desktops when I need to test Windows things :)

Similarly, way to test the bazel versions and output options! We probably should support ideally, but I'm very tempted to drop and ok if you want to. To explain why I'm torn: we did get an issue filed when we required 6.1 (see backlink above) but IMO older bazel versions are buggy enough that we should really encourage updating and not contort ourselves too much. What do you think? If we do back support, might be worth calling bazel version so we can also handle that other, backlinked case.

PATH: Mine does on other actions. Hmm. Could you try adding specifying build --incompatible_strict_action_env to your .bazelrc and seeing if that adds PATH? I think that's what adds it; does for me. Loads of folks turn that on, because it avoids loads of needless cache misses from different terminals, IDEs, machines, etc. (Should be the default IMO; they've tried to make it so in the past.) I think the move is to use it if available and otherwise fall back to the environment. Thanks again for your care here.

And thanks so much for the context around the errors. More generally, I really appreciate how great you are to work with and all you've done here, Kon!

Chris

@cpsauer
Copy link
Contributor

cpsauer commented Jan 3, 2024

Reading your changes quickly and smiling :) Thanks again for being so great, Kon.
Looks like we're nearly there--but I don't want to presume without hearing back from you on the above. I'll toss up some quick twiddles for your consideration

Mostly adding environment dict to the compile_action for easy reuse in the future
@kon72
Copy link
Contributor Author

kon72 commented Jan 3, 2024

Regarding the bazel version, I personally think it's ok to require >= 6.1.0 since our projects doesn't depend on Bazel < 6.1.0. Feel free to let me know if it needs to support an older version later on :)

Could you try adding specifying build --incompatible_strict_action_env to your .bazelrc and seeing if that adds PATH?

Thanks, I confirmed that adds PATH. I pushed a commit to use that PATH if available.

Curious: Do you have a VM or a separate machine?

I have a separate Windows machine and do some tests on it through SSH from my Mac laptop.

@cpsauer
Copy link
Contributor

cpsauer commented Jan 4, 2024

Sweet! Seems like you're ready to merge, then! (Even if you haven't said so explicitly). Let's get you on that contributor board! (If I've been too hasty, well, just holler here or put up another PR.)

Thanks again for being great to work with, producing such quality code, and leaving things undeniably much better than you found them. It was a real pleasure working with you.

Cheers!
Chris

@cpsauer cpsauer merged commit b998dca into hedronvision:main Jan 4, 2024
2 checks passed
@doupongzeng
Copy link

Regarding the bazel version, I personally think it's ok to require >= 6.1.0 since our projects doesn't depend on Bazel < 6.1.0. Feel free to let me know if it needs to support an older version later on :)

Could you try adding specifying build --incompatible_strict_action_env to your .bazelrc and seeing if that adds PATH?

Thanks, I confirmed that adds PATH. I pushed a commit to use that PATH if available.

Curious: Do you have a VM or a separate machine?

I have a separate Windows machine and do some tests on it through SSH from my Mac laptop.

Hello, Kon, after seeing your update, it's quite impressive. Would you consider making it backwards compatible? In order to ensure smoother usability in the future, I am currently using Bazel 5.1, and have encountered some version-related issues as indicated here. --host_features=compiler_param_file need bazel >= 6.1.0

@kon72
Copy link
Contributor Author

kon72 commented Jan 5, 2024

Thank you so much for your kind words! I'm grateful for the opportunity to work with someone as experienced and supportive as you.

Best regards,
Kon

@kon72
Copy link
Contributor Author

kon72 commented Jan 5, 2024

Hello @doupongzeng,

I appreciate your suggestion regarding backward compatibility.
I'm going to file another PR that makes this change work with Bazel 5.x.

kon72 added a commit to kon72/bazel-compile-commands-extractor that referenced this pull request Jan 7, 2024
…ey're not supplied from aquery output

In Bazel < 6.1.0, aquery doesn't provide environment variables.
See hedronvision#154 for more context.
@kon72 kon72 changed the title Get the arguments emcc would pass to clang Unwrap emcc wrapper script to its underlying clang invocation May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants