Reduce CppInterOp Emscripten shared library size #655

Open: 3 commits to merge into base branch main


mcbarton
Collaborator

@mcbarton mcbarton commented Jul 1, 2025

Description

Please include a summary of changes, motivation and context for this PR.

@vgvassilev This change locally reduced the Emscripten shared library size to 47/48 MB (depending on whether you use dir -lh or du -lh). It still passed all tests, and xeus-cpp could still run all the notebook cells it could before (exception throwing is broken in CppInterOp's deployment at the moment for some reason). I have cleared the cache of all llvm 20 Emscripten builds for this PR.

Fixes # (issue)

Type of change

Please tick all options which are relevant.

  • Bug fix
  • New feature
  • Requires documentation updates

Testing

Please describe the test(s) that you added and ran to verify your changes.

Checklist

  • I have read the contribution guide recently

@mcbarton mcbarton requested a review from vgvassilev July 1, 2025 14:42

codecov bot commented Jul 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.52%. Comparing base (82f08c6) to head (b7aa806).

@@           Coverage Diff           @@
##             main     #655   +/-   ##
=======================================
  Coverage   79.52%   79.52%           
=======================================
  Files           9        9           
  Lines        3917     3917           
=======================================
  Hits         3115     3115           
  Misses        802      802           

@anutosh491
Collaborator

It's a compile time optimization flag I think.

Do we need to force it like this everywhere or can we have cmake handle this ?

@mcbarton
Collaborator Author

mcbarton commented Jul 1, 2025

It's a compile time optimization flag I think.

Do we need to force it like this everywhere or can we have cmake handle this ?

It is both a link-time and a compile-time optimisation. See https://emscripten.org/docs/tools_reference/emcc.html#emcc-oz.

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch from ee75d01 to d9f6630 Compare July 1, 2025 15:25
@anutosh491
Collaborator

Yeah, but we still don't want users to have to look into or pass the correct flags during the build, correct?

We should be able to have cmake handle this.

@mcbarton
Collaborator Author

mcbarton commented Jul 1, 2025

Yeah, but we still don't want users to have to look into or pass the correct flags during the build, correct?

We should be able to have cmake handle this.

I really don't understand the point you're trying to make. Using EMCC_CFLAGS in this way is a perfectly legitimate way of doing the Emscripten build. Emscripten is designed to be affected by modifying this variable (see https://emscripten.org/docs/tools_reference/emcc.html#environment-variables). If a user changes what we have in the documentation for this variable and messes something up, then that is on them. I don't understand how putting it in a cmake option changes that.

@anutosh491
Collaborator

My point is simple. We already have compile-time and link-time flags being passed through cmake, so I don't see a need to pass this externally.

@vgvassilev
Contributor

I think what @anutosh491 means is that we need to make this flag part of our default compiler flags in the CMakeLists.txt.

@mcbarton
Collaborator Author

mcbarton commented Jul 1, 2025

@vgvassilev @anutosh491 I first tried locally using -DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4 -Oz" -DCMAKE_C_FLAGS="-Oz" -DCMAKE_STATIC_LINKER_FLAGS="-Oz" (since I thought this might be the closest thing to EMCC_CFLAGS), and the llvm build crashed 15% of the way in. I then tried -DCMAKE_CXX_FLAGS="-Dwait4=__syscall_wait4 -Oz" -DCMAKE_C_FLAGS="-Oz". Although I could build llvm with these options, they had no effect on the size of CppInterOp's shared library or of the llvm libraries. I do not know how to proceed now. If you have any other cmake options you would like me to try, let me know. If you don't, I think we should proceed using EMCC_CFLAGS.

@mcbarton
Collaborator Author

mcbarton commented Jul 1, 2025

Putting a note here in case I forget. Using EMCC_CFLAGS="-flto" in the llvm build doesn't end up affecting the shared library size on its own. Used in conjunction with -Oz, the size falls slightly to 45/46 MB.

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch from db280e0 to de7101c Compare July 2, 2025 10:38
@mcbarton mcbarton changed the title Use EMCC_CFLAGS="-Oz" during Emscripten llvm build to reduce CppInterOp shared library size to 48Mb Reduce CppInterOp Emscripten shared library size to 48Mb Jul 2, 2025
@mcbarton mcbarton changed the title Reduce CppInterOp Emscripten shared library size to 48Mb Reduce CppInterOp Emscripten shared library size Jul 2, 2025
@mcbarton
Collaborator Author

mcbarton commented Jul 2, 2025

I have updated this PR to use the -flto flag as well, and to compile and link the CppInterOp shared library with these flags too. This reduces the shared library size to 44/45 MB.

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch from 6869da8 to effdaf8 Compare July 7, 2025 19:44
@mcbarton
Collaborator Author

mcbarton commented Jul 7, 2025

@vgvassilev I have now changed this PR to a cmake-only solution, instead of using EMCC_CFLAGS. Hopefully that means this PR is ready to go in once the CI passes. Can you review and approve if you're happy? The Emscripten cache will need deleting before merging this PR, so that the deployment gets the new smaller library.

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch from effdaf8 to ba6ed4c Compare July 7, 2025 19:55
@mcbarton
Collaborator Author

mcbarton commented Jul 7, 2025

The job is failing because I accidentally deleted the wrong cache partway through the CI run. I will rerun it tomorrow to show the failed job passing. I noticed that with this change the Emscripten caches are twice the size they were before, so I think we will need to delete Emscripten llvm 19 jobs for this to go in.

@anutosh491 anutosh491 self-requested a review July 8, 2025 07:42
@mcbarton
Collaborator Author

@vgvassilev pinging for review

@vgvassilev
Contributor

Can we add a check in the CI that fires when we break this? Maybe something that checks whether the file is larger than 50 MB and fails the build?

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch 3 times, most recently from 586c2a5 to 3c29fb3 Compare July 11, 2025 15:00
@mcbarton
Collaborator Author

Can we add a check in the CI that fires when we break this? Maybe something that checks whether the file is larger than 50 MB and fails the build?

I have added a check which, on my local system, errors if the size is greater than 46 MB. I need to wait for CI to pass to confirm that I implemented it correctly in the workflow file.
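
For illustration only, a size guard of the kind discussed above could look roughly like the following shell sketch. It is not the check added in this PR: the function name `check_max_size`, the example path in the usage comment, and the choice to count the limit in units of 1024 * 1024 bytes are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch of a CI size guard. The function name, the example path and the
# limit are illustrative assumptions, not the PR's actual workflow code.

# check_max_size FILE LIMIT_MB
# Returns 0 if FILE is at most LIMIT_MB * 1024 * 1024 bytes,
# 1 if it is larger, and 2 if FILE does not exist.
check_max_size() {
  local file="$1" limit_mb="$2"
  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 2
  fi
  local size_bytes limit_bytes
  # wc -c reports the byte count; the arithmetic expansion strips any
  # padding some wc implementations add around the number.
  size_bytes=$(( $(wc -c < "$file") ))
  limit_bytes=$(( limit_mb * 1024 * 1024 ))
  if [ "$size_bytes" -gt "$limit_bytes" ]; then
    echo "error: $file is $size_bytes bytes, limit is $limit_bytes bytes" >&2
    return 1
  fi
  echo "ok: $file is $size_bytes bytes (limit $limit_bytes bytes)"
}

# Hypothetical usage in a workflow step (path is an assumption):
#   check_max_size build/lib/libclangCppInterOp.so 46 || exit 1
```

Because the function returns a non-zero status when the limit is exceeded, a workflow step that calls it and exits on failure would fail the build, which is the behaviour requested in the review comment.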

@anutosh491
Collaborator

Hi, please refrain from merging this PR before a set of reviews on my end.

Emscripten-forge applies a few optimization flags itself (similar to what pyodide does) to add a basic level of optimization for every package while building.

https://github.com/emscripten-forge/recipes/blob/e04b1067f9d7e4e56d742e5d3efa2131ed8a0315/recipes/recipes/emscripten_emscripten-wasm32/activate.sh#L50-L68

So if this goes in (and, after a release, if we build the latest CppInterOp on emscripten-forge), it's possible these flags will conflict with or overwrite each other.

@vgvassilev
Contributor

Hi, please refrain from merging this PR before a set of reviews on my end.

Emscripten-forge applies a few optimization flags itself (similar to what pyodide does) to add a basic level of optimization for every package while building.

https://github.com/emscripten-forge/recipes/blob/e04b1067f9d7e4e56d742e5d3efa2131ed8a0315/recipes/recipes/emscripten_emscripten-wasm32/activate.sh#L50-L68

So if this goes in (and, after a release, if we build the latest CppInterOp on emscripten-forge), it's possible these flags will conflict with or overwrite each other.

I do not see how these flags can go against each other. We have tests that are running, and if the tests are good I do not see a reason to hold things off. Right now we destroy people's network by forcing them to download 300 MB, ~100 of which is CppInterOp. If this PR cuts the CppInterOp size by half while the tests still run, this is something we need urgently, right?

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch from 85cdecc to c6f89c0 Compare July 12, 2025 11:08
@anutosh491
Collaborator

anutosh491 commented Jul 16, 2025

Hey @mcbarton, can you help me with some questions?

  1. What's the current size of libclangCppInterOp.so being built on main? I'm guessing it's somewhere around 70-80 MB.

  2. We can see that CppInterOp's deployment runs on an iPhone, which means the above size is somehow sufficient, so I am trying to understand what exactly is being optimized here.

According to my attempts:
i) Using -O2 gets it to 80 MB.
ii) Using -Oz gets it to 73 MB.
iii) -flto doesn't have any strong effect.
iv) Even getting rid of the huge list of symbols we export doesn't have any strong effect on size.

So making changes only to CppInterOp might not be good enough, and possibly we won't even need to make changes to CppInterOp to reduce its size, which gets us to llvm.

Now I am trying to understand what exactly is being optimized through the above flags to get the job done, because libclangInterpreter.a, which we link against, is not too huge anyway (412K on emscripten-forge), though the transitive dependencies being pulled in might be heavy (I see libclang-cpp.a is 147M).

@mcbarton
Collaborator Author

Hey @mcbarton, can you help me with some questions?

  1. What's the current size of libclangCppInterOp.so being built on main? I'm guessing it's somewhere around 70-80 MB.
  2. We can see that CppInterOp's deployment runs on an iPhone, which means the above size is somehow sufficient, so I am trying to understand what exactly is being optimized here.

According to my attempts: i) Using -O2 gets it to 80 MB. ii) Using -Oz gets it to 73 MB. iii) -flto doesn't have any strong effect. iv) Even getting rid of the huge list of symbols we export doesn't have any strong effect on size.

So making changes only to CppInterOp might not be good enough, and possibly we won't even need to make changes to CppInterOp to reduce its size, which gets us to llvm.

Now I am trying to understand what exactly is being optimized through the above flags to get the job done, because libclangInterpreter.a, which we link against, is not too huge anyway (412K on emscripten-forge), though the transitive dependencies being pulled in might be heavy (I see libclang-cpp.a is 147M).

Hi @anutosh491, on main the size of the shared library is 81 MB. As for what the -Oz flag is doing under the hood to reduce size, there doesn't seem to be much information online. The only thing I managed to work out for sure from my testing is that it decreases the inline-threshold flag to near 0. From my recollection, using just the -Oz flag during the llvm build (I think using -Os reduced it to 60-something) increases the size of the llvm static libraries but reduces the size of the shared library to 48 MB. Adding the -flto flag to the llvm build after that reduces its size to 46 MB. Adding both of these to CppInterOp's compile and link options as well reduces it to 45 MB.

The llvm build is definitely the weak link, but we shouldn't ignore any size savings we can make on the CppInterOp side too. Of what is left, I know that llvm having assertions on is responsible for another 4-5 MB, but turning them off introduces warnings about unused variables in CppInterOp. (@vgvassilev CppInterOp's tests can pass without them, so it may be worth having a no-assertions deployment alongside an assertions one for debugging issues, in a future PR.) Using -flto with another 2 flags (I can't remember which ones off the top of my head, and I haven't yet tried them out separately) gets us another 1 MB. I didn't include these 2 flags so far, as I remember not understanding what they were doing from the clang flag descriptions.

One possible route forward to further reduce the size is to switch to SIDE_MODULE=2, where we control what ends up in the shared library (rather than the default, which I think is basically to include everything). This idea would require quite a lot of thinking though, and shouldn't be part of this PR.
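
To put the sizes quoted in the comment above into proportion, the following shell sketch computes the saving of each step relative to the 81 MB reported for main. The sizes are taken from the comment as stated (from the author's recollection); the function name `percent_saved` and the step labels are made up for the example, and percentages are rounded down by integer division.

```shell
#!/usr/bin/env bash
# Sketch: relative savings for the sizes quoted in the thread (in MB).
# The function name and the step labels are illustrative assumptions.

baseline=81   # size of the shared library on main, as reported above

# percent_saved BASE NEW
# Prints the whole-number percentage (rounded down) by which NEW is
# smaller than BASE.
percent_saved() {
  local base="$1" new="$2"
  echo $(( (base - new) * 100 / base ))
}

for entry in "Oz-in-llvm:48" "Oz-and-flto-in-llvm:46" "Oz-and-flto-in-llvm-and-CppInterOp:45"; do
  label=${entry%%:*}   # text before the colon
  size=${entry##*:}    # number after the colon
  echo "$label: $size MB, $(percent_saved "$baseline" "$size")% smaller than main"
done
```

On these figures almost all of the saving comes from building llvm with -Oz, with the later steps each adding only a few percentage points, which is consistent with the remark that the llvm build is the weak link.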

@mcbarton mcbarton force-pushed the Reduce-Emscripten-shared-library-size branch from 4e716da to 22421d2 Compare July 20, 2025 20:09
@mcbarton
Collaborator Author

mcbarton commented Jul 20, 2025

@vgvassilev @anutosh491 If I build locally according to the instructions in the readme of this PR, this is the deployment I get: https://mcbarton.github.io/xeus-cpp-demo/lab/index.html. Everything which works for me when I follow main's Emscripten instructions works in this deployment too.

The 'SIMD Acceleration through WebAssembly' example doesn't work in the link (it fails to find wasm_simd128.h), but it also doesn't work locally if you build according to the instructions on main (i.e. with no modified llvm or CppInterOp). That is the reason I have opened issue #680.
