Set up community build bot for Linux/Aarch64 builds #547

Closed
bvibber opened this issue Jul 11, 2020 · 90 comments

Comments

@bvibber
Contributor

bvibber commented Jul 11, 2020

Per recent email list discussion I've proposed setting up a community-run build bot for Linux/Aarch64 (ARM64) builds of LLVM + binaryen that can be downloaded by emsdk for direct installation on compatible machines.

(This is a niche platform today but we can expect it to become big as Apple is planning to convert their entire Mac line over to ARM64 CPUs in the next two years. Macs will eventually want native macOS/ARM64 builds too but Linux/ARM64 builds will also be used there in things like Docker or other virtualized Linux environments too, and it's easier to bootstrap now with easy availability of Linux server VMs)

I'm seeing a small regression in building with emsdk on ARM64 that I'll check over, then I'll see if I can build using the same tool that's used for the CI builds that emsdk downloads and make sure we've got something compatible.

@kripken
Member

kripken commented Jul 15, 2020

Talking to @Brion , wasm2c may be able to help here. While we don't like the idea of end users needing to build C builds, if the main infra produces C builds then those could be built by @Brion on an ARM machine, which as a single C file is much simpler than LLVM with the whole build system there. And then those builds would be uploaded, so end users get fully compiled builds, and wasm2c is just a step in the middle.

This will still have some downsides (like there is no simd or pthreads in wasm2c builds atm), so it would be best to get normal builds up. But I think it's worth seeing about wasm2c builds on the main bots - just one more build target, and it doesn't need a special machine to do so (the fast Linux machines can do it).

@sbc100
Collaborator

sbc100 commented Jul 15, 2020

The problem is not that the target machine can't build llvm from source (it's a full linux/arm installation). He can build the whole SDK just fine. The problem @Brion wants to solve is having a pre-built set of binaries uploaded by the bots. So it is mostly about infra maintenance. Converting to wasm2c doesn't seem to solve this specific problem AFAICT.

@kripken
Member

kripken commented Jul 15, 2020

@sbc100 See https://twitter.com/brionv/status/1283165113248133121 and earlier tweets right before it - my understanding is that there are some difficulties with building from source.

It would be much easier to just build a C file - clang clang.c and that's it.

@sbc100
Collaborator

sbc100 commented Jul 15, 2020

But wait, I thought the original discussion for this issue came out of a request to have pre-built binaries because building from source was too slow? Am I remembering wrong?

If the problem is "I can't build from source on arm linux", then that seems like something that should be solvable. I'm sure llvm itself must be buildable on arm linux.

@kripken
Member

kripken commented Jul 16, 2020

Maybe @Brion you can elaborate on the specific issues you ran into, that you referred to in the tweet?

In general, I think it is useful to make things simpler. If we emit a C build that makes setting up community bots as easy as "create a VM, clang clang.c, clang wasm-opt.c etc." then that's helpful - no build systems, no need to install anything. As we have more platforms (not just ARM Linux but also ARM MacOS etc.) the easier we make this the more helpful it will be.

@bvibber
Contributor Author

bvibber commented Jul 16, 2020

The original problem is roughly:

  • on Linux/arm64, ready-to-run release binaries cannot be installed with emsdk, requiring slow, memory- and disk-intensive local compilation of LLVM and binaryen before one is able to use emcc

Proposed solution is:

  • provide community-built Linux/arm64 binaries alongside the official Linux/x86_64 and other binaries, so emsdk can download and install them without local compilation

There seem to be two ways to compile emscripten's dependencies:

  • compile through emsdk (not used by the production releases, so may not be tested or reliable or compatible with binary downloads)
  • compile through waterfall (used by the production releases, so is tested and reliable and will produce builds configured the same way as the release builds)

I've previously done patches to emsdk to allow building on Linux/arm & Linux/arm64 so once I fixed a small recent regression, building with emsdk does seem to work. However I'm not certain if it's the recommended way to be building for release packaging.

I made an attempt to build with waterfall as well, however quickly discovered that it hardcodes x86_64 as the architecture to build for, so when it fetches prebuilt dependencies like cmake or virtualenv they fail to execute on a Linux/arm64 system.

So there are three ways forward I think:

  1. go ahead and use emsdk as a meta-build tool, at the risk of divergence from the release builds
  2. modify waterfall to be aware of multiple architectures and either download native binaries for deps or require them to be installed in the environment
  3. modify waterfall to produce a wasm2c build of all the output binaries and then on the community-run build bots do an intermediate stage build from the C source files into native Linux/arm64 (etc) binaries

There would additionally need to be agreement on how to store & distribute the community binaries and change emsdk to point at them.

I hope this clarifies what I'm thinking. Thanks!
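
To illustrate option (2), here is a minimal sketch of how a build script could pick a prebuilt dependency by host architecture instead of hardcoding x86_64. It is not taken from waterfall: the `ARCH_ALIASES` table, the function names, and the archive naming scheme are all hypothetical.

```python
# Hypothetical mapping from platform.machine() values to the architecture
# names a build script might use when picking prebuilt dependencies.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine):
    """Map a raw machine string to a canonical architecture name."""
    try:
        return ARCH_ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine!r}")


def prebuilt_name(tool, version, system, machine):
    """Build an archive name such as 'cmake-3.21.3-linux-arm64.tar.gz'.

    The naming scheme is made up for this example.
    """
    return f"{tool}-{version}-{system.lower()}-{normalize_arch(machine)}.tar.gz"
```

In a real script the last two arguments would typically come from `platform.system()` and `platform.machine()`, so the same code resolves to different archives on x86_64 and arm64 hosts.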

@sbc100
Collaborator

sbc100 commented Jul 16, 2020

I think (2) would be the easiest of all those options (and depending on the system tools seems fine to me).

Side note: I'm not sure what you mean by virtualenv; as far as I know we don't use that in the waterfall. We do download cmake binaries though.

I wish (1) was easier, and it makes me pretty sad that we have two completely different ways to build the SDK and the results look quite different. I would actually love to go back to shipping separate packages for llvm, binaryen and emscripten like the emsdk does, but it's a fair bit of work, and probably not worth it right now.

@CarloCattano

CarloCattano commented Feb 7, 2022

Also unable to build emscripten on pi 64 bit

Linux raspberrypi 5.10.63-v8+ #1496 SMP PREEMPT Wed Dec 1 15:59:46 GMT 2021 aarch64 GNU/Linux
./emsdk install 2.0.33
Resolving SDK version '2.0.33' to 'sdk-releases-upstream-cef8850d57278271766fb2163eebcb07354018e7-64bit'
error: tool or SDK not found: 'sdk-releases-upstream-cef8850d57278271766fb2163eebcb07354018e7-64bit'

@otterley

otterley commented Mar 16, 2022

@sbc100 @Brion @kripken Hi! I'm with the Graviton team at AWS. We'd like to provide resources to the project so that it can be built on arm64. I'm happy to carry water and chop wood as needed.

What's the best way to get started?

You can also reach me at my email address, which is the first 4 letters of my last name (fisc), followed by the first 2 letters of my first name (mi), at amazon.com.

@sbc100
Collaborator

sbc100 commented Mar 16, 2022

Awesome! Thanks for the offer to help out.

As far as I understand these are roughly the steps that need to happen:

  1. Set up an arm64 linux machine that is somehow triggered by commits to https://chromium.googlesource.com/emscripten-releases/
  2. On each commit run the ./src/build.py script from that repo
  3. Upload the results to some publicly accessible storage bucket
  4. Point emsdk at the uploaded assets

For (2) you might need to do some hacking to get the ./src/build.py to run on your infra

For (4) we might need/want to modify the emsdk/scripts/create_release.py script, or create a separate process for injecting the arm64 binaries into the emscripten-releases-tags.json file.

@Brion does that sound about right to you?

@otterley

The build.py script appears to download a number of prebuilt binaries including CMake, Node.JS, and Java from https://wasm.storage.googleapis.com/ -- I assume this is to ensure reproducible builds vs. relying on whatever the distro may provide. These prebuilt binaries don't seem to be available for Linux/arm64, so the script terminates quickly.

  1. Is this strictly required, as opposed to, say, using a Docker image?
  2. If it's strictly required, how do we make those prerequisites available?

@sbc100
Collaborator

sbc100 commented Mar 16, 2022

We could modify the build.py such that it assumes the presence of system installs of all of those things.

We run this script on chromium infrastructure over which we have limited control, so it's useful for us that it is hermetic in this way. I don't believe using docker is an option for us, and we also need to continue to run on both macOS and windows.

But modifying the script to also run on your infrastructure is fine... including avoiding those specific versions of the tools.
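
As a rough illustration of the "assume system installs" idea, a script could look the tools up on PATH and fail early if any are missing. This is only a sketch and not how build.py is written; `find_system_tools` and its `which` parameter are hypothetical names, and only the standard-library `shutil.which` is assumed.

```python
import shutil


def find_system_tools(names, which=shutil.which):
    """Return {name: path} for each tool, raising if any is missing from PATH."""
    found = {}
    missing = []
    for name in names:
        path = which(name)
        if path is None:
            missing.append(name)
        else:
            found[name] = path
    if missing:
        raise RuntimeError("missing system tools: " + ", ".join(missing))
    return found
```

Calling `find_system_tools(["cmake", "node"])` before the build starts would report every missing tool at once rather than failing partway through.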

@sbc100
Collaborator

sbc100 commented Mar 16, 2022

BTW, would you be prepared to set up an instance that emscripten/emsdk developers could have SSH access to?

@sbc100
Collaborator

sbc100 commented Mar 16, 2022

(BTW I think we no longer need the Java install at all: https://chromium-review.googlesource.com/c/emscripten-releases/+/3529671)

@otterley

modifying the script to also run on your infrastructure is fine... including avoiding those specific versions of the tools.

👍🏻

BTW, would you be prepared to set up an instance that emscripten/emsdk developers could have SSH access to?

I think we can get you remote access somehow. Not sure whether that'll be via SSH or SSM Session Manager, but some way.

@dschuff
Member

dschuff commented Mar 17, 2022

The other option here could be to do a cross-build, as we do for MacOS. If you have a Linux ARM64 sysroot, then an Intel build of clang that has the ARM64 backend enabled should be able to do cross-builds. That would mean that build.py wouldn't need to be updated.

Having said that, I don't think updating build.py would be all that hard, and we'd be happy to take patches.
There are 2 ways you might go about it:

  1. Use the same tool (depot_tools, specifically gclient) we and Chrome use to sync the build dependencies and sources. My guess is that it would probably actually work on ARM64 Linux since I know that it works at least on ARM32 Linux and ARM64 macOS (and it's all just Python). In that case you could modify the logic for syncing prebuilt CMake and node to point to your ARM64 linux versions (If those prebuilts exist, I could probably just add them to our storage bucket).

  2. Handle syncing the sources yourself (DEPS files are easy to parse), and fix build.py to use CMake and node from the system rather than prebuilt.
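
For the manual route, here is a minimal sketch of reading a DEPS file. DEPS files use Python syntax with a `vars` dict, a `deps` dict, and a `Var()` helper, so they can be evaluated with a small stub. The sample content, URLs and revision below are invented for illustration, and `parse_deps` is a hypothetical helper, not part of depot_tools.

```python
SAMPLE_DEPS = """
vars = {
  'example_git': 'https://example.invalid/git',
  'foo_revision': '0123abcd',
}

deps = {
  'project/foo': Var('example_git') + '/foo.git@' + Var('foo_revision'),
}
"""


def parse_deps(text):
    """Evaluate DEPS-style text and return {checkout_path: (url, revision)}.

    Uses exec(), so only run this on DEPS files you trust.
    """
    scope = {}
    # Var() looks up entries in the 'vars' dict, which is defined before 'deps'.
    scope["Var"] = lambda name: scope["vars"][name]
    exec(text, scope)
    result = {}
    for path, spec in scope.get("deps", {}).items():
        # Entries may be a plain 'url@revision' string or a dict with a 'url' key.
        url = spec.get("url") if isinstance(spec, dict) else spec
        if url is None:
            continue  # e.g. non-git entries, which this sketch ignores
        if "@" in url:
            repo, revision = url.rsplit("@", 1)
        else:
            repo, revision = url, None
        result[path] = (repo, revision)
    return result
```

Each `(url, revision)` pair could then be fed to a plain git clone and checkout in place of `gclient sync`.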

Either way I think you want to bypass all the logic that tries to use Chromium's build package (clang, SDKs, etc) since I'm not sure they support Arm64 linux. build.py has an option --no-host-clang that does this already (although since we don't test it much, it's possible that it has issues you'd need to fix). You probably also want --no-sysroot since you probably don't want to use Chrome's build sysroot.

Once you have the build dependencies and sources, you can build:
src/build.py --build-include=llvm,binaryen,emscripten
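
A small sketch of assembling that invocation together with the two flags mentioned above. Only the flag names come from this comment; the `build_command` helper and its parameters are hypothetical.

```python
def build_command(components, host_clang=False, sysroot=False):
    """Return the argv list for a src/build.py invocation."""
    argv = ["src/build.py"]
    if not host_clang:
        # Skip Chromium's prebuilt clang and use the system compiler.
        argv.append("--no-host-clang")
    if not sysroot:
        # Skip Chrome's build sysroot.
        argv.append("--no-sysroot")
    argv.append("--build-include=" + ",".join(components))
    return argv
```

The resulting list could be passed to `subprocess.run(argv, check=True)` from the root of the synced checkout.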

@sbc100
Collaborator

sbc100 commented Mar 17, 2022

I think given that we have some arm64 linux hardware available we might as well do a native build.. then we can run some tests too.

Yes, I should have mentioned that. You will need to use gclient to sync the emscripten-releases repo.. that is how you get all the dependencies (similar to git submodules).

@dschuff
Member

dschuff commented Mar 17, 2022

I updated my comment above. I'm not really sure whether gclient or manual sync would be easier. It might depend on whether they want to depend on our build deps or not.

@sbc100
Collaborator

sbc100 commented Mar 17, 2022

Given that we are proposing to use these binaries in emsdk releases I think it would be confusing to do anything but follow the DEPS. In fact I think it's a pretty hard requirement. Version X of emsdk should produce the same output whatever platform one runs on.

@dschuff
Member

dschuff commented Mar 17, 2022

Yes, we have to follow the DEPS. It's just a matter of whether they want to run gclient to do that, or do it manually.

@sbc100
Collaborator

sbc100 commented Mar 17, 2022

Right, I guess many folks are not familiar with depot_tools or gclient. The basic steps are:

  1. Install depot_tools in your PATH: https://www.chromium.org/developers/how-tos/install-depot-tools/
  2. mkdir && cd
  3. gclient config https://chromium.googlesource.com/emscripten-releases
  4. gclient sync

Now you have emscripten-releases checked out along with all of its dependencies. Each time you modify DEPS or check out a release with different DEPS, re-run gclient sync to keep the deps in sync.

@danieloneill

Coming up on the 3 year anniversary of this issue and it still doesn't seem fully resolved. What is needed (or what could a motivated contributor do) to ensure builds for each release are available for aarch64 in a timely manner? At the time of posting, the latest available release is 3.1.33 which isn't too far behind 3.1.41, but far enough to feel second-class. Is this an issue of builds for releases failing for aarch64 on the buildserver?

@reinhrst

reinhrst commented Nov 4, 2023

For people coming here hoping to find an Arm64 Docker image, I'm happy to share this gist

@jamsinclair
Contributor

jamsinclair commented Jan 7, 2024

Hey @otterley, if you're still working in related areas, is there any flexibility to increase the frequency of the emscripten arm64 builds on AWS? I understand this is entirely thanks to your (or Amazon's) generosity, so I understand if the bandwidth isn't there.

I'm sure many of us would be super thankful for more regular builds for arm64 🙇


Also on this topic, now that CircleCI has arm64 machines that can be easily used in pipelines, would it be too resource intensive to build and upload the binaries from the project itself? (Either here or over in emscripten-core/emscripten). This could allow for more seamless releases and easy multi-arch docker builds.

@sbc100
Collaborator

sbc100 commented Jan 7, 2024

I think the problem is that our current builders (the ones that actually produce the SDK binaries) run on google chrome infrastructure, and they don't currently support arm64 machines: https://ci.chromium.org/p/emscripten-releases/g/main/console.

To have some releases built by github/circleci we would likely need to redesign the whole system. The simplest way might be to set up an automatic mirror of the emscripten-releases repository hosted on github.

@dschuff
Member

dschuff commented Jan 8, 2024

Another possibility might be if we could cross-build ARM64 binaries on our x86-64 Chromium builder. I think that would maybe just require a sysroot?

@jamsinclair
Contributor

jamsinclair commented Mar 24, 2024

Just a heads up it seems like the public S3 bucket that the linux arm64 EC2 builds were being shared from has either been made private or existing assets have expired.

I guess this may mean that there are no longer any automated builds for the foreseeable future?

@otterley

Hi @jamsinclair, I had to disable public access to the S3 bucket. It was meant to share assets only with the Emscripten build team to place in their official repository, not to share assets with the public.

The current status of aarch64 support is that the last build created was the last one that worked. There were subsequent changes to various dependencies that broke the build, and I haven't had the spare cycles to work with the Emscripten teams to get it working.

Ideally the Emscripten team can produce a fully-working aarch64/arm64 build on their own. Unfortunately I don't have the resources right now to contribute to its ongoing maintenance.

@dschuff
Member

dschuff commented Mar 26, 2024

I'm experimenting with a cross-build for aarch64-linux based on Chrome's aarch64-linux sysroot. It's straightforward to create, but I have no way to test it right now. If it works, maybe we can check the code in, and someone can use it to build and run their own tests, or we can make it an unofficial bot (or set up some kind of circleCI testing on the emsdk side).

@dschuff
Member

dschuff commented Mar 28, 2024

I've uploaded an experimental aarch64-linux build using Chromium's sysroot at https://storage.googleapis.com/webassembly/wasm-binaries-aarch64.tar.xz but I have no way to test it. Can y'all tell me if the binaries work? It should be possible to just drop them in as a replacement for the emsdk binaries and libs from another platform.

@bvibber
Contributor Author

bvibber commented Mar 28, 2024

I've uploaded an experimental aarch64-linux build using Chromium's sysroot at https://storage.googleapis.com/webassembly/wasm-binaries-aarch64.tar.xz but I have no way to test it. Can y'all tell me if the binaries work? It should be possible to just drop them in as a replacement for the emsdk binaries and libs from another platform.

Unfortunately they do not; they are built for x86-64:

wasm-as: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped

Note you should be able to load up an emulated aarch64 Debian VM using Docker like this:

docker pull --platform linux/aarch64 debian:bookworm
docker run --platform linux/aarch64 --name debian-bookworm -h bookworm -e LANG=C.UTF-8 -it debian:bookworm /bin/bash -l

On a native arm64 machine this will be fast; on an x86-64 machine it should run the binaries via QEMU emulation. (It might still try to run x86-64 binaries if you give them to it, but they'd probably fail without a glibc to link to.)
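
For a quick check without the `file` tool, the target architecture can be read straight from the ELF header: the 16-bit `e_machine` field sits at byte offset 18, with 62 meaning x86-64 and 183 meaning AArch64. The sketch below assumes a well-formed ELF file; `elf_machine` and `file_machine` are hypothetical helper names.

```python
import struct

# e_machine values from the ELF specification (subset).
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86-64", 183: "aarch64"}


def elf_machine(header):
    """Return the architecture name encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown ({machine})")


def file_machine(path):
    """Read just enough of the file at 'path' to identify its architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

Running `file_machine` over every executable in an unpacked archive would catch a wrong-architecture upload before anyone tries to run it.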

@sbc100
Collaborator

sbc100 commented Mar 28, 2024

BTW I think it should be possible to test these binaries using docker + qemu on your linux x86_64 machine. (See WebAssembly/wabt#2380 for an example of how to do this).

@dschuff
Member

dschuff commented Mar 28, 2024

Oops, I think I uploaded the wrong archive. I think I fixed it; you can try again.

@bvibber
Contributor Author

bvibber commented Mar 28, 2024

Oops, I think I uploaded the wrong archive. I think I fixed it; you can try again.

Good news is the binaries run, bad news is emscripten seems to want a much newer version of clang:

#include <stdio.h>

int main(int argc, const char **argv) {
	printf("Hello, world\n");
	return 0;
}
$ emcc -o test.wasm test.c
emcc: warning: LLVM version for clang executable "/home/brooke/src/emsdk/upstream/bin/clang" appears incorrect (seeing "13.0", expected "18") [-Wversion-check]
cache:INFO: generating system asset: symbol_lists/d84e367b9259bd8a3e5b31a38b0b4bd18fea7ae4.json... (this will be cached in "/home/brooke/src/emsdk/upstream/emscripten/cache/symbol_lists/d84e367b9259bd8a3e5b31a38b0b4bd18fea7ae4.json" for subsequent builds)
cache:INFO:  - ok
wasm-ld: error: unknown argument: --table-base=1
wasm-ld: error: unknown file type: /tmp/tmpz4o8dd9blibemscripten_js_symbols.so
emcc: error: '/home/brooke/src/emsdk/upstream/bin/wasm-ld -o test.wasm /tmp/emscripten_temp_u1q1h_0q/test_0.o -L/home/brooke/src/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten /home/brooke/src/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/crt1.o -lGL -lal -lhtml5 -lstandalonewasm-nocatch -lstubs-debug -lc-debug -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-debug-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpz4o8dd9blibemscripten_js_symbols.so --strip-debug --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export=emscripten_stack_get_end --export=emscripten_stack_get_free --export=emscripten_stack_get_base --export=emscripten_stack_get_current --export=emscripten_stack_init --export=stackSave --export=stackRestore --export=stackAlloc --export=__errno_location --export-table -z stack-size=65536 --initial-memory=16777216 --max-memory=16777216 --stack-first --table-base=1' failed (returned 1)

@dschuff
Member

dschuff commented Mar 28, 2024

Sorry, another misconfiguration on my local machine... Should be good now.

@bvibber
Contributor Author

bvibber commented Mar 29, 2024

[updated: worked around this, it was bad matching version of emscripten]

We're getting warmer! :D Now llvm/clang and binaryen are too new ;)

emcc: warning: LLVM version for clang executable "/home/brooke/src/emsdk/upstream/bin/clang" appears incorrect (seeing "19.0", expected "18") [-Wversion-check]

emcc: warning: unexpected binaryen version: 117 (expected 115) [-Wversion-check]
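
The warnings above come from a major-version comparison between the installed tools and what the emscripten checkout expects. As a rough sketch of that kind of check (this is not emcc's actual code; the function names and the assumed `clang version N.` output format are illustrative):

```python
import re


def clang_major_version(version_output):
    """Extract the major version from `clang --version` output, or None."""
    match = re.search(r"clang version (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def check_llvm_version(version_output, expected_major):
    """Return a warning string if the major version differs, else None."""
    actual = clang_major_version(version_output)
    if actual == expected_major:
        return None
    return f"LLVM major version mismatch: seeing {actual}, expected {expected_major}"
```

A mismatch in either direction (13 vs 18 earlier, 19 vs 18 here) means the emscripten scripts and the LLVM binaries were taken from different releases.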

@bvibber
Contributor Author

bvibber commented Mar 29, 2024

[update: yep, that was my fault lol]

Hmm, that may be an artifact of the previous emsdk linux/arm version being installed [for the emscripten components]. But I don't know how to get another version except by installing from source or going over to an x86 machine.

@bvibber
Contributor Author

bvibber commented Mar 29, 2024

WE HAVE LIFTOFF :D

Ok I manually copied over the latest release emscripten from an x86_64 VM, plus the latest version of the aarch64 wasm tools build, and it works :D

brooke@bookworm:~/src/test$ uname -a
Linux bookworm 6.6.12-linuxkit #1 SMP Thu Feb  8 06:36:34 UTC 2024 aarch64 GNU/Linux
brooke@bookworm:~/src/test$ emcc --version
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.56 (cf90417346b78455089e64eb909d71d091ecc055)
Copyright (C) 2014 the Emscripten authors (see AUTHORS.txt)
This is free and open source software under the MIT license.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

brooke@bookworm:~/src/test$ emcc -o test.js test.c && node ./test.js
Hello, world

@sbc100
Collaborator

sbc100 commented Mar 29, 2024

This is awesome news! Great work @dschuff

It would be good to do some very minimal testing of the arm64 binaryen on the x86_64 build host. If that is not possible we can also take care of that on the emsdk and emscripten builders, which have access to arm64 circleci bots.

@dschuff
Member

dschuff commented Mar 29, 2024

If this is something it would be easy to do in a github action on the emsdk side, that's probably going to be easier in the short run than setting it up in Chromium. I'm not sure Chromium's infrastructure has a convenient emulator install.

@otterley

otterley commented Mar 29, 2024 via email

@sbc100
Collaborator

sbc100 commented Mar 29, 2024

We already have linux arm64 testing as part of emsdk (via circleci):

test-linux-arm64:
  executor: linux_arm64
  steps:
    - checkout
    - run:
        name: Install debian packages
        command: sudo apt-get update -q && sudo apt-get install -q cmake build-essential openjdk-8-jre-headless
    - run: test/test.sh

@plasticalligator

image

@sbc100
Collaborator

sbc100 commented Apr 12, 2024

This is now fixed and we build linux/arm64 binaries continuously on the main waterfall: https://ci.chromium.org/p/emscripten-releases/g/main/console

sbc100 closed this as completed Apr 12, 2024