Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buildbox: Add new cross-compiling buildbox for Teleport #44130

Merged
merged 16 commits into from
Jul 23, 2024
Merged

Conversation

camscale
Copy link
Contributor

Add Dockerfiles and make targets to build a new buildbox that has a set
of cross compilers for the four architectures we target for Linux -
amd64, arm64, 386, and arm (32-bit). The buildbox also contains the
third-party C libraries that Teleport needs to statically link against
for a full set of features, again compiled for each target architecture.

The cross compilers are built using crosstool-NG. The compilers must
have a "vendor" field in the "target triple" of "unknown" (default).
Setting it to anything else means rust does not find the cross compilers
properly. Using "unknown" causes the triple to match the rust target.
(This may ultimately be unnecessary to make these match, but is needed
if we have the boring crate build boringssl itself - see last
paragraph).

Two container images are produced:

  • buildbox-thirdparty - a base image that contains the third-party
    tools (compilers, etc) and libraries that we build from source. Once
    build, this should never need to be rebuilt unless we change the
    version of one of the components it builds.
  • buildbox-ng - the buildbox used to build Teleport. It copies out the
    third-party components from the previous image and installs whatever
    other tools are needed by the build, whether from the distro archive
    or directly from upstream (such as Go and Rust compilers).

Additionally, the three intermediate build stages of
buildbox-thirdparty can be built for working on the buildbox; ctng,
compilers and tplibs.

buildbox-ng will become just buildbox once it can replace the others
entirely, at which point those others will be removed.

Currently the build tools to build a FIPS version of teleport is not in
the buildbox. That has been troublesome as the rust boring crate wants
to build boringssl itself and it is not being done correctly with the
cross compilers. Further work is needed here, likely building boringssl
ourselves and pointing the rust boring crate at the pre-built libraries.


This is still under development, but I want to get it out there for
visibility. Subsequent PRs will update the build.assets/Makefile to
build the teleport binaries, but there are some changes that need to be
made to the main Makefiles first so that both old and new buildboxes can
co-exist as well as being able to do native local builds from a teleport
checkout.

With this, you can run:

make -C build.assets buildbox-thirdparty
make -C build.assets buildbox-ng

and you'll get the final buildbox-ng image.

After doing that, you can also run

make -C build.assets buildbox-thirdparty-ctng
make -C build.assets buildbox-thirdparty-compilers
make -C build.assets buildbox-thirdparty-tplibs

These will all come from the build cache and take almost no time at all.
These give you the intermediate stages of the buildbox-thirdparty
image so that you can reconfigure crosstool-NG for the different
architectures (see the config/buildbox-ng target in Makefile), and
enter at the different stages to run the compilers to build libraries.

@camscale camscale added the no-changelog Indicates that a PR does not require a changelog entry label Jul 12, 2024
@camscale camscale requested review from r0mant and jakule July 12, 2024 13:41
@github-actions github-actions bot requested review from avatus and flyinghermit July 12, 2024 13:41
Copy link
Contributor

@jakule jakule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, but overall LGTM
@camscale Do you have the build time of Teleport with old/new toolchain? I'm curious how does it compare.

build.assets/Makefile Outdated Show resolved Hide resolved
build.assets/Makefile Outdated Show resolved Hide resolved
build.assets/Makefile Outdated Show resolved Hide resolved
build.assets/buildbox/bbcommon.mk Outdated Show resolved Hide resolved
build.assets/buildbox/tplibs.mk Outdated Show resolved Hide resolved
build.assets/buildbox/tplibs.mk Outdated Show resolved Hide resolved
build.assets/buildbox/ctng.mk Outdated Show resolved Hide resolved
build.assets/buildbox/tplibs.mk Outdated Show resolved Hide resolved
build.assets/buildbox/tplibs.mk Outdated Show resolved Hide resolved
@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from flyinghermit July 12, 2024 19:08
@camscale
Copy link
Contributor Author

@jakule > Do you have the build time of Teleport with old/new toolchain?

I haven't done any timings yet. I am still having some trouble building enterprise teleport on arm64, which is the only platform I can reasonably compare at the moment with the rdp rust stuff (errors like relocation truncated to fit: R_AARCH64_CALL26 against symbol __aarch64_ldeor4_relax'`) when linking the final binary. I haven't noticed it being faster or slower in particular though.

@r0mant @jakule I've addressed all the comments. PTAL

Copy link
Contributor

@jakule jakule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run the build locally but it failed in a few places.

build.assets/Makefile Outdated Show resolved Hide resolved
build.assets/buildbox/Dockerfile-thirdparty Outdated Show resolved Hide resolved
build.assets/Makefile Outdated Show resolved Hide resolved
.PHONY: buildbox-ng
buildbox-ng:
docker buildx build \
--build-arg THIRDPARTY_IMAGE=$(BUILDBOX_THIRDPARTY) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky as it will always pick-up remote images instead of the local one. I had to modify this line to run it as the remove image doesn't exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not had that problem, but I guess it has something to do with the builder? I think I might be using the original docker driver not the containerd one, but I'm not really sure what I'm seeing with docker once I had to switch to the Mac. I'm using colima, and spent a bit of time stuffing around with the runtime, snapshotter, etc but nothing made sense to me in the end.

I'm really not sure how I'm meant to set this stuff up if it will not look in the local image cache as it does for me but not you. Also in a way that works for devs on Macs and on Linux on CI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure how I'm meant to set this stuff up

I'd just rename to local build to match the expected name here. In this case when you build locally the local container will be picked up instead of the remote one.

Copy link
Contributor Author

@camscale camscale Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make that work. This is a "common" issue with the docker-container buildkit driver - it does not have access to the docker image cache except to store built images there when run with --load. I have been using the docker driver so this has not been an issue, but I ran the following and at the moment I cannot make this work with the docker-container driver:

$ docker buildx create foo --driver docker-container --use
$ docker images
ghcr.io/gravitational/teleport-buildbox-thirdparty   teleport17         bd51cf80045f   21 hours ago   2.08GB

# ... at this point I have thirdparty image. I give it a "local" name as you suggest:
$ docker tag ghcr.io/gravitational/teleport-buildbox-thirdparty:teleport17 teleport-buildbox-thirdparty:teleport17

# Now I try to build buildbox-ng with this image:
$ make -C build.assets buildbox-ng BUILDBOX_THIRDPARTY=teleport-buildbox-thirdparty:teleport17
...
Dockerfile:26
--------------------
  24 |     # Reference the thirdparty image for copying from later.
  25 |
  26 | >>> FROM ${THIRDPARTY_IMAGE} AS thirdparty
  27 |
  28 |     # ----------------------------------------------------------------------------
--------------------
ERROR: failed to solve: teleport-buildbox-thirdparty:teleport17: failed to resolve source metadata for docker.io/library/teleport-buildbox-thirdparty:teleport17: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed

As you can see, it tries to resolve the "local" image name relative to docker.io/library.

I can't see any way to make this work using the docker-container driver. This is a long-standing issue (I dont want to link it here because I don't want that issue back-linking to this PR: moby buildkit issues/2343) with no real resolution right now except to use the docker driver instead. But there is no standard name for the builder/context that uses that driver so it is not as simple as just telling docker buildx to use the docker driver.

Can you try building this using a build driver that uses the docker driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented with the containerd-snapshotter image cache, which can store multi-arch images, but it still doesn't work - it does not look in the local image cache at all when you are using the docker-containerd driver. @jakule If you did manage to get this working somehow, I'd really like to know what you did.


# Build cross-compiling toolchains with ct-ng
RUN --mount=type=cache,id=download,uid=${BUILDBOX_UID},target=${THIRDPARTY_DIR}/download \
make -f crosstoolng.mk crosstoolng ARCH=amd64 && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails for me locally as:

  • crosstoolng target doesn't exist
  • when I run crosstoolng-menuconfig (This is the correct one, kind of? technically we should run make config to create the config without any GUI) I'm getting permission denied.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed up the renaming I broke and pushed. This is what works for me from a clean slate:

$ cd build.assets
$ make buildbox-thirdparty
... wait an hour ...
$ make buildbox-ng

At this point you can easily create the intermediate containers, the first of which is needed to reconfigure ctng:

$ make buildbox-thirdparty-crosstoolng
$ make buildbox-thirdparty-compilers
$ make buildbox-thirdparty-libs

The steps for all of this should be cached from the original make buildbox-thirdparty

WIth the first of these dev images you should be able to run:

$ make config/buildbox-ng CROSSTOOLNG_ARCH=arm64

That will copy the crosstoolng-configs/arm64.defconfig to the build area, run make defconfig to expand that to a full config, run make menuconfig to reconfigure interactively. When you save and exit, it will then run make savedefconfig to "compress" the config back to a defconfig and then copy it back to crosstoolng-configs/. That last step might be what is failing because the buildbox container should be running as the buildbox user (uid=99), so may not have permission to write back. But it worked for me unexpectedly and I didn't bother yet to dig into why.... So I just looked and it seems that the teleport repo volume mapped onto gets owned by the container uid/gid, so that's why it can copy back. Not sure if there's some Mac magic going on there.

You can use the buildbox with:

$ make enter/buildbox-ng
ctr$ eval $(make -s -C build.assets/buildbox -f cross-compile.mk ARCH=arm64 sh-cross-vars)

You can set ARCH in that last line to whatever you want to build. Run it without the eval to see what the vars are that get set and their values.

I haven't added the node stuff to the buildbox yet, so it cannot build the web ui, but in another window in the same teleport repo, you can run make ensure-webassets and make ensure-webassets-e to build them and then inside the container you can run make build/teleport ARCH=arm64 to build a binary.

camscale added 13 commits July 18, 2024 22:48
Add Dockerfiles and make targets to build a new buildbox that has a set
of cross compilers for the four architectures we target for Linux -
amd64, arm64, 386, and arm (32-bit). The buildbox also contains the
third-party C libraries that Teleport needs to statically link against
for a full set of features, again compiled for each target architecture.

The cross compilers are built using crosstool-NG. The compilers must
have a "vendor" field in the "target triple" of "unknown" (default).
Setting it to anything else means rust does not find the cross compilers
properly. Using "unknown" causes the triple to match the rust target.
(This may ultimately be unnecessary to make these match, but is needed
if we have the boring crate build boringssl itself - see last
paragraph).

Two container images are produced:
* `buildbox-thirdparty` - a base image that contains the third-party
  tools (compilers, etc) and libraries that we build from source. Once
  build, this should never need to be rebuilt unless we change the
  version of one of the components it builds.
* `buildbox-ng` - the buildbox used to build Teleport. It copies out the
  third-party components from the previous image and installs whatever
  other tools are needed by the build, whether from the distro archive
  or directly from upstream (such as Go and Rust compilers).

Additionally, the three intermediate build stages of
`buildbox-thirdparty` can be built for working on the buildbox; `ctng`,
`compilers` and `tplibs`.

`buildbox-ng` will become just `buildbox` once it can replace the others
entirely, at which point those others will be removed.

Currently the build tools to build a FIPS version of teleport is not in
the buildbox. That has been troublesome as the rust boring crate wants
to build boringssl itself and it is not being done correctly with the
cross compilers. Further work is needed here, likely building boringssl
ourselves and pointing the rust boring crate at the pre-built libraries.
Add a libelf.pc file taken from elfutils and minorly adjusted for how we
install it separately. libelf was taken out of elfutils but did not take
the libelf.pc file, so we do that ourselves.

This will allow the build to use pkg-config/pkconf for selecting the
libraries to link teleport to as some later versions of libelf also need
`-lzstd`, such as this buildbox but also ubuntu-24.04.
Set `PREFIX` when building `zstd` to ensure it's pkg-config file has the
correct prefix set in it, otherwise it has `-L/usr/local/lib` which we
do not want for cross-compiling.

Add a `sh-cross-vars` target to echo the cross-compiling vars for an
architecture, in a form that can be sourced by the shell:

    eval $(make -s -f cross-compile.mk ARCH=arm64 sh-cross-vars)

This will set up your shell for cross-compiling for the given
architecture, which is useful when working on the libraries in the
buildbox.
* Rename ctng to crosstoolng throughout
* Rename tplibs.mk to thirdparty-libs.mk
* Fix spelling mistakes.
* Remove `--hostname $(HOSTNAME)` when running containers.
* Remove `--volume /tmp:/tmp` when running containers.
* Rename bbcommon.mk to buildbox-common.mk
Renaming things was done poorly in the last few commits. Fix all that
up.
Use the `gold` linker when building arm64 binaries, as Enterprise
Teleport will not build with the binutils (bfd) linker; it gives
numerous errors when linking of the form:

    something.rs: (.text.unlikely._XXX): relocation truncated to fit: R_AARCH64_CALL26 against symbol ...

These errors do not occur when using the `gold` linker, which is already
included and built by crosstool-NG.
Tweak the crosstool-NG configuration for the arm cross-compiler so that
the tuple for it matches the tuple currently used in the build by rust
for the same target. This is mostly to cause less confusion as there's
no real need for them to be different.
Add environment variables so that cargo can find the appropriate
architecture-specific linker when cross-compiling. This is needed now as
we have a rust binary (fdpass-teleport) in the build, as opposed to just
rust compiled to a library linked to Go code. Rust does not have a cross
platform linker and relies an a linker in the toolchain for the target
architecture.

To make this work without needing per-architecture setup when building,
all of the toolchain binaries are symlinked into
/opt/thirdparty/host/bin so they are all in the path. Because each of
the binaries is prefixed with the target tuple, there are no name
clashes.
Download Go and Rust in separate stages and copy their installation into
the final container image. This helps as these stages no longer have
unrelated layers before them so they cache better. The Go and Rust
stages should only need to be rebuilt if the versions of the compilers
change, or some other aspect of the installation of the compilers
change. Previously adding a new package to be installed would cause Go
and Rust to be re-installed. This no longer happens.
Remove the include of `buildbox-common.mk` from `thirdparty-libs.mk` as
it is included by `cross-compile.mk`. This caused some duplicate target
warnings when make ran.
Add supoprt for building in buildbox-ng to the build by including the
cross-compiling definitions if we are running in that buildbox and
gating some older cross-compiling definitions from being used.
Add a set of targets to build releases using buildbox-ng instead of the
standard buildboxes. Ultimately these targets will be removed when the
old buildboxes are removed. These new targets are temporary until that
happens supporting testing of the new buildbox.
@camscale camscale force-pushed the camh/buildbox-ng branch 2 times, most recently from 2627a0d to 93dea60 Compare July 18, 2024 13:40
@camscale
Copy link
Contributor Author

This is now working end-to-end for the releases of amd64, arm64, 386 and arm for OSS and enterprise. FIPS is still not supported though. The last thing I had to figure out was rust cross-compiling binaries - the rust compiler needs support of an external linker to cross-compile binaries and this is needed for the new fdpass-teleport binary. See ba9a3bc for details.

I've also added some make targets to build.assets/Makefile to do release builds.

With this PR, you can build the buildbox and build the releases:

$ cd build.assets
$ make buildbox-thirdparty
$ make buildbox-ng
$ make release-ng-amd64
$ ls -l ../build/artifacts  # <-- will show oss and enterprise tar balls
$ make release-ng-arm64 # save the artifacts as each build does a clean
$ make release-ng-386
$ make release-ng-arm

It is possible that the second build (make buildbox-ng) will not work if you are using the containerd docker buildkit driver as it does not look in the local image cache for the image built with make buildbox-thirdparty. I am using the default docker driver and this works. I need to figure out what to do if you are using the containerd driver - perhaps it will all be fine once the images are pushed to ghcr.io, but it would be nice to not need that to help with local development.

camscale added 2 commits July 23, 2024 00:31
Add the teleport user (1000:1000) to the buildbox for running in CI
where a repository is checked-out inside the container instead of
mounting a volume on to /home/teleport. Make that home directory
world-writable so that it can still be used if 1000:1000 cannot be used
for some reason.

Expand and clean up comments.

Re-order ENV vars a little for better grouping.

Put Rust and Go temp directories under /tmp/build so a single volume can
be mounted there for persistent caches across runs.
Explicitly set `PKG_CONFIG_PATH` when running `$(PKGCONF)` as when
running that in a `$(shell ...)` expression, variables exported in the
Makefile are not exported for that shell, so `$(PKGCONF)` does not see
`PKG_CONFIG_PATH`. GNU make 4.4 changes this so that exported variables
are exported to `$(shell ...)` expressions, but the buildbox has GNU
make 4.3.

This was causing libbpf to not be detected properly so was building
teleport without bpf.
@camscale camscale added this pull request to the merge queue Jul 23, 2024
Merged via the queue into master with commit 5f7e1f6 Jul 23, 2024
39 checks passed
@camscale camscale deleted the camh/buildbox-ng branch July 23, 2024 00:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/lg
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants