Multiplatform build slows drastically after the first platform #982

Closed
3 tasks done
K20shores opened this issue Oct 6, 2023 · 7 comments

Comments

@K20shores

Contributing guidelines

I've found a bug, and:

  • The documentation does not mention anything about my problem
  • There are no open or closed issues that are related to my problem

Description

Creating a multiplatform build results in a very long build time, or in a build that never finishes. I have two examples.

In the micm project

  1. Building for one platform succeeds in 7 minutes
  2. Building for two platforms takes over an hour, so I canceled it
  3. Here's another I won't cancel. I expect it to time out

In another project, when I tried this a few months ago, the multiplatform build timed out after six hours.

Expected behaviour

The build doesn't time out for more than one platform.

Actual behaviour

The build does time out for more than one platform.

Repository URL

https://github.com/NCAR/micm

Workflow run URL

https://github.com/NCAR/micm/actions/runs/6433827511

YAML workflow

name: Create and publish a Docker image

on:
  push:
    branches: ['release', '292-add-a-docker-image-publish']
    tags:
      - '*'

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Login to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          file: docker/Dockerfile.publish
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

Workflow logs

No response

BuildKit logs

No response

Additional info

No response

@crazy-max
Member

crazy-max commented Oct 9, 2023

Same as #977 (comment) but looking at your Dockerfile: https://github.com/NCAR/micm/blob/292-add-a-docker-image-publish/docker/Dockerfile.publish

FROM fedora:37

RUN dnf -y update \
    && dnf -y install \
        cmake \
        gcc-c++ \
        gdb \
        git \
        make \
        zlib-devel \
        llvm-devel \
    && dnf clean all

# copy the MICM code
COPY . /micm/

# build the library and run the tests
RUN mkdir /build \
      && cd /build \
      && cmake \
        -D CMAKE_BUILD_TYPE=release \
        -D ENABLE_LLVM:BOOL=TRUE \
        -D ENABLE_JSON:BOOL=TRUE \
        ../micm \
      && make install -j 8

WORKDIR /build

You might be able to use cross-compilation with https://github.com/tonistiigi/xx/.

See for example https://github.com/crazy-max/docker-msmtpd/blob/21e387c379fe37fdb0249aaa42f95bf3fbc824fc/Dockerfile#L15-L27 or https://github.com/crazy-max/docker-7zip/blob/8f719b2ce3074818119cc19f2b16de5177bf0ad3/Dockerfile#L16-L27 or https://github.com/crazy-max/docker-qbittorrent/blob/2c6eaead6eb3dad5256ed54a097bbf0a87d28c71/Dockerfile#L26-L36
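As a rough illustration of the cross-compilation suggestion, here is a minimal sketch of how the Dockerfile above might be restructured around xx. It is not taken from the linked examples. The Debian base image, the Debian package names, and the `/out` install prefix are assumptions made for illustration, and `ENABLE_LLVM` is left out because it would additionally need LLVM development files for the target architecture.

```dockerfile
# syntax=docker/dockerfile:1

# xx helper scripts, always pulled for the build machine's architecture.
FROM --platform=$BUILDPLATFORM tonistiigi/xx AS xx

# The build stage runs natively on the runner, so no QEMU is involved here.
# Assumption: a Debian base instead of the fedora:37 image used in the issue.
FROM --platform=$BUILDPLATFORM debian:bookworm AS build
COPY --from=xx / /

# Host tools that execute on the build machine.
RUN apt-get update && apt-get install -y clang lld cmake make git

# Headers and libraries for the requested target platform.
# Assumption: these Debian packages cover what the project needs.
ARG TARGETPLATFORM
RUN xx-apt-get install -y gcc g++ libc6-dev zlib1g-dev

# copy the MICM code
COPY . /micm/

# Configure for the target and install into /out (hypothetical prefix).
RUN mkdir /build \
      && cd /build \
      && cmake $(xx-clang --print-cmake-defines) \
        -D CMAKE_BUILD_TYPE=release \
        -D ENABLE_JSON:BOOL=TRUE \
        -D CMAKE_INSTALL_PREFIX=/out \
        ../micm \
      && make install -j 8

# The final stage is created for the target platform and only copies files.
FROM debian:bookworm-slim
COPY --from=build /out/ /usr/local/
```

With this layout only the last stage is built per target platform, and it runs no compiler. Anything that executes target binaries during the build (running the test suite, for example) would still need emulation or a native runner.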

@K20shores
Author

@crazy-max thanks for this. I'll look into this in the near future.

@polarathene

FWIW, you could leverage caching with RUN --mount=type=cache if the cmake build is time consuming. This needs an additional action to export/import the cache mount as it's separate from the build layer cache that docker/build-push-action manages (see this advice).
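A minimal sketch of what such a cache mount could look like for the Dockerfile in this issue. Using ccache as the thing being cached is an assumption for illustration (the thread does not mention it), as are the `/ccache` path and the `CMAKE_CXX_COMPILER_LAUNCHER` setting.

```dockerfile
# syntax=docker/dockerfile:1
FROM fedora:37

# Same packages as the original, plus ccache (an assumption for this sketch).
RUN dnf -y update \
    && dnf -y install \
        ccache \
        cmake \
        gcc-c++ \
        gdb \
        git \
        make \
        zlib-devel \
        llvm-devel \
    && dnf clean all

# Hypothetical fixed location for the compiler cache, so the mount can target it.
ENV CCACHE_DIR=/ccache

# copy the MICM code
COPY . /micm/

# The cache mount persists /ccache between builds on the same builder.
# It is not part of the image and not part of the exported layer cache.
RUN --mount=type=cache,target=/ccache \
    mkdir /build \
      && cd /build \
      && cmake \
        -D CMAKE_BUILD_TYPE=release \
        -D CMAKE_CXX_COMPILER_LAUNCHER=ccache \
        -D ENABLE_LLVM:BOOL=TRUE \
        -D ENABLE_JSON:BOOL=TRUE \
        ../micm \
      && make install -j 8

WORKDIR /build
```

On a fresh CI runner the mount starts empty every time, which is why the extra export/import step mentioned above is needed before this gives any speedup.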

If your builds aren't running on GitHub Actions runners (e.g. a remote builder), then you may also hit a problem that affects Docker / containerd releases with LimitNOFILE in the docker.service and containerd.service systemd configs. I mention this especially because you're using DNF, which is known to be excessively slow in affected environments.

Looking at your 7 min CI run, the dnf RUN took 1 minute and the cmake RUN 4.5 minutes. The GitHub runner isn't affected by the limits issue, and I don't think QEMU emulation affects that concern any differently.

You should expect platforms relying on QEMU in the CI to be quite slow (hence importance of leveraging caching, especially for cmake if you can't cross-compile).

@K20shores
Author

@polarathene thanks! I'm giving that a try. The action

@K20shores
Author

I suppose I did it wrong. It seems that no cache was used...

@linkdd

linkdd commented Sep 23, 2024

@crazy-max Why did you close the issue? It's not solved.

Here is an example: https://github.com/link-society/flowg/actions/runs/10995303563/job/30525786239

The pipeline has been running for more than 2h. I know that compiling Rust is slow, but for a linux/amd64 build on Github Actions, it should take no more than 10min (which is still incredibly slow but oh well).

@polarathene

> @crazy-max Why did you close the issue? It's not solved.

It was solved, look at the last comments.

> The pipeline has been running for more than 2h. I know that compiling Rust is slow, but for a linux/amd64 build on Github Actions, it should take no more than 10min (which is still incredibly slow but oh well).

The linked workflow that is taking long is for linux/arm64, not linux/amd64.

As stated, QEMU emulation of ARM64 for CI builds is very slow; it's not a native builder node. Also notice that it's not just your Rust crates that compile slowly, it's the Go and JS project builds too.

With Go and Rust you can at least use amd64 to cross-compile to arm64 and save much time. If you have any external deps that need compiling that aren't native Go/Rust, then you'll need to leverage Zig for that same benefit (probably helps with JS if any package attempts to compile when lacking a pre-compiled binary).

Once you do that, your builds should no longer run on non-native linux/arm64 buildx builder nodes. You'll need to transfer the result to the equivalent ARM base image though. That may be further complicated if you've got dynamic linking going on, in which case you'll potentially need patchelf to manually fix it up.
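For the Go case, the approach described here can be sketched as a two-stage Dockerfile. This is a minimal sketch, not the linked project's setup: the image tags, the `/out/app` path, and a pure-Go main package at the repository root are all assumptions. Disabling cgo gives a static binary, which sidesteps the dynamic-linking fix-ups mentioned above.

```dockerfile
# syntax=docker/dockerfile:1

# Compile on the runner's native architecture (amd64 on GitHub-hosted runners).
FROM --platform=$BUILDPLATFORM golang:1.22 AS build

# Set automatically by BuildKit for each requested target platform.
ARG TARGETOS
ARG TARGETARCH

WORKDIR /src
COPY . .

# Go cross-compiles natively; CGO_ENABLED=0 yields a statically linked binary.
# Assumption: the main package lives at the repository root.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app .

# This stage is created for the target platform but only copies a file,
# so nothing has to execute under emulation.
FROM alpine:3.20
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Any `RUN` instruction added to the final stage would execute under QEMU for non-native targets, so keeping that stage to `COPY` and metadata instructions is what preserves the speedup.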

Beyond that, last I checked GitHub doesn't offer free arm64 native runners; you have to provide your own self-hosted runners on arm64 platforms. Or you can try to leverage caching for future CI runs, but this is known to be tricky to do well within the GitHub CI and Docker images (notably with Rust at least). These are the only two alternatives to cross-compiling from amd64 to arm64 that I'm aware of.

4 participants