[Github] Use building LLVM as perf-training for CI container #80713
This patch changes how the toolchain for the CI container is built so that PGO uses more rigorous perf-training: specifically, building all of LLVM, since that gave the best results in benchmarking. It also splits the job into two stages to avoid timeouts caused by the large increase in build time. A couple of other hacks are added to make this work; we can drop them once we can run jobs like this on more powerful self-hosted runners.
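The two-job split can be sketched as a shell script that replays the workflow's podman commands in order: build the `stage1-toolchain` target, export it to a tarball, load it again, then build `stage2.Dockerfile` on top of it. This is a minimal sketch, not code from the PR. The `run_stage1`/`run_stage2` function names, the `PODMAN` override (useful for a dry run with `PODMAN=echo`), and the assumption that it runs from `.github/workflows/containers/github-action-ci/` are all illustrative.

```shell
#!/bin/sh
# Sketch: replay the two CI jobs (stage1 -> artifact -> stage2) on one machine.
# Assumptions (not from the PR): run from
# .github/workflows/containers/github-action-ci/, and PODMAN may be set to a
# different command (e.g. PODMAN=echo) to dry-run the sequence.
set -eu

PODMAN="${PODMAN:-podman}"

# Job build-ci-container-stage1: build the stage1-toolchain target (which ends
# with the instrumented stage2 clang and lld), then export the image to a
# tarball. In CI the tarball is passed on with actions/upload-artifact.
run_stage1() {
  $PODMAN build -t stage1-toolchain --target stage1-toolchain -f stage1.Dockerfile .
  $PODMAN save stage1-toolchain > stage1-toolchain.tar
}

# Job build-ci-container-stage2: import the tarball (fetched with
# actions/download-artifact in CI) so stage2.Dockerfile can build
# FROM stage1-toolchain, then finish the build and assemble the final image.
# $1 is the final image tag; the workflow computes it in its "vars" step.
run_stage2() {
  tag="$1"
  $PODMAN load -i stage1-toolchain.tar
  $PODMAN build -t "$tag" -f stage2.Dockerfile .
}
```

For example, `run_stage1 && run_stage2 ci-ubuntu-22.04:local` (a hypothetical tag) would build the final image locally. On a single machine the save/load round trip is redundant, because `stage1-toolchain` is already in podman's local store; it is kept only to mirror the artifact handoff between the two jobs.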
@llvm/pr-subscribers-github-workflow @llvm/pr-subscribers-clang

Author: Aiden Grossman (boomanaiden154)

Full diff: https://github.com/llvm/llvm-project/pull/80713.diff

6 Files Affected:
diff --git a/.github/workflows/build-ci-container.yml b/.github/workflows/build-ci-container.yml
index ad3d50d4d578a..3f2bf57eb8508 100644
--- a/.github/workflows/build-ci-container.yml
+++ b/.github/workflows/build-ci-container.yml
@@ -1,4 +1,3 @@
-
 name: Build CI Container
 
 permissions:
@@ -19,9 +18,41 @@ on:
       - '.github/workflows/containers/github-action-ci/**'
 
 jobs:
-  build-ci-container:
+  # TODO(boomanaiden154): Switch this back to a single stage build when we can
+  # run this on the self-hosted runners and don't have to do it this way to
+  # avoid timeouts.
+  build-ci-container-stage1:
     if: github.repository_owner == 'llvm'
     runs-on: ubuntu-latest
+    steps:
+      - name: Checkout LLVM
+        uses: actions/checkout@v4
+        with:
+          sparse-checkout: .github/workflows/containers/github-action-ci/
+      - name: Change podman Root Directory
+        run: |
+          mkdir -p ~/.config/containers
+          sudo mkdir -p /mnt/podman
+          sudo chown `whoami`:`whoami` /mnt/podman
+          cp ./.github/workflows/containers/github-action-ci/storage.conf ~/.config/containers/storage.conf
+          podman info
+      - name: Build container stage1
+        working-directory: ./.github/workflows/containers/github-action-ci/
+        run: |
+          podman build -t stage1-toolchain --target stage1-toolchain -f stage1.Dockerfile .
+      - name: Save container image
+        run: |
+          podman save stage1-toolchain > stage1-toolchain.tar
+      - name: Upload container image
+        uses: actions/upload-artifact@v4
+        with:
+          name: stage1-toolchain
+          path: stage1-toolchain.tar
+          retention-days: 1
+  build-ci-container-stage2:
+    if: github.repository_owner == 'llvm'
+    runs-on: ubuntu-latest
+    needs: build-ci-container-stage1
     permissions:
       packages: write
     steps:
@@ -38,10 +69,27 @@ jobs:
         with:
           sparse-checkout: .github/workflows/containers/github-action-ci/
 
+      - name: Change podman Root Directory
+        run: |
+          mkdir -p ~/.config/containers
+          sudo mkdir -p /mnt/podman
+          sudo chown `whoami`:`whoami` /mnt/podman
+          cp ./.github/workflows/containers/github-action-ci/storage.conf ~/.config/containers/storage.conf
+          podman info
+
+      - name: Download stage1-toolchain
+        uses: actions/download-artifact@v4
+        with:
+          name: stage1-toolchain
+
+      - name: Load stage1-toolchain
+        run: |
+          podman load -i stage1-toolchain.tar
+
       - name: Build Container
         working-directory: ./.github/workflows/containers/github-action-ci/
         run: |
-          podman build -t ${{ steps.vars.outputs.container-name-tag }} .
+          podman build -t ${{ steps.vars.outputs.container-name-tag }} -f stage2.Dockerfile .
           podman tag ${{ steps.vars.outputs.container-name-tag }} ${{ steps.vars.outputs.container-name }}:latest
 
       - name: Test Container
diff --git a/.github/workflows/containers/github-action-ci/Dockerfile b/.github/workflows/containers/github-action-ci/Dockerfile
deleted file mode 100644
index 66fa81d5a10ae..0000000000000
--- a/.github/workflows/containers/github-action-ci/Dockerfile
+++ /dev/null
@@ -1,55 +0,0 @@
-FROM docker.io/library/ubuntu:22.04 as base
-ENV LLVM_SYSROOT=/opt/llvm
-
-FROM base as toolchain
-ENV LLVM_VERSION=17.0.6
-
-RUN apt-get update && \
- apt-get install -y \
- wget \
- gcc \
- g++ \
- cmake \
- ninja-build \
- python3 \
- git
-
-RUN wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-$LLVM_VERSION.tar.gz && tar -xf llvmorg-$LLVM_VERSION.tar.gz
-
-WORKDIR /llvm-project-llvmorg-$LLVM_VERSION
-
-RUN mkdir build
-
-RUN cmake -B ./build -G Ninja ./llvm \
- -C ./clang/cmake/caches/BOLT-PGO.cmake \
- -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
- -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
- -DPGO_INSTRUMENT_LTO=Thin \
- -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
- -DCMAKE_INSTALL_PREFIX="$LLVM_SYSROOT" \
- -DLLVM_ENABLE_PROJECTS="bolt;clang;lld;clang-tools-extra" \
- -DLLVM_DISTRIBUTION_COMPONENTS="lld;compiler-rt;clang-format" \
- -DCLANG_DEFAULT_LINKER="lld"
-
-RUN ninja -C ./build stage2-clang-bolt stage2-install-distribution && ninja -C ./build install-distribution && rm -rf ./build
-
-FROM base
-
-COPY --from=toolchain $LLVM_SYSROOT $LLVM_SYSROOT
-
-# Need to install curl for hendrikmuhs/ccache-action
-# Need nodejs for some of the GitHub actions.
-# Need perl-modules for clang analyzer tests.
-RUN apt-get update && \
- apt-get install -y \
- binutils \
- cmake \
- curl \
- libstdc++-11-dev \
- ninja-build \
- nodejs \
- perl-modules \
- python3-psutil
-
-ENV LLVM_SYSROOT=$LLVM_SYSROOT
-ENV PATH=${LLVM_SYSROOT}/bin:${PATH}
diff --git a/.github/workflows/containers/github-action-ci/bootstrap.patch b/.github/workflows/containers/github-action-ci/bootstrap.patch
new file mode 100644
index 0000000000000..55631c54a396f
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/bootstrap.patch
@@ -0,0 +1,13 @@
+diff --git a/clang/cmake/caches/BOLT-PGO.cmake b/clang/cmake/caches/BOLT-PGO.cmake
+index 1a04ca9a74e5..d092820e4115 100644
+--- a/clang/cmake/caches/BOLT-PGO.cmake
++++ b/clang/cmake/caches/BOLT-PGO.cmake
+@@ -4,6 +4,8 @@ set(CLANG_BOOTSTRAP_TARGETS
+   stage2-clang-bolt
+   stage2-distribution
+   stage2-install-distribution
++  clang
++  lld
+   CACHE STRING "")
+ set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS
+   clang-bolt
diff --git a/.github/workflows/containers/github-action-ci/stage1.Dockerfile b/.github/workflows/containers/github-action-ci/stage1.Dockerfile
new file mode 100644
index 0000000000000..fbc4548e6636e
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/stage1.Dockerfile
@@ -0,0 +1,44 @@
+FROM docker.io/library/ubuntu:22.04 as base
+ENV LLVM_SYSROOT=/opt/llvm
+
+FROM base as stage1-toolchain
+ENV LLVM_VERSION=17.0.6
+
+RUN apt-get update && \
+ apt-get install -y \
+ wget \
+ gcc \
+ g++ \
+ cmake \
+ ninja-build \
+ python3 \
+ git \
+ curl
+
+RUN curl -O -L https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-$LLVM_VERSION.tar.gz && tar -xf llvmorg-$LLVM_VERSION.tar.gz
+
+WORKDIR /llvm-project-llvmorg-$LLVM_VERSION
+
+COPY bootstrap.patch /
+
+# TODO(boomanaiden154): Remove the patch pulled from a LLVM PR once we bump
+# the toolchain to version 18 and the patch is in-tree.
+# TODO(boomanaiden154): Remove the bootstrap patch once we unsplit the build
+# and no longer need to explicitly build the stage2 dependencies.
+RUN curl https://github.com/llvm/llvm-project/commit/dd0356d741aefa25ece973d6cc4b55dcb73b84b4.patch | patch -p1 && cat /bootstrap.patch | patch -p1
+
+RUN mkdir build
+
+RUN cmake -B ./build -G Ninja ./llvm \
+ -C ./clang/cmake/caches/BOLT-PGO.cmake \
+ -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
+ -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
+ -DPGO_INSTRUMENT_LTO=Thin \
+ -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
+ -DCMAKE_INSTALL_PREFIX="$LLVM_SYSROOT" \
+ -DLLVM_ENABLE_PROJECTS="bolt;clang;lld;clang-tools-extra" \
+ -DLLVM_DISTRIBUTION_COMPONENTS="lld;compiler-rt;clang-format" \
+ -DCLANG_DEFAULT_LINKER="lld" \
+ -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/llvm-project-llvmorg-$LLVM_VERSION/llvm
+
+RUN ninja -C ./build stage2-instrumented-clang stage2-instrumented-lld
diff --git a/.github/workflows/containers/github-action-ci/stage2.Dockerfile b/.github/workflows/containers/github-action-ci/stage2.Dockerfile
new file mode 100644
index 0000000000000..e1a06cb68a589
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/stage2.Dockerfile
@@ -0,0 +1,27 @@
+FROM docker.io/library/ubuntu:22.04 as base
+ENV LLVM_SYSROOT=/opt/llvm
+
+FROM stage1-toolchain AS stage2-toolchain
+
+RUN ninja -C ./build stage2-clang-bolt stage2-install-distribution && ninja -C ./build install-distribution && rm -rf ./build
+
+FROM base
+
+COPY --from=stage2-toolchain $LLVM_SYSROOT $LLVM_SYSROOT
+
+# Need to install curl for hendrikmuhs/ccache-action
+# Need nodejs for some of the GitHub actions.
+# Need perl-modules for clang analyzer tests.
+RUN apt-get update && \
+ apt-get install -y \
+ binutils \
+ cmake \
+ curl \
+ libstdc++-11-dev \
+ ninja-build \
+ nodejs \
+ perl-modules \
+ python3-psutil
+
+ENV LLVM_SYSROOT=$LLVM_SYSROOT
+ENV PATH=${LLVM_SYSROOT}/bin:${PATH}
diff --git a/.github/workflows/containers/github-action-ci/storage.conf b/.github/workflows/containers/github-action-ci/storage.conf
new file mode 100644
index 0000000000000..60f295ff1e969
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/storage.conf
@@ -0,0 +1,4 @@
+[storage]
+ driver = "overlay"
+ runroot = "/mnt/podman/container"
+ graphroot = "/mnt/podman/image"
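The podman root-directory step copies `storage.conf` into `~/.config/containers/`, where rootless podman reads it, so that `runroot` and `graphroot` live under `/mnt/podman`. The PR does not state the reason; presumably it is disk space, since the multi-stage instrumented build is large and `/mnt` on GitHub-hosted Ubuntu runners typically has more free space than the root filesystem. The sketch below writes the same configuration; the `configure_podman_storage` function and its two path parameters are hypothetical, chosen so it can run without `sudo`.

```shell
#!/bin/sh
# Sketch of the podman root-directory step: write a user-level storage.conf
# that moves rootless podman's image and container storage under a chosen root.
# Assumptions (not from the PR): both paths are parameters so the sketch can
# run unprivileged; the workflow itself uses /mnt/podman and
# ~/.config/containers, and creates /mnt/podman with sudo + chown.
set -eu

# $1: storage root (the workflow uses /mnt/podman)
# $2: config directory (the workflow uses ~/.config/containers)
configure_podman_storage() {
  root="$1"
  conf_dir="$2"
  mkdir -p "$conf_dir" "$root"
  # Same keys as the PR's storage.conf; the unquoted EOF lets $root expand.
  cat > "$conf_dir/storage.conf" <<EOF
[storage]
  driver = "overlay"
  runroot = "$root/container"
  graphroot = "$root/image"
EOF
}
```

After `configure_podman_storage /mnt/podman ~/.config/containers` (with `/mnt/podman` already writable by the current user), `podman info` should report the new graph root, which is why the workflow step ends by running it.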
Looks like there was a transient failure in the stage2 build while loading the container image. I've restarted it and everything seems to be going fine. Hopefully that's not an issue going forward.