[Github] Use building LLVM as perf-training for CI container #80713

Merged

boomanaiden154
Contributor

This patch changes how the toolchain for the CI container is built so that perf-training for PGO is more rigorous: the training workload is now a build of the entirety of LLVM, as that is what showed the best results while benchmarking. Because this greatly increases build time, the patch also splits the job into two stages to avoid timeouts. A couple of other hacks are added to make things work; we can do away with them once we're able to run jobs like this on more powerful self-hosted runners.

@llvmbot added the clang and github:workflow labels Feb 5, 2024
@llvmbot
Member

llvmbot commented Feb 5, 2024

@llvm/pr-subscribers-github-workflow

@llvm/pr-subscribers-clang

Author: Aiden Grossman (boomanaiden154)

Changes

This patch changes how the toolchain for the CI container is built so that perf-training for PGO is more rigorous: the training workload is now a build of the entirety of LLVM, as that is what showed the best results while benchmarking. Because this greatly increases build time, the patch also splits the job into two stages to avoid timeouts. A couple of other hacks are added to make things work; we can do away with them once we're able to run jobs like this on more powerful self-hosted runners.


Full diff: https://github.com/llvm/llvm-project/pull/80713.diff

6 Files Affected:

  • (modified) .github/workflows/build-ci-container.yml (+51-3)
  • (removed) .github/workflows/containers/github-action-ci/Dockerfile (-55)
  • (added) .github/workflows/containers/github-action-ci/bootstrap.patch (+13)
  • (added) .github/workflows/containers/github-action-ci/stage1.Dockerfile (+44)
  • (added) .github/workflows/containers/github-action-ci/stage2.Dockerfile (+27)
  • (added) .github/workflows/containers/github-action-ci/storage.conf (+4)
diff --git a/.github/workflows/build-ci-container.yml b/.github/workflows/build-ci-container.yml
index ad3d50d4d578a..3f2bf57eb8508 100644
--- a/.github/workflows/build-ci-container.yml
+++ b/.github/workflows/build-ci-container.yml
@@ -1,4 +1,3 @@
-
 name: Build CI Container
 
 permissions:
@@ -19,9 +18,41 @@ on:
       - '.github/workflows/containers/github-action-ci/**'
 
 jobs:
-  build-ci-container:
+  # TODO(boomanaiden154): Switch this back to a single stage build when we can
+  # run this on the self-hosted runners and don't have to do it this way to
+  # avoid timeouts.
+  build-ci-container-stage1:
     if: github.repository_owner == 'llvm'
     runs-on: ubuntu-latest
+    steps:
+      - name: Checkout LLVM
+        uses: actions/checkout@v4
+        with:
+          sparse-checkout: .github/workflows/containers/github-action-ci/
+      - name: Change podman Root Directory
+        run: |
+          mkdir -p ~/.config/containers
+          sudo mkdir -p /mnt/podman
+          sudo chown `whoami`:`whoami` /mnt/podman
+          cp ./.github/workflows/containers/github-action-ci/storage.conf ~/.config/containers/storage.conf
+          podman info
+      - name: Build container stage1
+        working-directory: ./.github/workflows/containers/github-action-ci/
+        run: |
+          podman build -t stage1-toolchain --target stage1-toolchain -f stage1.Dockerfile .
+      - name: Save container image
+        run: |
+          podman save stage1-toolchain > stage1-toolchain.tar
+      - name: Upload container image
+        uses: actions/upload-artifact@v4
+        with:
+          name: stage1-toolchain
+          path: stage1-toolchain.tar
+          retention-days: 1
+  build-ci-container-stage2:
+    if: github.repository_owner == 'llvm'
+    runs-on: ubuntu-latest
+    needs: build-ci-container-stage1
     permissions:
       packages: write
     steps:
@@ -38,10 +69,27 @@ jobs:
         with:
           sparse-checkout: .github/workflows/containers/github-action-ci/
 
+      - name: Change podman Root Directory
+        run: |
+          mkdir -p ~/.config/containers
+          sudo mkdir -p /mnt/podman
+          sudo chown `whoami`:`whoami` /mnt/podman
+          cp ./.github/workflows/containers/github-action-ci/storage.conf ~/.config/containers/storage.conf
+          podman info
+
+      - name: Download stage1-toolchain
+        uses: actions/download-artifact@v4
+        with:
+          name: stage1-toolchain
+
+      - name: Load stage1-toolchain
+        run: |
+          podman load -i stage1-toolchain.tar
+
       - name: Build Container
         working-directory: ./.github/workflows/containers/github-action-ci/
         run: |
-          podman build -t ${{ steps.vars.outputs.container-name-tag }} .
+          podman build -t ${{ steps.vars.outputs.container-name-tag }} -f stage2.Dockerfile .
           podman tag ${{ steps.vars.outputs.container-name-tag }} ${{ steps.vars.outputs.container-name }}:latest
 
       - name: Test Container
diff --git a/.github/workflows/containers/github-action-ci/Dockerfile b/.github/workflows/containers/github-action-ci/Dockerfile
deleted file mode 100644
index 66fa81d5a10ae..0000000000000
--- a/.github/workflows/containers/github-action-ci/Dockerfile
+++ /dev/null
@@ -1,55 +0,0 @@
-FROM docker.io/library/ubuntu:22.04 as base
-ENV LLVM_SYSROOT=/opt/llvm
-
-FROM base as toolchain
-ENV LLVM_VERSION=17.0.6
-
-RUN apt-get update && \
-    apt-get install -y \
-    wget \
-    gcc \
-    g++ \
-    cmake \
-    ninja-build \
-    python3 \
-    git
-
-RUN wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-$LLVM_VERSION.tar.gz && tar -xf llvmorg-$LLVM_VERSION.tar.gz
-
-WORKDIR /llvm-project-llvmorg-$LLVM_VERSION
-
-RUN mkdir build
-
-RUN cmake -B ./build -G Ninja ./llvm \
-  -C ./clang/cmake/caches/BOLT-PGO.cmake \
-  -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
-  -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
-  -DPGO_INSTRUMENT_LTO=Thin \
-  -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
-  -DCMAKE_INSTALL_PREFIX="$LLVM_SYSROOT" \
-  -DLLVM_ENABLE_PROJECTS="bolt;clang;lld;clang-tools-extra" \
-  -DLLVM_DISTRIBUTION_COMPONENTS="lld;compiler-rt;clang-format" \
-  -DCLANG_DEFAULT_LINKER="lld"
-
-RUN ninja -C ./build stage2-clang-bolt stage2-install-distribution && ninja -C ./build install-distribution && rm -rf ./build
-
-FROM base
-
-COPY --from=toolchain $LLVM_SYSROOT $LLVM_SYSROOT
-
-# Need to install curl for hendrikmuhs/ccache-action
-# Need nodejs for some of the GitHub actions.
-# Need perl-modules for clang analyzer tests.
-RUN apt-get update && \
-    apt-get install -y \
-    binutils \
-    cmake \
-    curl \
-    libstdc++-11-dev \
-    ninja-build \
-    nodejs \
-    perl-modules \
-    python3-psutil
-
-ENV LLVM_SYSROOT=$LLVM_SYSROOT
-ENV PATH=${LLVM_SYSROOT}/bin:${PATH}
diff --git a/.github/workflows/containers/github-action-ci/bootstrap.patch b/.github/workflows/containers/github-action-ci/bootstrap.patch
new file mode 100644
index 0000000000000..55631c54a396f
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/bootstrap.patch
@@ -0,0 +1,13 @@
+diff --git a/clang/cmake/caches/BOLT-PGO.cmake b/clang/cmake/caches/BOLT-PGO.cmake
+index 1a04ca9a74e5..d092820e4115 100644
+--- a/clang/cmake/caches/BOLT-PGO.cmake
++++ b/clang/cmake/caches/BOLT-PGO.cmake
+@@ -4,6 +4,8 @@ set(CLANG_BOOTSTRAP_TARGETS
+   stage2-clang-bolt
+   stage2-distribution
+   stage2-install-distribution
++  clang
++  lld
+   CACHE STRING "")
+ set(BOOTSTRAP_CLANG_BOOTSTRAP_TARGETS
+   clang-bolt
diff --git a/.github/workflows/containers/github-action-ci/stage1.Dockerfile b/.github/workflows/containers/github-action-ci/stage1.Dockerfile
new file mode 100644
index 0000000000000..fbc4548e6636e
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/stage1.Dockerfile
@@ -0,0 +1,44 @@
+FROM docker.io/library/ubuntu:22.04 as base
+ENV LLVM_SYSROOT=/opt/llvm
+
+FROM base as stage1-toolchain
+ENV LLVM_VERSION=17.0.6
+
+RUN apt-get update && \
+    apt-get install -y \
+    wget \
+    gcc \
+    g++ \
+    cmake \
+    ninja-build \
+    python3 \
+    git \
+    curl
+
+RUN curl -O -L https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-$LLVM_VERSION.tar.gz && tar -xf llvmorg-$LLVM_VERSION.tar.gz
+
+WORKDIR /llvm-project-llvmorg-$LLVM_VERSION
+
+COPY bootstrap.patch /
+
+# TODO(boomanaiden154): Remove the patch pulled from a LLVM PR once we bump
+# the toolchain to version 18 and the patch is in-tree.
+# TODO(boomanaiden154): Remove the bootstrap patch once we unsplit the build
+# and no longer need to explicitly build the stage2 dependencies.
+RUN curl https://github.com/llvm/llvm-project/commit/dd0356d741aefa25ece973d6cc4b55dcb73b84b4.patch | patch -p1 && cat /bootstrap.patch | patch -p1
+
+RUN mkdir build
+
+RUN cmake -B ./build -G Ninja ./llvm \
+  -C ./clang/cmake/caches/BOLT-PGO.cmake \
+  -DBOOTSTRAP_LLVM_ENABLE_LLD=ON \
+  -DBOOTSTRAP_BOOTSTRAP_LLVM_ENABLE_LLD=ON \
+  -DPGO_INSTRUMENT_LTO=Thin \
+  -DLLVM_ENABLE_RUNTIMES="compiler-rt" \
+  -DCMAKE_INSTALL_PREFIX="$LLVM_SYSROOT" \
+  -DLLVM_ENABLE_PROJECTS="bolt;clang;lld;clang-tools-extra" \
+  -DLLVM_DISTRIBUTION_COMPONENTS="lld;compiler-rt;clang-format" \
+  -DCLANG_DEFAULT_LINKER="lld" \
+  -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/llvm-project-llvmorg-$LLVM_VERSION/llvm
+
+RUN ninja -C ./build stage2-instrumented-clang stage2-instrumented-lld
diff --git a/.github/workflows/containers/github-action-ci/stage2.Dockerfile b/.github/workflows/containers/github-action-ci/stage2.Dockerfile
new file mode 100644
index 0000000000000..e1a06cb68a589
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/stage2.Dockerfile
@@ -0,0 +1,27 @@
+FROM docker.io/library/ubuntu:22.04 as base
+ENV LLVM_SYSROOT=/opt/llvm
+
+FROM stage1-toolchain AS stage2-toolchain
+
+RUN ninja -C ./build stage2-clang-bolt stage2-install-distribution && ninja -C ./build install-distribution && rm -rf ./build
+
+FROM base
+
+COPY --from=stage2-toolchain $LLVM_SYSROOT $LLVM_SYSROOT
+
+# Need to install curl for hendrikmuhs/ccache-action
+# Need nodejs for some of the GitHub actions.
+# Need perl-modules for clang analyzer tests.
+RUN apt-get update && \
+    apt-get install -y \
+    binutils \
+    cmake \
+    curl \
+    libstdc++-11-dev \
+    ninja-build \
+    nodejs \
+    perl-modules \
+    python3-psutil
+
+ENV LLVM_SYSROOT=$LLVM_SYSROOT
+ENV PATH=${LLVM_SYSROOT}/bin:${PATH}
diff --git a/.github/workflows/containers/github-action-ci/storage.conf b/.github/workflows/containers/github-action-ci/storage.conf
new file mode 100644
index 0000000000000..60f295ff1e969
--- /dev/null
+++ b/.github/workflows/containers/github-action-ci/storage.conf
@@ -0,0 +1,4 @@
+[storage]
+  driver = "overlay"
+  runroot = "/mnt/podman/container"
+  graphroot = "/mnt/podman/image"
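
The diff above splits the build into a stage1 job that builds and saves an image and a stage2 job that loads it and finishes the build. As a rough sketch of how the same sequence might be tried locally, assuming it is run from `.github/workflows/containers/github-action-ci/` with podman installed: the `run` and `reproduce_build` helpers, the `DRY_RUN` variable, and the default tag `ci-container-local` are hypothetical names used only for illustration and are not part of this PR. The sketch uses `podman save -o` where the workflow uses a shell redirect, and it skips the artifact upload/download that sits between the two jobs.

```shell
# Sketch only: mirrors the stage1 -> save -> load -> stage2 sequence from the
# workflow. Helper names and DRY_RUN are assumptions, not part of the PR.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: reproduce_build [final-image-tag]
reproduce_build() {
  image_tag="${1:-ci-container-local}"

  # Stage 1: build the instrumented toolchain and stop at that target.
  run podman build -t stage1-toolchain --target stage1-toolchain -f stage1.Dockerfile . &&
    # Hand-off: the workflow uploads this tarball as an artifact between jobs.
    run podman save -o stage1-toolchain.tar stage1-toolchain &&
    run podman load -i stage1-toolchain.tar &&
    # Stage 2: stage2.Dockerfile starts FROM the loaded stage1-toolchain image.
    run podman build -t "$image_tag" -f stage2.Dockerfile .
}
```

Running `DRY_RUN=1 reproduce_build` prints the four podman commands without executing them, which is a cheap way to check the sequence before committing to a multi-hour build.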

@boomanaiden154 removed the clang label Feb 5, 2024
@boomanaiden154
Contributor Author

Looks like there was a transient failure in the stage2 build while loading the container image. I've restarted it and everything seems to be going fine. Hopefully that's not an issue going forward.
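
If the load step did turn out to be flaky, one possible mitigation (not something this PR does; the job was simply restarted) would be to wrap the command in a small retry helper. The following is a minimal sketch under that assumption. The `retry` function is hypothetical, and retrying only helps if the failure is transient in `podman load` itself rather than caused by a corrupt artifact.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay-seconds> between failures.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while true; do
    # Success on any attempt ends the loop immediately.
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: '$*' failed after $n attempts" >&2
      return 1
    fi
    echo "retry: attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

In the workflow's "Load stage1-toolchain" step this could be used as `retry 3 10 podman load -i stage1-toolchain.tar`, where the attempt count and delay are arbitrary example values.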

@boomanaiden154 boomanaiden154 merged commit 8f80df0 into llvm:main Feb 6, 2024
7 checks passed
3 participants