Revive Nightly/Past CI #31159
@@ -15,16 +15,6 @@ jobs:
     name: "Nightly PyTorch + Stable TensorFlow"
     runs-on: [intel-cpu, 8-cpu, ci]
     steps:
-      - name: Cleanup disk
-        run: |
-          sudo ls -l /usr/local/lib/
-          sudo ls -l /usr/share/
-          sudo du -sh /usr/local/lib/
-          sudo du -sh /usr/share/
-          sudo rm -rf /usr/local/lib/android
-          sudo rm -rf /usr/share/dotnet
-          sudo du -sh /usr/local/lib/
-          sudo du -sh /usr/share/
       -
         name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v2
@@ -52,16 +42,6 @@ jobs:
     name: "Nightly PyTorch + DeepSpeed"
     runs-on: [intel-cpu, 8-cpu, ci]
     steps:
-      - name: Cleanup disk
-        run: |
-          sudo ls -l /usr/local/lib/
-          sudo ls -l /usr/share/
-          sudo du -sh /usr/local/lib/
-          sudo du -sh /usr/share/
-          sudo rm -rf /usr/local/lib/android
-          sudo rm -rf /usr/share/dotnet
-          sudo du -sh /usr/local/lib/
-          sudo du -sh /usr/share/
Review comment: same
       -
         name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v2
@@ -12,6 +12,12 @@ on:
       slice_id:
         required: true
         type: number
+      runner:
+        required: true
+        type: string
+      docker:
+        required: true
+        type: string
Comment on lines +15 to +20: just to allow using different runner and docker image. (eventually, we probably move all runner to
 
 env:
   HF_HOME: /mnt/cache
@@ -31,12 +37,13 @@ jobs:
   run_models_gpu:
     name: " "
     strategy:
+      max-parallel: 8
       fail-fast: false
       matrix:
         folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
-    runs-on: ['${{ inputs.machine_type }}', nvidia-gpu, t4, daily-ci]
+    runs-on: ['${{ inputs.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}']
     container:
-      image: huggingface/transformers-all-latest-gpu
+      image: ${{ inputs.docker }}
       options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
     steps:
       - name: Echo input and matrix info
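With the new `runner` input, the last `runs-on` label is no longer hard-coded to `daily-ci`. GitHub assigns a job to a self-hosted runner only if that runner carries every label listed in `runs-on`, so changing `runner` (for example from `daily-ci` to `ci`) changes which machines are eligible. The Python sketch below illustrates that matching; the runner names and the `single-gpu`/`multi-gpu` machine-type labels are hypothetical and not taken from this PR.

```python
# Hypothetical sketch: which self-hosted runners could pick up `run_models_gpu`.
# A runner qualifies only if it carries every label in `runs-on`.
# Runner names and the "single-gpu"/"multi-gpu" labels are made up for illustration.


def runs_on_labels(machine_type: str, runner: str) -> list[str]:
    """Mirror of: runs-on: ['${{ inputs.machine_type }}', nvidia-gpu, t4, '${{ inputs.runner }}']"""
    return [machine_type, "nvidia-gpu", "t4", runner]


def eligible_runners(required: list[str], runners: dict[str, set[str]]) -> list[str]:
    """Names of runners whose label set contains all required labels."""
    wanted = set(required)
    return sorted(name for name, labels in runners.items() if wanted <= labels)


RUNNERS = {
    "gpu-a": {"self-hosted", "single-gpu", "nvidia-gpu", "t4", "daily-ci"},
    "gpu-b": {"self-hosted", "single-gpu", "nvidia-gpu", "t4", "ci"},
    "gpu-c": {"self-hosted", "multi-gpu", "nvidia-gpu", "t4", "ci"},
}

if __name__ == "__main__":
    print(eligible_runners(runs_on_labels("single-gpu", "daily-ci"), RUNNERS))  # ['gpu-a']
    print(eligible_runners(runs_on_labels("single-gpu", "ci"), RUNNERS))        # ['gpu-b']
```

The same `run_models_gpu` job can therefore be pointed at a different runner pool purely from the caller, without editing the reusable workflow.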
@@ -65,6 +72,18 @@ jobs:
         working-directory: /transformers
         run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
 
+      - name: Update / Install some packages (for Past CI)
+        if: ${{ contains(inputs.docker, '-past-') }}
+        working-directory: /transformers
+        run: |
+          python3 -m pip install -U datasets
+
+      - name: Update / Install some packages (for Past CI)
+        if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
+        working-directory: /transformers
+        run: |
+          python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
Comment on lines +75 to +85: some particularities of Past CI
+
       - name: NVIDIA-SMI
         run: |
           nvidia-smi
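The two Past CI steps are gated only on substrings of the `docker` input. A minimal Python sketch of that gating follows, assuming GitHub's `contains()` acts as a case-insensitive substring test when both arguments are strings. The two non-past image tags come from this PR; the `-past-` tags are hypothetical examples of names that would match.

```python
# Sketch of the two Past CI step conditions in `run_models_gpu`.
# Assumption: GitHub's contains() on strings is a case-insensitive substring check.


def gh_contains(search: str, item: str) -> bool:
    """Approximation of GitHub Actions `contains(search, item)` for string arguments."""
    return item.lower() in search.lower()


def past_ci_steps(docker: str) -> list[str]:
    """Return the extra install steps that would run for a given `inputs.docker` value."""
    steps = []
    # Step 1: if: ${{ contains(inputs.docker, '-past-') }}
    if gh_contains(docker, "-past-"):
        steps.append("upgrade datasets")
    # Step 2: if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
    if gh_contains(docker, "-past-") and gh_contains(docker, "-pytorch-"):
        steps.append("install accelerate@main")
    return steps


if __name__ == "__main__":
    # Image tags used in this PR: neither contains '-past-', so no extra steps run.
    print(past_ci_steps("huggingface/transformers-all-latest-torch-nightly-gpu"))  # []
    print(past_ci_steps("huggingface/transformers-pytorch-deepspeed-latest-gpu"))  # []
    # Hypothetical past-CI image tags (the naming is an assumption).
    print(past_ci_steps("huggingface/transformers-pytorch-past-1.13-gpu"))
    print(past_ci_steps("huggingface/transformers-tensorflow-past-2.11-gpu"))
```

Keying the steps on the image name keeps the reusable workflow free of a separate "is past CI" input; the trade-off is that image tags must follow the `-past-`/`-pytorch-` naming for the extra installs to trigger.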
Review comment: just call some jobs defined in the (common)
@@ -0,0 +1,43 @@
+name: Self-hosted runner (nightly-ci)
+
+
+on:
+  repository_dispatch:
+  schedule:
+    - cron: "17 2 * * *"
+  push:
+    branches:
+      - run_nightly_ci*
+
+jobs:
+  build_nightly_ci_images:
+    name: Build Nightly CI Docker Images
+    if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'build-cleanup-docker-build'))
+    uses: ./.github/workflows/build-nightly-ci-docker-images.yml
+    secrets: inherit
+
+  model-ci:
+    name: Model CI
+    needs: [build_nightly_ci_images]
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_models_gpu
+      slack_report_channel: "#transformers-ci-past-future"
+      runner: ci
+      docker: huggingface/transformers-all-latest-torch-nightly-gpu
+      ci_event: Nightly CI
+    secrets: inherit
+
+  deepspeed-ci:
+    name: DeepSpeed CI
+    needs: [build_nightly_ci_images]
+    uses: ./.github/workflows/self-scheduled.yml
+    with:
+      job: run_torch_cuda_extensions_gpu
+      slack_report_channel: "#transformers-ci-past-future"
+      runner: ci
+      # test deepspeed nightly build with the latest release torch
+      docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
+      ci_event: Nightly CI
+      working-directory-prefix: /workspace
+    secrets: inherit
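In the new nightly caller, `model-ci` and `deepspeed-ci` both list `needs: [build_nightly_ci_images]` and have no `if:` of their own, so by GitHub's default they run only when the build job runs and succeeds. The sketch below mirrors the build job's condition. It treats `startsWith()` as case-insensitive (as GitHub documents it) and assumes the build job succeeds whenever it runs; the branch names in the usage lines are hypothetical.

```python
# Sketch of the job gating in the new nightly caller.
# Assumptions: startsWith() is case-insensitive, the build job succeeds whenever
# it runs, and a job whose `needs` dependency was skipped is skipped as well.


def starts_with(value: str, prefix: str) -> bool:
    """Approximation of GitHub Actions `startsWith(searchString, searchValue)`."""
    return value.lower().startswith(prefix.lower())


def build_job_runs(event_name: str, ref_name: str) -> bool:
    """Mirror of the `if:` on `build_nightly_ci_images`."""
    return event_name == "schedule" or (
        event_name == "push" and starts_with(ref_name, "build-cleanup-docker-build")
    )


def nightly_jobs(event_name: str, ref_name: str) -> dict[str, bool]:
    """Which jobs would run; `model-ci` and `deepspeed-ci` depend only on the build job."""
    build = build_job_runs(event_name, ref_name)
    return {"build_nightly_ci_images": build, "model-ci": build, "deepspeed-ci": build}


if __name__ == "__main__":
    print(nightly_jobs("schedule", "main"))
    print(nightly_jobs("push", "run_nightly_ci_test"))
```

Under these rules, a push to a `run_nightly_ci*` branch (the pattern listed under `on.push`) does not match the `build-cleanup-docker-build` prefix, so the build job and both dependent jobs are skipped; the scheduled trigger is what runs the full chain.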
@@ -2,32 +2,30 @@ name: Self-hosted runner (nightly-past-ci-caller)
 
 on:
   schedule:
-    # 2:17 am on each Sunday and Thursday
-
-    - cron: "17 2 * * 0,4"
+    - cron: "17 2,14 * * *"
Review comment: trigger twice per day
Review comment: Why twice a day?
Review comment: The objective is simply to have the runs of all past versions finished in one week. Each run can be done within 12 hours, which is why I scheduled it twice a day.
   push:
     branches:
-      - run_nightly_ci*
       - run_past_ci*
 
 jobs:
-  build_nightly_ci_images:
-    name: Build Nightly CI Docker Images
-    if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci'))
-    uses: ./.github/workflows/build-nightly-ci-docker-images.yml
-    secrets: inherit
-
-  run_nightly_ci:
-    name: Nightly CI
-    needs: [build_nightly_ci_images]
-    uses: ./.github/workflows/self-nightly-scheduled.yml
-    secrets: inherit
+  get_number:
+    name: Get number
+    runs-on: ubuntu-22.04
+    outputs:
+      run_number: ${{ steps.get_number.outputs.run_number }}
+    steps:
+      - name: Get number
+        id: get_number
+        run: |
+          echo "${{ github.run_number }}"
+          echo "$(python3 -c 'print(int(${{ github.run_number }}) % 10)')"
+          echo "run_number=$(python3 -c 'print(int(${{ github.run_number }}) % 10)')" >> $GITHUB_OUTPUT
Comment on lines +11 to +22: an index to determine which torch/tf version to run
 
   run_past_ci_pytorch_1-13:
     name: PyTorch 1.13
-    if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')))
-    needs: [run_nightly_ci]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 0 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'build-cleanup-docker-build')))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: pytorch
       version: "1.13"
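The arithmetic behind the reply about scheduling: with `17 2,14 * * *` (GitHub Actions evaluates cron schedules in UTC) there are two runs per day, 12 hours apart, and `run_number % 10` needs ten consecutive runs to visit every index, i.e. five days. The sketch assumes that only scheduled runs advance `github.run_number`; push-triggered runs of the same workflow also increment it and would shift the rotation.

```python
# Arithmetic behind `cron: "17 2,14 * * *"` combined with `run_number % 10`.
# Assumption: only scheduled runs advance github.run_number.


def cron_gaps(hours: list[int]) -> list[int]:
    """Hours between consecutive daily triggers, wrapping around midnight."""
    n = len(hours)
    return [(hours[(i + 1) % n] - hours[i]) % 24 for i in range(n)]


def days_per_cycle(hours: list[int], slots: int = 10) -> float:
    """Days for `run_number % slots` to visit every index once, at len(hours) runs per day."""
    return slots / len(hours)


if __name__ == "__main__":
    hours = [2, 14]  # hour field of "17 2,14 * * *" (UTC)
    print(cron_gaps(hours))       # [12, 12]: each run has a 12-hour window
    print(days_per_cycle(hours))  # 5.0: all ten slots fit inside one week
```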
@@ -36,9 +34,9 @@ jobs:
 
   run_past_ci_pytorch_1-12:
     name: PyTorch 1.12
-    if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')))
-    needs: [run_past_ci_pytorch_1-13]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 1 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'build-cleanup-docker-build')))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: pytorch
       version: "1.12"
@@ -47,9 +45,9 @@ jobs:
 
   run_past_ci_pytorch_1-11:
     name: PyTorch 1.11
-    if: (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')))
-    needs: [run_past_ci_pytorch_1-12]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 2 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'build-cleanup-docker-build')))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: pytorch
       version: "1.11"
@@ -58,9 +56,9 @@ jobs:
 
   run_past_ci_tensorflow_2-11:
     name: TensorFlow 2.11
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_pytorch_1-11]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 3 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.11"
@@ -69,9 +67,9 @@ jobs:
 
   run_past_ci_tensorflow_2-10:
     name: TensorFlow 2.10
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_tensorflow_2-11]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 4 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.10"
@@ -80,9 +78,9 @@ jobs:
 
   run_past_ci_tensorflow_2-9:
     name: TensorFlow 2.9
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_tensorflow_2-10]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 5 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.9"
@@ -91,9 +89,9 @@ jobs:
 
   run_past_ci_tensorflow_2-8:
     name: TensorFlow 2.8
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_tensorflow_2-9]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 6 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.8"
@@ -102,9 +100,9 @@ jobs:
 
   run_past_ci_tensorflow_2-7:
     name: TensorFlow 2.7
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_tensorflow_2-8]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 7 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.7"
@@ -113,9 +111,9 @@ jobs:
 
   run_past_ci_tensorflow_2-6:
     name: TensorFlow 2.6
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_tensorflow_2-7]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 8 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.6"
@@ -124,9 +122,9 @@ jobs:
 
   run_past_ci_tensorflow_2-5:
     name: TensorFlow 2.5
-    if: (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
-    needs: [run_past_ci_tensorflow_2-6]
-    uses: ./.github/workflows/self-past.yml
+    needs: get_number
+    if: needs.get_number.outputs.run_number == 9 && (cancelled() != true) && ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci'))
+    uses: ./.github/workflows/self-past-caller.yml
     with:
       framework: tensorflow
       version: "2.5"
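Each past-version job now depends only on `get_number` and runs when the index matches its slot, instead of being chained behind the previous version. A Python sketch of that selection follows, mirroring the `if:` expressions above and assuming the run is not cancelled; `selected_jobs` is a hypothetical helper, and the branch names in the usage lines are made up. Note that the PyTorch jobs accept scheduled runs or pushes to `build-cleanup-docker-build*` branches, while the TensorFlow jobs only accept pushes to `run_past_ci*` branches.

```python
# Sketch of how the past-CI caller picks a job from `run_number % 10`.
# Mirrors the `if:` expressions above; assumes the run is not cancelled.
# `selected_jobs` is a hypothetical helper, not part of the workflow.

TORCH_PUSH_PREFIX = "build-cleanup-docker-build"
TF_PUSH_PREFIX = "run_past_ci"

# (slot, framework, version, accepted on schedule?, branch prefix accepted on push)
PAST_CI_JOBS = [
    (0, "pytorch", "1.13", True, TORCH_PUSH_PREFIX),
    (1, "pytorch", "1.12", True, TORCH_PUSH_PREFIX),
    (2, "pytorch", "1.11", True, TORCH_PUSH_PREFIX),
    (3, "tensorflow", "2.11", False, TF_PUSH_PREFIX),
    (4, "tensorflow", "2.10", False, TF_PUSH_PREFIX),
    (5, "tensorflow", "2.9", False, TF_PUSH_PREFIX),
    (6, "tensorflow", "2.8", False, TF_PUSH_PREFIX),
    (7, "tensorflow", "2.7", False, TF_PUSH_PREFIX),
    (8, "tensorflow", "2.6", False, TF_PUSH_PREFIX),
    (9, "tensorflow", "2.5", False, TF_PUSH_PREFIX),
]


def run_index(run_number: int) -> int:
    """Same computation as the `get_number` step: int(run_number) % 10."""
    return int(run_number) % 10


def selected_jobs(run_number: int, event_name: str, ref_name: str) -> list[tuple[str, str]]:
    """Return (framework, version) pairs whose `if:` would be true for this run."""
    index = run_index(run_number)
    selected = []
    for slot, framework, version, on_schedule, push_prefix in PAST_CI_JOBS:
        if slot != index:
            continue
        by_schedule = event_name == "schedule" and on_schedule
        # startsWith() in GitHub expressions ignores case.
        by_push = event_name == "push" and ref_name.lower().startswith(push_prefix.lower())
        if by_schedule or by_push:
            selected.append((framework, version))
    return selected


if __name__ == "__main__":
    print(selected_jobs(40, "schedule", "main"))        # [('pytorch', '1.13')]
    print(selected_jobs(43, "schedule", "main"))        # []
    print(selected_jobs(43, "push", "run_past_ci_tf"))  # [('tensorflow', '2.11')]
```

Because every job reads the same index, at most one past version runs per workflow run, and a failure in one version no longer blocks the versions that used to be chained after it.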
Review comment (on the removed "Cleanup disk" step): we don't need this anymore after using [intel-cpu, 8-cpu, ci]. I forgot to remove this in #31119