Assign owners to periodic and slow jobs (pytorch#139519)
As an outcome of https://fburl.com/gdoc/voce5o06, I want to assign owner(s) to any periodic or slow jobs that are still needed but can't run more frequently (too $$$, capacity constraints, or they don't fail that often). They include:

* multigpu
* debug build
* ROCm (distributed, slow)

@malfet @soulitzer I put down your names as the owners of the debug build and slowgradcheck jobs respectively. Please let me know if you are OK with that, or if you have a better option in mind.

Any jobs there without an explicit owner are owned by us (PT Dev Infra).

### Testing

The owners show up in the job names: https://hud.pytorch.org/pr/139519
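
For illustration, here is a minimal sketch (hypothetical Python, not the actual CI tooling; `job_display_name` and the dict shape are assumptions) of how an `owners` list on a test-matrix entry could be folded into the job name displayed on HUD:

```python
# Hypothetical sketch, not the actual workflow code: fold an optional
# `owners` list from a test-matrix entry into a human-readable job name.
def job_display_name(entry: dict) -> str:
    base = f"test ({entry['config']}, {entry['shard']}, {entry['num_shards']}, {entry['runner']})"
    owners = entry.get("owners", [])
    # Entries without an owners list fall through to the default owner (PT Dev Infra).
    return f"{base}, {', '.join(owners)}" if owners else base

entry = {
    "config": "multigpu",
    "shard": 1,
    "num_shards": 1,
    "runner": "linux.g5.12xlarge.nvidia.gpu",
    "owners": ["oncall:distributed"],
}
print(job_display_name(entry))
# test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu), oncall:distributed
```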
Pull Request resolved: pytorch#139519
Approved by: https://github.com/malfet
huydhn authored and pytorchmergebot committed Nov 5, 2024
1 parent a766d84 commit fdfd4c5
Showing 2 changed files with 23 additions and 23 deletions.
26 changes: 13 additions & 13 deletions .github/workflows/periodic.yml
@@ -121,7 +121,7 @@ jobs:
cuda-arch-list: 8.6
test-matrix: |
{ include: [
{ config: "multigpu", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.12xlarge.nvidia.gpu" },
{ config: "multigpu", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
]}
build-with-debug: false

@@ -145,11 +145,11 @@ jobs:
build-with-debug: true
test-matrix: |
{ include: [
{ config: "default", shard: 1, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "default", shard: 2, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "default", shard: 3, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "default", shard: 4, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "default", shard: 5, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu" },
{ config: "default", shard: 1, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu", owners: ["oncall:debug-build"] },
{ config: "default", shard: 2, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu", owners: ["oncall:debug-build"] },
{ config: "default", shard: 3, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu", owners: ["oncall:debug-build"] },
{ config: "default", shard: 4, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu", owners: ["oncall:debug-build"] },
{ config: "default", shard: 5, num_shards: 5, runner: "${{ needs.get-label-type.outputs.label-type }}linux.4xlarge.nvidia.gpu", owners: ["oncall:debug-build"] },
]}
linux-focal-cuda11_8-py3_10-gcc9-debug-test:
@@ -183,9 +183,9 @@ jobs:
docker-image-name: pytorch-linux-focal-rocm-n-py3
test-matrix: |
{ include: [
{ config: "distributed", shard: 1, num_shards: 3, runner: "linux.rocm.gpu" },
{ config: "distributed", shard: 2, num_shards: 3, runner: "linux.rocm.gpu" },
{ config: "distributed", shard: 3, num_shards: 3, runner: "linux.rocm.gpu" },
{ config: "distributed", shard: 1, num_shards: 3, runner: "linux.rocm.gpu", owners: ["module:rocm", "oncall:distributed"] },
{ config: "distributed", shard: 2, num_shards: 3, runner: "linux.rocm.gpu", owners: ["module:rocm", "oncall:distributed"] },
{ config: "distributed", shard: 3, num_shards: 3, runner: "linux.rocm.gpu", owners: ["module:rocm", "oncall:distributed"] },
]}
linux-focal-rocm6_2-py3_10-test:
@@ -246,7 +246,7 @@ jobs:
cuda-arch-list: 8.6
test-matrix: |
{ include: [
{ config: "multigpu", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.12xlarge.nvidia.gpu" },
{ config: "multigpu", shard: 1, num_shards: 1, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g5.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
]}
build-with-debug: false

@@ -274,9 +274,9 @@ jobs:
cuda-arch-list: '7.5'
test-matrix: |
{ include: [
{ config: "distributed", shard: 1, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g4dn.12xlarge.nvidia.gpu" },
{ config: "distributed", shard: 2, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g4dn.12xlarge.nvidia.gpu" },
{ config: "distributed", shard: 3, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g4dn.12xlarge.nvidia.gpu" },
{ config: "distributed", shard: 1, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g4dn.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
{ config: "distributed", shard: 2, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g4dn.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
{ config: "distributed", shard: 3, num_shards: 3, runner: "${{ needs.get-label-type.outputs.label-type }}linux.g4dn.12xlarge.nvidia.gpu", owners: ["oncall:distributed"] },
]}
linux-focal-cuda11_8-py3_10-gcc9-experimental-split-build-test:
20 changes: 10 additions & 10 deletions .github/workflows/slow.yml
@@ -58,14 +58,14 @@ jobs:
cuda-arch-list: 8.6
test-matrix: |
{ include: [
{ config: "default", shard: 1, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 2, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 3, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 4, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 5, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 6, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 7, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 8, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu" },
{ config: "default", shard: 1, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 2, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 3, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 4, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 5, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 6, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 7, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
{ config: "default", shard: 8, num_shards: 8, runner: "linux.g5.4xlarge.nvidia.gpu", owners: ["module:slowgradcheck"] },
]}
linux-focal-cuda12_1-py3-gcc9-slow-gradcheck-test:
@@ -142,8 +142,8 @@ jobs:
docker-image-name: pytorch-linux-focal-rocm-n-py3
test-matrix: |
{ include: [
{ config: "slow", shard: 1, num_shards: 2, runner: "linux.rocm.gpu" },
{ config: "slow", shard: 2, num_shards: 2, runner: "linux.rocm.gpu" },
{ config: "slow", shard: 1, num_shards: 2, runner: "linux.rocm.gpu", owners: ["module:rocm"] },
{ config: "slow", shard: 2, num_shards: 2, runner: "linux.rocm.gpu", owners: ["module:rocm"] },
]}
linux-focal-rocm6_2-py3_10-test:
