Make smoke tests more lightweight #673

Merged
merged 16 commits into from
Sep 19, 2022

tasdomas
Contributor

Currently, smoke tests call tpi to provision machines with GPUs, which often runs into resource quotas.
With this proposal, tests run on PRs provision smaller (m) machines instead, and a test that provisions machines with GPUs
runs on a fixed daily schedule on master.

@tasdomas tasdomas temporarily deployed to automatic September 16, 2022 12:08 Inactive
Member

@0x2b3bfa0 0x2b3bfa0 left a comment

Stretch goal: also make spot instance testing optional.

Spot: common.SpotEnabled,

Use single workflow.
Use env var to toggle gpu test.
Add parameter for using spot instances.
@tasdomas tasdomas temporarily deployed to automatic September 16, 2022 13:06 Inactive
@tasdomas tasdomas temporarily deployed to automatic September 16, 2022 13:07 Inactive
@tasdomas tasdomas requested a review from 0x2b3bfa0 September 16, 2022 13:28
Contributor

@dacbd dacbd left a comment

lgtm

Co-authored-by: Daniel Barnes <dabarnes2b@gmail.com>
Member

0x2b3bfa0 commented Sep 16, 2022

Note that k8s tests will always run on a GPU-enabled cluster; to avoid this, you may want to modify the following lines:

az aks create \
  --resource-group="tpiSmokeTestCluster$GITHUB_RUN_ID" \
  --name="tpiSmokeTestCluster$GITHUB_RUN_ID" \
  --node-vm-size=Standard_NC6 \
  --node-count=1 \
  --aks-custom-headers=UseGPUDedicatedVHD=true \
  --generate-ssh-keys

See the collapsible at https://registry.terraform.io/providers/iterative/iterative/latest/docs/guides/testing-kubernetes#creating-a-test-cluster for more information:

az aks create \
  --resource-group testKubernetesResourceGroup \
  --name testKubernetesCluster \
  --node-vm-size Standard_A2_v2 \
  --node-count 1

Co-authored-by: Helio Machado <0x2b3bfa0+git@googlemail.com>
@tasdomas tasdomas temporarily deployed to automatic September 19, 2022 13:38 Inactive
@tasdomas tasdomas temporarily deployed to automatic September 19, 2022 13:39 Inactive
Member

@0x2b3bfa0 0x2b3bfa0 left a comment

Looks awesome to me! 🎉

@0x2b3bfa0 0x2b3bfa0 temporarily deployed to automatic September 19, 2022 15:13 Inactive
@0x2b3bfa0 0x2b3bfa0 merged commit 7461e69 into master Sep 19, 2022
@0x2b3bfa0 0x2b3bfa0 deleted the d020-lightweight-tests branch September 19, 2022 16:58
@casperdcl casperdcl added the technical-debt (Refactoring, linting & tidying) and testing (Unit tests & debugging) labels Sep 23, 2022
4 participants