Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Add AWS integration test workflow, clean up #1977

Merged
merged 50 commits into from
Sep 14, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
8fbde6f
Add region to init cmd, update tests_integrations
iameskild Aug 22, 2023
cc25b9a
Clean up
iameskild Aug 22, 2023
cadc0d5
Shuffle functions
iameskild Aug 23, 2023
ba4791e
Ensure region is handled carefully, set default values
iameskild Aug 23, 2023
96b46b4
Clean up kubernetes_version
iameskild Aug 23, 2023
a4d17e2
Clean up
iameskild Aug 23, 2023
cc88f8a
Make region eager
iameskild Aug 23, 2023
272b9f6
Merge branch 'develop' into 20230822
iameskild Aug 23, 2023
649bb77
Merge branch 'develop' into 20230822
iameskild Aug 24, 2023
4418a5b
Remove on_cloud for --cloud instead
iameskild Aug 24, 2023
2193664
Make project required
iameskild Aug 24, 2023
ed95606
Update tests_integration README
iameskild Aug 24, 2023
8e54303
Update cli_init tests to include region
iameskild Aug 24, 2023
b9a1c5a
Merge branch 'develop' into 20230822
iameskild Aug 24, 2023
561dedf
Add more robust clean up for integration tests
iameskild Aug 26, 2023
bcd0717
Add azure-mgmt-resource as a dependency
iameskild Aug 26, 2023
e26d371
Handle test failure gracefully
iameskild Aug 26, 2023
38d9d0b
Minor updates
iameskild Aug 30, 2023
b75d1c6
Merge branch 'develop' into 20230822
iameskild Aug 30, 2023
0bb61a6
Merge branch 'develop' into 20230822
iameskild Sep 1, 2023
08f5e51
Merge develop
iameskild Sep 4, 2023
b1ab4c1
Add azure storge_account_postfix to initialize
iameskild Sep 4, 2023
7a60401
Remove empty nb
iameskild Sep 4, 2023
0f5692a
Update azure cli validate test
iameskild Sep 4, 2023
104373c
Add AWS integration test workflow, clean up
iameskild Sep 4, 2023
46000f4
Fix aws region based on review
iameskild Sep 6, 2023
b932954
Merge branch 'develop' into 20230822
iameskild Sep 6, 2023
22252fa
Clean up
iameskild Sep 6, 2023
570f12a
Handle AWS invalid region by exiting
iameskild Sep 6, 2023
9741108
Merge branch '20230822' into it_aws
iameskild Sep 7, 2023
6c4cfd3
Merge develop
iameskild Sep 12, 2023
62d7305
Clean up gcp validator
iameskild Sep 12, 2023
8c750fb
Merge branch 'develop' into it_aws
iameskild Sep 12, 2023
3f69977
Merge branch 'develop' into it_aws
iameskild Sep 13, 2023
dff9814
Test AWS IT
iameskild Sep 13, 2023
94343b7
Remove extra quotes
iameskild Sep 13, 2023
61b1d95
Remove duplicate default
iameskild Sep 13, 2023
a310d62
Add |
iameskild Sep 13, 2023
363323d
Replace with with env
iameskild Sep 13, 2023
54279b9
Add default values for envs
iameskild Sep 13, 2023
7d22dca
Set env correctly
iameskild Sep 13, 2023
82bc43d
Add region arg to kubernetes_versions
iameskild Sep 13, 2023
d88b2be
Test on this branch
iameskild Sep 13, 2023
20f86f3
Add tf_objects to terraform_state for aws
iameskild Sep 14, 2023
b968a2b
Merge branch 'develop' into it_aws
iameskild Sep 14, 2023
0b2fed5
GPU xfail due to timeout error
iameskild Sep 14, 2023
97f7bf2
Comment out GPU test
iameskild Sep 14, 2023
a3428b4
Remove comments
iameskild Sep 14, 2023
93b9be2
Add note in test_gpu.py
iameskild Sep 14, 2023
7a0e391
Merge branch 'develop' into it_aws
iameskild Sep 14, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions .github/workflows/test_aws_integration.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
name: test-aws-integration

on:
schedule:
- cron: "0 0 * * MON"
workflow_dispatch:
inputs:
branch:
description: 'Nebari branch to deploy, test, destroy'
required: true
default: develop
type: string
image-tag:
description: 'Nebari image tag created by the nebari-docker-images repo'
required: true
default: main
type: string
tf-log-level:
description: 'Change Terraform log levels'
required: false
default: info
type: choice
options:
- info
- warn
- debug
- trace
- error


env:
AWS_DEFAULT_REGION: "us-west-2"
NEBARI_GH_BRANCH: ${{ github.event.inputs.branch || 'develop' }}
NEBARI_IMAGE_TAG: ${{ github.event.inputs.image-tag || 'main' }}
TF_LOG: ${{ github.event.inputs.tf-log-level || 'info' }}


jobs:
test-aws-integration:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- name: Checkout
uses: actions/checkout@v3
with:
ref: ${{ env.NEBARI_GH_BRANCH }}
fetch-depth: 0

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.11

- name: Install Nebari
run: |
pip install .[dev]
conda install --quiet --yes conda-build
playwright install

- name: Retrieve secret from Vault
uses: hashicorp/vault-action@v2.5.0
with:
method: jwt
url: "https://quansight-vault-public-vault-b2379fa7.d415e30e.z1.hashicorp.cloud:8200"
namespace: "admin/quansight"
role: "repository-nebari-dev-nebari-role"
secrets: |
kv/data/repository/nebari-dev/nebari/amazon_web_services/nebari-dev-ci role_name | AWS_ROLE_ARN;
kv/data/repository/nebari-dev/nebari/cloudflare/internal-devops@quansight.com/nebari-dev-ci token | CLOUDFLARE_TOKEN;

- name: Authenticate to AWS
uses: aws-actions/configure-aws-credentials@v1
with:
role-to-assume: ${{ env.AWS_ROLE_ARN }}
role-session-name: github-action
aws-region: ${{ env.AWS_DEFAULT_REGION }}

- name: Integration Tests
run: |
pytest --version
pytest tests/tests_integration/ -vvv -s --cloud aws
env:
NEBARI_SECRET__default_images__jupyterhub: "quay.io/nebari/nebari-jupyterhub:${{ env.NEBARI_IMAGE_TAG }}"
NEBARI_SECRET__default_images__jupyterlab: "quay.io/nebari/nebari-jupyterlab:${{ env.NEBARI_IMAGE_TAG }}"
NEBARI_SECRET__default_images__dask_worker: "quay.io/nebari/nebari-dask-worker:${{ env.NEBARI_IMAGE_TAG }}"
91 changes: 0 additions & 91 deletions .github/workflows/test_integration.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion src/_nebari/deploy.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ def deploy_configuration(
stack.enter_context(s.deploy(stage_outputs, disable_prompt))

if not disable_checks:
s.check(stage_outputs)
s.check(stage_outputs, disable_prompt)
print("Nebari deployed successfully")

print("Services:")
Expand Down
2 changes: 1 addition & 1 deletion src/_nebari/initialize.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,7 @@ def render_config(
or constants.AWS_DEFAULT_REGION
)
aws_kubernetes_version = kubernetes_version or get_latest_kubernetes_version(
amazon_web_services.kubernetes_versions()
amazon_web_services.kubernetes_versions(aws_region)
)
config["amazon_web_services"] = {
"kubernetes_version": aws_kubernetes_version,
Expand Down
2 changes: 1 addition & 1 deletion src/_nebari/provider/cloud/amazon_web_services.py
Original file line number Diff line number Diff line change
Expand Up @@ -414,7 +414,7 @@ def aws_delete_efs_file_system(efs_id: str, region: str):

def aws_delete_efs(name: str, namespace: str, region: str):
"""Delete EFS resources for the EKS cluster named `{name}-{namespace}`."""
efs_ids = aws_get_efs_ids(name, namespace)
efs_ids = aws_get_efs_ids(name, namespace, region=region)
for efs_id in efs_ids:
aws_delete_efs_mount_targets(efs_id, region=region)
aws_delete_efs_file_system(efs_id, region=region)
Expand Down
2 changes: 1 addition & 1 deletion src/_nebari/render.py
Original file line number Diff line number Diff line change
Expand Up @@ -166,7 +166,7 @@ def list_files(
if source_files[prevalent_file] != output_files[prevalent_file]:
updated_files.add(prevalent_file)

return new_files, untracted_files, updated_files, deleted_paths
return new_files, untracted_files, updated_files, deleted_files


def hash_file(file_path: str):
Expand Down
1 change: 1 addition & 0 deletions src/_nebari/stages/infrastructure/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -382,6 +382,7 @@ class AzureProvider(schema.Base):
"user": AzureNodeGroup(instance="Standard_D4_v3", min_nodes=0, max_nodes=5),
"worker": AzureNodeGroup(instance="Standard_D4_v3", min_nodes=0, max_nodes=5),
}
storage_account_postfix: str
vnet_subnet_id: typing.Optional[typing.Union[str, None]] = None
private_cluster_enabled: bool = False
resource_group_name: typing.Optional[str] = None
Expand Down
10 changes: 9 additions & 1 deletion src/_nebari/stages/terraform_state/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

import pydantic

from _nebari.provider import terraform
from _nebari.provider.cloud import azure_cloud
from _nebari.stages.base import NebariTerraformStage
from _nebari.utils import (
Expand Down Expand Up @@ -168,7 +169,14 @@ def state_imports(self) -> List[Tuple[str, str]]:
return []

def tf_objects(self) -> List[Dict]:
return []
if self.config.provider == schema.ProviderEnum.aws:
return [
terraform.Provider(
"aws", region=self.config.amazon_web_services.region
),
]
else:
return []

def input_vars(self, stage_outputs: Dict[str, Dict[str, Any]]):
if self.config.provider == schema.ProviderEnum.do:
Expand Down
10 changes: 6 additions & 4 deletions src/_nebari/subcommands/init.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,8 +67,10 @@
"It is an [i]alternative[/i] to passing the options listed below."
)

DEFAULT_REGION_MSG = "Defaulting to region:`{region}`."

DEFAULT_KUBERNETES_VERSION_MSG = (
"Defaulting to latest `{kubernetes_version}` Kubernetes version available."
"Defaulting to highest supported Kubernetes version: `{kubernetes_version}`."
)

LATEST = "latest"
Expand Down Expand Up @@ -430,19 +432,19 @@ def check_cloud_provider_region(region: str, cloud_provider: str) -> str:
# TODO: Add a check for valid region for Azure
if not region:
region = AZURE_DEFAULT_REGION
rich.print(f"Defaulting to `{region}` region.")
rich.print(DEFAULT_REGION_MSG.format(region=region))
elif cloud_provider == ProviderEnum.gcp.value.lower():
if not region:
region = GCP_DEFAULT_REGION
rich.print(f"Defaulting to `{region}` region.")
rich.print(DEFAULT_REGION_MSG.format(region=region))
if region not in google_cloud.regions(os.environ["PROJECT_ID"]):
raise ValueError(
f"Invalid region `{region}`. Please refer to the GCP docs for a list of valid regions: {GCP_REGIONS}"
)
elif cloud_provider == ProviderEnum.do.value.lower():
if not region:
region = DO_DEFAULT_REGION
rich.print(f"Defaulting to `{region}` region.")
rich.print(DEFAULT_REGION_MSG.format(region=region))

if region not in set(_["slug"] for _ in digital_ocean.regions()):
raise ValueError(
Expand Down
5 changes: 3 additions & 2 deletions tests/common/config_mod_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

from _nebari.stages.infrastructure import AWSNodeGroup, GCPNodeGroup
from _nebari.stages.kubernetes_services import (
AccessEnum,
CondaEnvironment,
JupyterLabProfile,
KubeSpawner,
Expand Down Expand Up @@ -104,8 +105,8 @@ def add_gpu_config(config, cloud="aws"):
jupyterlab_profile = JupyterLabProfile(
display_name="GPU Instance",
description="4 CPU / 16GB RAM / 1 NVIDIA T4 GPU (16 GB GPU RAM)",
access="yaml",
groups=["gpu-access"],
access=AccessEnum.all,
groups=None,
kubespawner_override=kubespawner_overrides,
)

Expand Down
6 changes: 0 additions & 6 deletions tests/tests_integration/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,3 @@ def pytest_addoption(parser):
parser.addoption(
"--cloud", action="store", help="Cloud to deploy on: aws/do/gcp/azure"
)
parser.addoption(
"--disable-prompt",
action="store_true",
help="Disable prompt for confirmation to start cluster teardown",
default=False,
)
8 changes: 2 additions & 6 deletions tests/tests_integration/deployment_fixtures.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import logging
import os
import pprint
import random
import shutil
import string
Expand Down Expand Up @@ -114,7 +115,6 @@ def deploy(request):
"""Deploy Nebari on the given cloud."""
ignore_warnings()
cloud = request.config.getoption("--cloud")
disable_prompt = request.config.getoption("--disable-prompt")

# initialize
if cloud == "do":
Expand Down Expand Up @@ -164,10 +164,8 @@ def deploy(request):
config = add_gpu_config(config, cloud=cloud)
config = add_preemptible_node_group(config, cloud=cloud)

from pprint import pprint

print("*" * 100)
pprint(config.dict())
pprint.pprint(config.dict())
print("*" * 100)

# render
Expand All @@ -194,8 +192,6 @@ def deploy(request):
logger.exception(e)
logger.error(f"Deploy Failed, Exception: {e}")

disable_prompt or input("\n[Press Enter] to continue...\n")

# destroy
try:
logger.info("*" * 100)
Expand Down
40 changes: 21 additions & 19 deletions tests/tests_integration/test_gpu.py
Original file line number Diff line number Diff line change
@@ -1,24 +1,26 @@
import re
# 2023-09-14: This test is currently timing out on CI, so we're disabling it for now.

import pytest
# import re
iameskild marked this conversation as resolved.
Show resolved Hide resolved

from tests.common.playwright_fixtures import navigator_parameterized
from tests.common.run_notebook import Notebook
# import pytest

# from tests.common.playwright_fixtures import navigator_parameterized
# from tests.common.run_notebook import Notebook

@pytest.mark.gpu
@navigator_parameterized(instance_name="gpu-instance")
def test_gpu(deploy, navigator, test_data_root):
test_app = Notebook(navigator=navigator)
conda_env = "gpu"
test_app.create_notebook(
conda_env=f"conda-env-nebari-git-nebari-git-{conda_env}-py"
)
test_app.assert_code_output(
code="!nvidia-smi",
expected_output=re.compile(".*\n.*\n.*NVIDIA-SMI.*CUDA Version"),
)

test_app.assert_code_output(
code="import torch;torch.cuda.is_available()", expected_output="True"
)
# @pytest.mark.gpu
# @navigator_parameterized(instance_name="gpu-instance")
# def test_gpu(deploy, navigator, test_data_root):
# test_app = Notebook(navigator=navigator)
# conda_env = "gpu"
# test_app.create_notebook(
# conda_env=f"conda-env-nebari-git-nebari-git-{conda_env}-py"
# )
# test_app.assert_code_output(
# code="!nvidia-smi",
# expected_output=re.compile(".*\n.*\n.*NVIDIA-SMI.*CUDA Version"),
# )

# test_app.assert_code_output(
# code="import torch;torch.cuda.is_available()", expected_output="True"
# )
Loading