Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Init Support for Kuberflow/Testing #755

Merged
merged 14 commits into from
Oct 10, 2020

Conversation

PatrickXYS
Copy link
Member

@PatrickXYS PatrickXYS commented Sep 10, 2020

Which issue is resolved by this Pull Request:
Resolves #748

Description of your changes:

  1. Modify run_workflows.sh and run_e2e_workflow.py to make them AWS compatible.
  2. Add Dockerfile.py3.aws for AWS ECR Image.
  3. Add images/aws-scripts folder to place AWS related scripts.
  4. Add cloudprovider/aws folder to place AWS related python files.
    Since we added import boto3 in util.py, GCP also need to install boto3 in the Dockerfile. Thus, modifies Dockerfile and Dockerfile.py3

@kubeflow-bot
Copy link

This change is Reviewable

@google-cla google-cla bot added the cla: yes label Sep 10, 2020
@PatrickXYS PatrickXYS force-pushed the aws_init_support branch 4 times, most recently from 89be21f to 29947a4 Compare September 10, 2020 15:42
@PatrickXYS
Copy link
Member Author

/cc @Jeffwan @jlewi @rmgogogo

@PatrickXYS PatrickXYS changed the title Aws Init Support for Kuberflow/Testing AWS Init Support for Kuberflow/Testing Sep 10, 2020
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PatrickXYS Thanks a lot for doing this!
I left few comments

@PatrickXYS
Copy link
Member Author

PatrickXYS commented Sep 16, 2020

Per presubmit job logs:

...
File "/src/kubeflow/testing/py/kubeflow/testing/util.py", line 27, in <module> import boto3 ImportError: No module named boto3
...

To make kubeflow-testing-presubmit succeed, we need GCP folks to first send PR to build new container image with boto3 installed.

This is TBD for now. Will pick this up once we all agree to move forward and merge this PR.

@jlewi
Copy link
Contributor

jlewi commented Sep 18, 2020

To make kubeflow-testing-presubmit succeed, we need GCP folks to first send PR to build new container image with boto3 installed.

The folks listed in: https://github.com/kubeflow/internal-acls/blob/master/google_groups/ci-team.members.txt

Should be able to help with that

@jlewi
Copy link
Contributor

jlewi commented Sep 18, 2020

Although

File "/src/kubeflow/testing/py/kubeflow/testing/util.py", line 27, in <module> import boto3 ImportError: No module named boto3
...

Can we avoid this by organizing the code so that we don't have to pull in platform/cloud specific libraries when not running on that platform?

Could we create different subpackages for different clouds that can be selectively imported?

e.g. 

py/kubeflow/testing/aws


This will also make code review easier since we can have separate OWNERs files


@PatrickXYS
Copy link
Member Author

PatrickXYS commented Sep 29, 2020

Per the discussion in Kubeflow Test-infra on AWS, if Google will host Prow layer, and AWS host CI layer, there's no need to create and upload started.json + finished.json to S3.

If so, the step which differs from GCP CI-infra, is:

1. Configure AWS credential and load AWS CI cluster's kubeconfig
2. Ksonnet generate and apply Argo Workflow to AWS CI cluster.

Besides that, we need to create secret in GCP Prow cluster to store AWS credentials.

I misunderstood at the beginning.

This is summary of the proposal:

From AWS side, we have:

1. Bot account -> be granted with permissions to some repos.
2. Bot Webhook -> set up in some repo’s setting and be able to receive any github event
3. Prow cluster -> everything without Tide installed.
4. CI cluster -> Installed with Argo

In any of PR, specific tests are running by AWS bot, and Tide is delegated by GCP bot.

@PatrickXYS
Copy link
Member Author

PatrickXYS commented Sep 29, 2020

@Jeffwan @jlewi @andreyvelich

I cleaned up the code and nothing blocks the PR for now. This will be shipped as part of POC work.

Can you take another look?

@andreyvelich
Copy link
Member

@PatrickXYS Since we create another util for AWS, my suggestion is: left only functions that we are using.

So you can delete everything except these functions: upload_to_s3, aws_configure_credential, load_kube_config, upload_file_to_s3, run, to_s3_uri, split_s3_uri.
Later we can add more features, if it is needed.

What do you think?

@jlewi
Copy link
Contributor

jlewi commented Oct 6, 2020

On my end
/lgtm

@Bobgy Thoughts?

@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 6, 2020
@Jeffwan
Copy link
Member

Jeffwan commented Oct 8, 2020

/lgtm

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PatrickXYS
/lgtm

@Bobgy
Copy link
Contributor

Bobgy commented Oct 10, 2020

/approve
I don't have any particular objections, I'm not very familiar with kf testing infra

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PatrickXYS
Copy link
Member Author

@PatrickXYS
Copy link
Member Author

/test kubeflow-testing-presubmit

@k8s-ci-robot k8s-ci-robot merged commit db508d7 into kubeflow:master Oct 10, 2020
@PatrickXYS PatrickXYS deleted the aws_init_support branch October 13, 2020 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add some supports to make testing infra run on AWS
7 participants