A more sustainable approach to owning and maintaining test/release infrastructure #737
Comments
+1. Bumping this discussion @jlewi |
I have a suggestion on this. I think there are three levels of testing in Kubeflow.
The Native tests and Application tests probably need to be nimble and fast. I was thinking we could use GitHub Actions for these. This would free up testing resources and spare application owners from delving deep into the Prow/k8s test infra space (which carries a lot of cognitive load). The E2E tests can be owned by the platform owners, with each of them bringing their own environments and staffing them themselves. So currently, we have:
@Jeffwan probably has the right idea with #748. Istio has a similar setup, with IBM having a separate testing environment and so on... This might solve issues around permissions and also free application owners to think about testing their applications independently. What do people think? |
@swiftdiaries How are GitHub Actions administered? Would individual projects be able to administer GitHub Actions for themselves, or would there be a maintenance cost for GitHub org admins? |
They can manage it themselves at the repo level, using an OWNERS file within the repo. |
@swiftdiaries I agree that GitHub Actions are a great asset and much easier to figure out and get started with than the current Prow setup. It would be a great option to enable application developers to quickly develop testing pipelines for their code. |
I am not sure all application owners would like to set up testing pipelines by themselves. I assume some engineers just want to write test cases and roll them out to a cluster that can quickly pick the tests up. We probably need to cover these cases as well. |
+1 on GitHub Actions for everything but the end-to-end testing/CICD. |
One downside of GitHub Actions is that they are pretty closely tied to GitHub. One of the motivations for using Tekton and Kubernetes-native test infra was to make it easy for people to replicate and run the tests on their own infra. That said, I think the decision about what to prescribe and maintain for their respective projects should be up to the individual WGs. From that perspective, the question should be: can the WG leads scalably administer GitHub Actions on behalf of their projects? For example, can they onboard new projects/repos without creating toil for the Kubeflow GitHub org admins? Another issue would be billing and quota. How do we ensure fair scheduling between WGs? If a WG needs to exceed the free tier, are the WG leads in a position to assume those costs? |
GitHub has released the runner code, so it should be possible for users to run these pipelines on their own.
The free tier is unlimited for public repos, so I don't think that's an issue. @jlewi I think the main problem with using our own Tekton + Kubernetes infra is this:
Instead of having to find these individuals, train them in Kubeflow's complicated infra, and make sure they are on call for issues (e.g., quotas filling up), we can use GitHub Actions, which, being a managed service, circumvents this restriction. Plus, if we bump into scaling issues, we can always use self-hosted runners, which circumvent usage limits.
Totally agree! |
Related issue buildcop: #658 |
@kubeflow/automl-leads @kubeflow/kfserving-owners @kubeflow/training-leads thoughts? |
cc @kubeflow/wg-automl-leads @kubeflow/wg-training-leads
Same idea as @Jeffwan |
Related issue to move Kubeflow to the Google instance off prow and stop using Kubernetes. |
I added a doc for community members to review. It provides an alternative option for running e2e tests on AWS. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions. |
This issue has been closed due to inactivity. |
We need a more sustainable and scalable approach to owning and maintaining engprod (test & release) infrastructure for Kubeflow.
We currently lack a sufficient number of individuals (3-5?) with the willingness and ability (e.g. time) to meet the increasing demands of Kubeflow as it grows.
I think there are a couple of problems we need to address
I don't think we can address this by simply increasing the pool of "20%" contributors to engprod.
It's hard to be a good leader in your 20% time
For security reasons, we can't grant an increasing number of individuals the elevated permissions needed to maintain the test/release infrastructure
Engprod operational issues are often P0 because they block everyone from getting work done; we can't expect 20% contributors to drop everything in order to respond to P0s.
I can think of two approaches to this
@cliveseldon @yuzisun @ellistarn @neuromage @paveldournov @elikatsis @vpavlin @yanniszark @Jeffwan @krishnadurai @terrytangyuan @gaocegege @andreyvelich @johnugeorge @aronchick @StefanoFioravanzo @elviraux @kimwnasptd @krazyhaas @jinchihe @animeshsingh