wg-manifests: Introduce WG Manifests #435

Merged: 3 commits, Nov 3, 2020

Conversation

yanniszark
Contributor

As discussed in today's community meeting and after lengthy discussions in:

please find here the PR for the Manifests WG (previously called Control Plane).
Please take a look at the PR files for a complete explanation of the scope and responsibilities.

/cc @jlewi @PatrickXYS @Jeffwan @StefanoFioravanzo @elikatsis @vkoukis @cvenets

@k8s-ci-robot

@yanniszark: GitHub didn't allow me to request PR reviews from the following users: cvenets.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

- The maintained manifests will follow certain principles:
- The flow of work / separation of responsibilities will be the following:
1. Application owners publish manifests in their repos.
1. WG copies and tracks upstream manifests in the manifests repo. They

Member

I would assume an automated way to do this, at least in master branch?

Contributor Author

@PatrickXYS you have selected 3 lines, so I'm not sure to what you're referring 😅
Can you please clarify, perhaps by quoting the part the question is referring to?

Member

Oh sorry, should be this line

1. WG copies and tracks upstream manifests in the manifests repo. They
form a base `kustomization`.

@PatrickXYS yes, this should be automated.


##### With Application Owners

- Communicate with application owners to agree upon the version they want to be

Member

For this part, should we include the PM team as well, since we rely on them for cross-project conversations?

Contributor Author

@PatrickXYS I agree, but there is no PM WG right now and I don't see something similar in other WGs.
We can always add this in the near future!

- Central Dashboard
- Profile Controller
- PodDefaults Controller
- Maintain documentation and instructions for installing a set or all of the

Member

Though we should not rely on a vendor distribution, I think we still need to deploy applications on a "generic" Kubernetes system. What would this be?

I think we will all have to decide this together as a WG, when we pin down how to test.

@thesuperzapper
Member

@yanniszark Just to be explicit, assuming the discussion in #434 decides that we WILL have a "generic" distribution, will the Manifests WG be responsible for it?

@@ -19,7 +19,7 @@ The WG covers researching, developing and operating various targets of ML automa
#### Cross-cutting and Externally Facing Processes

- Coordinating with Training WG to make sure that all distributed training jobs can be used in AutoML experiments.

Contributor

Coordinating with Training WG to make sure that all distributed training jobs can be used in AutoML experiments.

Why is this in scope for this WG?

Member

I think autoML covers this part. https://github.com/kubeflow/community/blame/master/wg-automl/charter.md#L21
It can be removed here.

Contributor Author

Coordinating with Training WG to make sure that all distributed training jobs can be used in AutoML experiments.

This line is not related to this PR, unless I am missing something.

@@ -19,7 +19,7 @@ The WG covers researching, developing and operating various targets of ML automa
#### Cross-cutting and Externally Facing Processes

- Coordinating with Training WG to make sure that all distributed training jobs can be used in AutoML experiments.
- Coordinating with Control Plane WG to ensure that AutoML manifests are properly deployed with Kubeflow.
- Coordinating with Manifests WG to ensure that AutoML manifests are properly deployed with Kubeflow.

Contributor

Coordinating with Manifests WG to ensure that AutoML manifests are properly deployed with Kubeflow.

Why would this be necessary?

Contributor Author

Why would this be necessary?

@jlewi those are not new scopes, but pre-existing ones.
I simply amended the name since WG Control Plane doesn't exist anymore.


- Provide a catalog (centralized repository) of Kubeflow application manifests.
- Provide a catalog of third-party apps for common services.
- Provide documentation to help users install a set, or all of the included

Contributor

Provide documentation to help users install a set, or all of the included
apps.

Please remove this from the scope of the charter.

The primary goal with forming WGs right now is to reduce ambiguity by clarifying ownership of existing assets.

This is not an existing asset.

Furthermore, my expectation is that application owners will be responsible for instructions to install their application standalone.

Distribution owners will be responsible for providing instructions to install their distributions.

Can we see how that goes before expanding the scope of this WG? If we find that leaves a gap and this WG ends up addressing the gap then we can expand the scope later on.

in `kustomize` overlays and they don't touch the upstream files.
1. Periodically, the upstream manifests are updated by copying from a
later commit.
- Contrasted with the current state, in the aforementioned workflow:

Contributor

Please move this out of the charter and into the PR description.

already do, as it's needed for testing and developing the app.
- Manifests depend on the frequently updated manifests of the app repo,
instead of the out-of-date, kfctl-specific manifests in the current
`kubeflow/manifests` repo. [A recent proposal](https://www.google.com/url?q=http://bit.ly/kf_kustomize_v3&sa=D&ust=1603724692328000&usg=AOvVaw2qgtPzKUz5zqIjpn3Yoas7)

Contributor

Please move this into the PR description it doesn't belong in the charter.

wg-manifests/charter.md
- KNative
- Dex
- Cert-Manager
1. Manifests for common, currently unmaintained, apps:

Contributor

Why are you distinguishing between "unmaintained" and "maintained" apps? If you're just copying the manifests from the upstream repo to this repo, then isn't it the same?

Contributor Author

Will reword


#### Code, Binaries and Services

- Maintain a set of manifests that will allow users to install Kubeflow apps and

Contributor

Are you actually maintaining the manifests? I thought it would be the application owners that would be maintaining the manifests?

My expectation is that you would be maintaining tooling to automate the copying of the manifests.

Contributor Author

Will reword

1. Application owners publish manifests in their repos.
1. WG copies and tracks upstream manifests in the manifests repo. They
form a base `kustomization`.
1. All kubeflow-specific changes (e.g., change the namespace) are done

Contributor

I don't understand this. What are "kubeflow-specific" changes? If an application should be installed in the "kubeflow" namespace, shouldn't that be part of the application manifest and maintained in the upstream repo?

Contributor Author

@jlewi what I mean is that some applications are sometimes installed in slightly different ways.
For example, KFServing in manifests used the kubeflow namespace, while the standalone in the upstream repo used the kfserving-system namespace.

Now, this particular example no longer applies, as KFServing now also maintains a kubeflow overlay which deploys in the kubeflow namespace, but the fact remains.

So I will use another example. The user identity header expected by an application is not the same across the board.

So for Pipelines and JWA to work together, we have to tweak one of the two slightly.
This change can happen in an overlay, maintained by wg-manifests, so that the user can install pipelines, jwa, or the two of them together.

Ideally, everything will be kind of uniform on the app side, so we won't have to maintain these, but unfortunately this is not always the case. We want to be able to do these minor fixes if needed.

Contributor

Thanks @yanniszark Per comment I would expect this to work as follows.

IIUC the header will be different depending on the distribution. With Google IAP the header gets set to "x-goog-authenticated-user-email"; with Dex or Cognito I presume it's a different header.

It's up to each application to make the header configurable; e.g. provide an environment variable that allows the header to be set.

It would then be up to the distribution to set the environment variable correctly for all applications.

So let's suppose you wanted to define an overlay to make the environment variable configurable. Per comment, I think we should pursue one of the following:

  1. We push that overlay upstream into the app's repo upstream of manifests
  2. We push the overlay downstream into the distribution's repo

How does that compare to what you are thinking?
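
For illustration only (this is not code from any Kubeflow application): a minimal Python sketch of the configurable-header idea described above. The environment variable name `USERID_HEADER` and both function names are hypothetical; the default is the Google IAP header quoted in the comment.

```python
import os
from typing import Mapping, Optional

# Default taken from the comment above (Google IAP); other distributions
# would override it through the environment.
DEFAULT_USERID_HEADER = "x-goog-authenticated-user-email"


def userid_header_name(env: Mapping[str, str] = os.environ) -> str:
    """Return the header name to read the user identity from.

    USERID_HEADER is a hypothetical variable name used for this sketch.
    """
    return env.get("USERID_HEADER", DEFAULT_USERID_HEADER)


def current_user(
    headers: Mapping[str, str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Look up the user identity in request headers, case-insensitively."""
    wanted = userid_header_name(env).lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None
```

Under this sketch, the application owns the lookup logic, and a distribution only has to set one environment variable per application, whether that setting is pushed upstream or kept in the distribution's repo.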

Contributor Author

I agree that ideally we would want all these overlays to be patched on the upstream apps, but we need a way to push them to the manifests repo, even in another directory, so we can actually test. And I agree that we should then try to push them upstream or downstream to reach the ideal scenario, where this directory is empty.

@jlewi
Contributor

jlewi commented Oct 28, 2020

@yanniszark Just to be explicit, assuming the discussion in #434 decides that we WILL have a "generic" distribution, will the Manifests WG be responsible for it?

No. This is out of scope. We have been debating WG ownership for over two months in #402. Distributions will not be in scope for WGs

@jlewi
Contributor

jlewi commented Oct 28, 2020

@yanniszark Thanks for writing this up.

High level comment: I would like to err on the side of defining work groups as narrowly as possible. We can expand scope later on.

I'd like the scope of this WG to be narrowed to be in line with what we had in #402. I'd like this WG to primarily be about splitting off the responsibility for the catalog into a separate WG from the one owning kfctl.

Please remove the bit about installation instructions; see my more detailed comment. Nothing precludes you from contributing to documentation. We can always expand the scope of the WG later on to include it when the following conditions are met

  1. We identify an unmet need
  2. We identify a solution that requires ownership


This WG is NOT going to:
- Maintain deployment-specific tools like `kfctl`.
- Maintain platform-specific manifests.

Contributor

Where do distribution overlays get defined? See
#402 (comment)

I think what we suggested was that distribution overlays should be defined upstream in the app's repository or downstream in a distribution repo.

If it's defined in the app repo, would it be preserved when it gets copied over to the manifests repo?

Contributor Author

You mean, for example, if KFP had a Google specific overlay in their manifests, if it would be present in the manifests repo because we copy their manifests? I think the answer to that is yes, since we'll most likely copy the whole manifests subtree. Does that make sense?

Contributor

Yes. More generally the question is if someone wants to create a customization to support X; e.g. (external SQL DB, Selecting KFP engine etc...) where should this go. Per #402 I think the options are

  1. Push them upstream - they should coordinate with app owners to include it in their manifests
  2. Push them downstream - They should work with distribution owners to figure out template free ways of post-processing the manifests

@jlewi jlewi mentioned this pull request Oct 28, 2020
@jlewi
Contributor

jlewi commented Oct 28, 2020

To provide a bit more context for my perspective. The critical problem that I want to address is ensuring that app owners take responsibility for their manifests. My expectation from day 0 was that application owners were taking responsibility and curating/updating their manifests as needed.

There are various discussions that indicate to me that this is not widely understood yet. For example,

This is why I was pushing for the scope to be so narrowly defined in #402. Language like the following

Maintain a set of manifests that will allow users to install Kubeflow apps and

Creates ambiguity about division of responsibilities. Is it the app owners or the manifests WG responsible for maintaining the manifest?

My hope was to address this by having very narrow and well defined scope and responsibilities

  • App owners maintain manifests in their repos
  • manifests-wg maintains automation to copy them to kubeflow/manifests

The division of responsibilities is then very clear. In particular, going back to kubeflow/manifests#1553 this would be handled as follows

  • Look in the upstream repo; if the image is outdated in the upstream repo the app-owner is responsible
  • If the manifest in kubeflow/manifests is not in sync with the upstream then it's a bug in the WG's tooling

I would like to see this functioning as intended before expanding scope beyond this very narrow definition. In particular, I'd like to see

  1. Every WG maintains its manifests in its upstream repo
  2. This WG produces automation to keep the manifests in kubeflow/manifests in sync with the upstream repos
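
For illustration only (not part of the original comment): a minimal Python sketch of what an "in sync" check between an upstream manifests directory and its copy could look like. It assumes both trees are available as local directories; the function names `tree_digest` and `find_drift` are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_drift(upstream: Path, mirror: Path) -> dict[str, list[str]]:
    """Report files that are missing from, extra in, or modified in the mirror."""
    up, mi = tree_digest(upstream), tree_digest(mirror)
    return {
        "missing": sorted(up.keys() - mi.keys()),
        "extra": sorted(mi.keys() - up.keys()),
        "modified": sorted(k for k in up.keys() & mi.keys() if up[k] != mi[k]),
    }
```

An empty report in all three categories would mean the copy matches upstream; under the division of responsibilities described above, any non-empty report would point at the copying automation rather than at the app owners.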

@yanniszark yanniszark force-pushed the feature-wg-manifests branch from f1c7591 to 1bac199 Compare October 30, 2020 01:07
@yanniszark
Contributor Author

@jlewi I updated the PR according to your suggestions. Please take another look whenever possible.

@thesuperzapper
Member

I made a comment #434 (comment), which I think is important to discuss.

Under the definition I propose for "reference" distribution, I think this Working Group is the clear owner.

I define "reference" as the set of YAML which comprises Kubeflow itself (not how you deploy that YAML, or what Applications of Kubeflow you pick).

This only leaves a few things:

  1. This WG should have authority to prescribe what format Applications use for their YAML
  2. This WG should have authority to prescribe limitations on dependencies for Applications [For example, not depending on Istio 1.5 features]
  3. This WG should be tasked with providing a single "reference" of cluster-level resources [For example, istio, knative, cert-manager, dex]
    • NOTE_1: this means that Kubeflow Applications will NOT include these cluster-level resources in their YAML
    • NOTE_2: vendors can still swap in their own cluster-level resources (e.g. use a proprietary auth system)

@jlewi
Contributor

jlewi commented Oct 30, 2020

@thesuperzapper

Under the definition I propose for "reference" distribution, I think this Working Group is the clear owner.

This is explicitly not in scope for this WG. And will be explicitly out of scope.

@thesuperzapper
Member

@jlewi can you be precise about why those 3 points should not be responsibilities of this WG?

Or do you believe no one should be responsible for those things?

@PatrickXYS
Member

I think 2 months of discussion about WG responsibilities has been a burden for all the community folks.

Before we come up with a PERFECT plan on how do we organize WG deployment and manifest, whether WG responsibilities overlap with another WG or not,

Let's start from the smaller scope and begin development work now; the responsibilities can be expanded once the WG has proved it has enough capacity and resources to succeed.

@jlewi
Contributor

jlewi commented Oct 30, 2020

@yanniszark and @cvenets thanks for agreeing to the changes.

LGTM. @paveldournov @theadactyl @Bobgy PTAL.

@yanniszark and @cvenets are you ok letting some of the existing directories in kubeflow/manifests remain that fall outside the scope? e.g. in particular
https://github.com/kubeflow/manifests/tree/master/aws
https://github.com/kubeflow/manifests/tree/master/gcp

I wouldn't expect the WG to maintain them (they all have OWNERS files), but I don't want to unceremoniously kick them out. We can figure out an appropriate transition plan over time.

@jlewi
Contributor

jlewi commented Oct 30, 2020

Before we come up with a PERFECT plan on how do we organize WG deployment and manifest, whether WG responsibilities overlap with another WG or not,

+1

Scope can always be expanded later on.

I'm intentionally pushing WGs to have very narrow and well defined scope because I want the WGs to be successful. I want to set very clear expectations and give the WGs easy wins.

Per my comment with @yanniszark

My expectation is that this repo is just a mirror of whatever is upstream e.g.

/pipelines = https://github.com/kubeflow/pipelines/tree/master/manifests
/kfserving = https://github.com/kubeflow/kfserving/tree/master/config
/katib = https://github.com/kubeflow/katib

It looks like for the other applications they may not have upstream manifests yet.

So my expectation is that this WG is

Building automation to copy over the manifests
Coordinating with app WGs to create their upstream manifests
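
For illustration only (not part of the original comment): a minimal Python sketch of the kind of copy automation described above. It assumes the upstream repositories are already checked out under one local directory; `MIRROR_MAP` and `sync_manifests` are hypothetical names, and only the two mappings with an explicit subdirectory in the list above are included.

```python
import shutil
from pathlib import Path

# Destination directory in the mirror -> (upstream repo, manifests subdirectory),
# following the /pipelines and /kfserving examples listed above.
MIRROR_MAP = {
    "pipelines": ("pipelines", "manifests"),
    "kfserving": ("kfserving", "config"),
}


def sync_manifests(checkouts: Path, mirror: Path, mapping=MIRROR_MAP) -> list[str]:
    """Replace each mirrored directory with a fresh copy of its upstream subtree.

    Returns the names of the directories that were synced. Entries whose
    upstream subtree is not present locally are skipped.
    """
    synced = []
    for dest_name, (repo, subdir) in mapping.items():
        src = checkouts / repo / subdir
        if not src.is_dir():
            continue  # no upstream manifests available for this app
        dest = mirror / dest_name
        if dest.exists():
            shutil.rmtree(dest)  # drop stale files so the copy is an exact mirror
        shutil.copytree(src, dest)
        synced.append(dest_name)
    return synced
```

Deleting the destination before copying is a deliberate choice in this sketch: it keeps the mirrored directory identical to upstream, so nothing edited by hand in the copy survives a sync.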

I hope this is unambiguous, uncontroversial and an easy win. In 6-12 months, when we want to evaluate the WG and see if it's meeting expectations, it should be very easy. We just open two browsers and compare the upstream and manifests repos.

This will demonstrate that the application WGs are up and running and coordinating effectively with the manifests WG. Once that's the case we will be in a much better position to tackle harder, more ambiguous problems.

See my earlier comment; right now it looks to me like there is still confusion about the division of responsibilities. We need to fix this before tackling more complicated problems.

@yanniszark
Contributor Author

@yanniszark and @cvenets are you ok letting some of the existing directories in kubeflow/manifests remain that fall outside the scope? e.g. in particular
https://github.com/kubeflow/manifests/tree/master/aws
https://github.com/kubeflow/manifests/tree/master/gcp

I wouldn't expect the WG to maintain them (they all have OWNERS files), but I don't want to unceremoniously kick them out. We can figure out an appropriate transition plan over time.

@jlewi absolutely, supporting and transitioning current users is important for the project.
I was thinking that we should perhaps place everything that should be phased out of kubeflow/manifests under a certain folder.
This structure will ensure the repo remains organized and clean, while allowing distribution owners to transition their users to their new distribution repos. The eventual goal is for that folder to become empty and be deleted.

What do you think?

@jlewi
Contributor

jlewi commented Nov 2, 2020

Thanks

I was thinking that we should perhaps place everything that should be phased out of kubeflow/manifests

Moving folders will likely break references. Let's discuss any migration/refactoring plans in a separate issue.

@yanniszark
Contributor Author

@jlewi sounds good, let's discuss after merging this.
I think we have reached agreement on all points and the PR is reflecting that.
Is there anything missing to merge this PR?

@theadactyl
Contributor

Thanks for ensuring that this effort has WG leadership and for taking the time to discuss scope.

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: theadactyl, yanniszark

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PatrickXYS
Member

/cc @yanniszark

Can you rebase to resolve the merge conflict?

@k8s-ci-robot

@PatrickXYS: GitHub didn't allow me to request PR reviews from the following users: yanniszark.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.


Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
Run `make WHAT=wg-manifests` to generate files for the control plane
working group.

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
The WG responsible for the manifests is called the WG Manifests. Fix
references in charters of other WGs and SIGs and regenerate files.

Signed-off-by: Yannis Zarkadas <yanniszark@arrikto.com>
@yanniszark yanniszark force-pushed the feature-wg-manifests branch from 1bac199 to 9d7c987 Compare November 3, 2020 22:23
@k8s-ci-robot k8s-ci-robot removed the lgtm label Nov 3, 2020
@yanniszark
Contributor Author

@theadactyl @jlewi @PatrickXYS rebased on top of the latest changes!

@PatrickXYS
Member

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 0744fa0 into kubeflow:master Nov 3, 2020
@PatrickXYS
Member

Hi @yanniszark , do we have wg-manifests slack channel established?
