feat(wg): Add WG Training #356
Signed-off-by: Ce Gao <gaoce@caicloud.io>
/lgtm
It seems we have members in different time zones, so the meeting time could be early morning or late afternoon in US time. I am in UTC-7.
Should we organize two meetings for different time zones?
I am in EST. Another option is to have meetings alternating between two friendly time zones (similar to the Kubeflow community meeting). We can fill out a survey to see which way would work better for us.
SGTM.
Personally, I prefer bi-weekly for the WG. I think there are only a few owners/contributors, although we have about 10 projects in WG Training. A weekly meeting would be a huge overhead for members. WDYT?
Yes, I meant either bi-weekly or monthly is fine with me. I agree that a weekly meeting would be too much overhead.
I agree with @terrytangyuan's comment: #356 (comment).
@gaocegege and @andreyvelich The scope of the AutoML and Training WGs needs to be better defined, and once it is, I'd like to come back to the question of whether they should be separate WGs or subprojects within the groups. Here's the scope for the training WG
Here's the scope for the AutoML WG
These scopes are pretty self-referential. They don't actually tell me what's included in training or AutoML. They also seem to overlap heavily. AutoML has to launch training jobs, right? But that would be under the purview of training. If I define the scope of training as "learning models from data", then that would encompass both training operators and AutoML.
I think WG training is focused on the training part of https://cloud.google.com/ai-platform , and WG AutoML is focused on https://cloud.google.com/automl . Training is to learn parameters from data; AutoML is to learn models from data. WDYT, @andreyvelich?
I agree with @gaocegege. Maybe we should modify the scopes of WG Training and WG AutoML to be more precise. I think the main purpose of the Training WG is to run ML training jobs, but in AutoML, training is just one part of the workflow.
Signed-off-by: Ce Gao <gaoce@caicloud.io>
@jlewi Can we come to an agreement that we will have WG Training and WG AutoML? @terrytangyuan Your comments are addressed. Please take a look, thanks for your time 😄
@gaocegege Thanks. Looks good to me. /assign @jlewi
@gaocegege @andreyvelich still has #358 open; are we going with one WG or 2? If one WG, which PR is open? A couple of issues
@jlewi We still haven't come to a final agreement. @animeshsingh @johnugeorge @karlschriek @vpavlin @jbottum can you give your thoughts about WG Training and WG AutoML, please?
This is a strong argument for making them distinct projects, but why distinct WGs? The charters for the AutoML and training WGs are pretty narrowly defined in my opinion. The charters as written look more like project charters than WG charters. From a technical perspective, it seems like our AutoML efforts and training efforts are pretty coupled. The scope of training defined in this doc is:
Launching training jobs is pretty fundamental to AutoML/hyperparameter tuning. AutoML/hyperparameter tuning is basically an orchestrator of training jobs. This means our training APIs ought to be defined with the goal of supporting AutoML and hyperparameter tuning. E.g., per the discussion in kubeflow/katib#1273 (comment), the fact that we don't have a KF resource model that extends the K8s resource models with features like inputs and outputs makes it much harder for AutoML/HP tuning to orchestrate training jobs. This strongly suggests to me that these projects should roll up to a shared set of tech leads/managers who can drive the consensus/compromises needed to ensure these projects play well together. Another strong reason for a shared WG is to amortize the cost of the supporting infrastructure. See for example: kubeflow/testing#737. Suppose test and release infrastructure becomes the responsibility of each WG; i.e., each WG is responsible for maintaining its own K8s clusters if needed, its own docker registries, etc. Do the training and AutoML WGs have the critical mass to support this, or would you be better off collaborating?
I am not sure that AutoML is just an orchestrator of training jobs. Also, after implementing kubeflow/katib#1273 (comment), we will not depend on the Kubeflow training operators' APIs. Kubeflow training operators just need to follow
I believe we can't set up the same test infrastructure for training jobs and AutoML. Complete integration tests for AutoML can be very different from those for training operators, because it includes various components. For the docker registry, we currently maintain our own registry: https://hub.docker.com/u/kubeflowkatib for some of Katib's example images. From my perspective, it is hard to define a scope for Training and AutoML in one WG because they have different goals. What do you think, @gaocegege @terrytangyuan @johnugeorge @Jeffwan?
I also think so. AutoML is a separate topic.
The leads for the AutoML WG look like a subset of the leads of the training WG. The training WG looks like it only contains two leads who aren't also in the AutoML WG. If there's very little overlap between these WGs why is there so much overlap in the leads? If the two WGs don't have enough people to have independent leads, is that an indication that we haven't reached critical mass to support two WGs? If AutoML and Training are both meaty and independent topics warranting their own WGs, are the leads going to be oversubscribed trying to lead two WGs?
I think if we just have one WG for both training and AutoML, then there is no chance to get more contributors/maintainers for AutoML projects. If we have a separate WG for AutoML, I think we can get more people in this area involved in our community.
Agree with @gaocegege.
The fact that all of the AutoML leads are currently also leads in the training WG suggests to me there is some underlying connection between these two. What is the connection?
We have the same leads because we don't have enough contributors currently, but the scopes of these projects are different. I thought one of the main purposes of creating WGs was to grow the Kubeflow community, so that we can bring more leads into the WGs.
@jlewi Yes, currently there is overlap between leads. But can we say that one project is dependent on the other? There have been lots of contributors in the training area since the beginning who were not Katib users/developers, and vice versa. Keeping a separate WG for AutoML will help in focusing on newer features happening in that area. It is an active and evolving research area, so I expect more activity here in the future. The major focus in the training area recently has been building a common code base and APIs and supporting newer operators. For Katib, the focus is more on new algorithms and features, and the API is still getting into beta. I feel these two projects have different directions, and their discussion points are also different if we look at them carefully. So, technically, having separate WGs makes more sense than merging into one. However, given the current state of having common leads across projects and fewer active contributors, I am also concerned that having separate WGs may be an overcommitment in terms of the effort required (testing, release infra support, etc., as Jeremy pointed out).
Given the current roadmap of training operators and Katib, it makes sense to me from a technical standpoint that these can potentially/eventually be separate WGs as the number of contributors grows. However, I have similar concerns to those @jlewi and @johnugeorge mentioned above. There may be a lot of infra/testing/releasing effort and communication overhead involved if there are two separate WGs. An alternative strategy to consider is to start with one WG so we can all make our best commitment to help it grow, and if we realize that we've attracted enough contributors who can potentially become leads, we can start this discussion again and consider gradually rolling out a new WG.
Thanks @johnugeorge and @andreyvelich. I think at this point I'm willing to go along with 1 or 2 WGs for training and AutoML, based on whatever the consensus is. Since this is one of the first WGs to be formally approved, I think it would be good to get an LGTM from some other potential WG leads to ensure the charter etc. is well scoped. I might suggest @ellistarn and @animeshsingh, as they have been leading the KFServing WG and it would be good to ensure our processes are converging.
### Out of scope

- APIs used for running inference/serving tasks (this falls under the purview of WG Serving).
What about hyper parameter tuning?
/cc @andreyvelich @jlewi
We are still considering whether we should have a WG AutoML, so I did not add it here.
day: Wednesday
time: "03:00"
tz: PT (Pacific Time)
frequency: monthly - first Wednesday every month
I would recommend a more frequent WG meeting while this is established.
There is another meeting that's US friendly (see below). That would probably be sufficient. I would expect most communication to be done asynchronously on GitHub or Slack, as members are in different time zones.
@gaocegege I would also like to join the training working group, thanks!
@ChanYiLin Thanks for your contribution. Added
Took a look. Reminds me that @animeshsingh and I need to finally publish our charter. It was in Google Doc form ~1 year ago.
Signed-off-by: Ce Gao <gaoce@caicloud.io>
Based on the comments from everyone, we can go ahead with 2 WGs: training and AutoML. LGTM from my side /lgtm
Yes, I am also fine with two WGs. /lgtm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jlewi
/cc @johnugeorge @terrytangyuan @andreyvelich @jlewi
Signed-off-by: Ce Gao <gaoce@caicloud.io>