Add StopSuggestion service in Katib API #413

andreyvelich · 2019-02-27T01:16:52Z

Right now, we don't have a service to stop a Suggestion. We should create a new API called StopSuggestion. For example, after a StudyJob session completes, we may have to clean up state inside the Suggestion service: in the NAS RL service, we have to clear a dictionary after the StudyJob is finished.
See the discussion in #404 (comment).
What do you think? @YujiOshima @richardsliu @hougangliu @johnugeorge
/area nas


johnugeorge · 2019-02-27T02:46:27Z

The API name is confusing. The term StopSuggestion suggests stopping the algorithm service itself.

In general, are we moving to a stateful API design? Can this be solved in a different way?

andreyvelich · 2019-02-28T05:21:53Z

@johnugeorge We can name it EndSuggestion or something else. The problem is that, right now, the Controller doesn't call the GetSuggestion method after the last Trial finishes, so it is not possible to make any changes in the suggestion service after the last Trial is created.

andreyvelich · 2019-03-07T19:12:46Z

@hougangliu @YujiOshima @richardsliu
What do you think about this name?

hougangliu · 2019-03-07T22:58:32Z

Is EndSuggestionLifecycle better and less ambiguous?

andreyvelich · 2019-03-07T23:06:18Z

I am good with this name.
@johnugeorge What do you think?

richardsliu · 2019-03-08T01:43:27Z

I would also like to know if NAS requires us to move toward more stateful APIs. Also, in the new API design, we avoided the term "suggestion" and instead used less ambiguous terms like "algorithm" and "assignment".

What is the broader problem that we are solving here? As I understand from reading the above, the real objective is to modify the suggestion service after the last trial - is this correct?

hougangliu · 2019-03-08T02:02:08Z

@richardsliu Correct.
The nas-rl suggestion service stores an LSTM model for each suggestion. When a StudyJob stops sending requests to the service (i.e. it changes to a non-Running state), it should send a signal to the nas-rl suggestion service so that the service can unload the LSTM model from memory. Otherwise, the nas-rl suggestion service will eventually exhaust host memory.
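A minimal Python sketch of the memory problem described above, assuming a service that keeps one in-memory model per StudyJob in a dictionary. `ControllerState`, `NasRlStateStore`, and `unload` are hypothetical names for illustration, not the actual nas-rl suggestion code; `unload` stands in for whatever end-of-lifecycle signal the controller would send.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ControllerState:
    """Hypothetical stand-in for the per-StudyJob LSTM controller state."""
    study_id: str
    weights: List[float] = field(default_factory=list)


class NasRlStateStore:
    """Sketch of a suggestion service that keeps one model per StudyJob.

    Without an explicit unload, entries accumulate for every StudyJob ever
    seen, which is the memory growth described above.
    """

    def __init__(self) -> None:
        self._models: Dict[str, ControllerState] = {}

    def get_or_create(self, study_id: str) -> ControllerState:
        # Lazily create the model on the first GetSuggestion call for a study.
        if study_id not in self._models:
            self._models[study_id] = ControllerState(study_id=study_id)
        return self._models[study_id]

    def unload(self, study_id: str) -> bool:
        # Hypothetical handler for an EndAlgorithmLifecycle-style signal.
        # pop() with a default makes a repeated or late signal harmless.
        return self._models.pop(study_id, None) is not None

    def loaded(self) -> int:
        return len(self._models)


store = NasRlStateStore()
store.get_or_create("study-a")
store.get_or_create("study-b")
store.unload("study-a")
print(store.loaded())  # 1
```

Making `unload` idempotent matters here because, as raised later in the thread, a signal can be missed or delivered more than once.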

andreyvelich · 2019-03-08T02:32:28Z

@richardsliu @hougangliu We could name it EndAlgorithmProcess or EndAlgorithmLifecycle, since we are moving from Suggestion to Algorithm.

YujiOshima · 2019-03-08T02:45:06Z

I prefer EndAlgorithmLifecycle.
Or, as @johnugeorge said, we can consider another way to solve this.
One way is to watch the StudyJob CR from the suggestion service: when the StudyJob status becomes Completed or Failed, the suggestion service finalizes the corresponding process.
WDYT? @andreyvelich @hougangliu @johnugeorge @richardsliu
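A sketch of the watch-based alternative, under stated assumptions: the list of `(study_id, phase)` pairs stands in for a Kubernetes watch on the StudyJob CR, and the terminal phase names `Completed` and `Failed` are assumed rather than taken from the CRD. `finalize_on_terminal` is a hypothetical helper.

```python
from typing import Callable, Iterable, Set, Tuple

# Assumption: terminal StudyJob phases are named like this; check the CRD.
TERMINAL_PHASES = {"Completed", "Failed"}


def finalize_on_terminal(
    events: Iterable[Tuple[str, str]],
    finalize: Callable[[str], None],
) -> Set[str]:
    """Call finalize(study_id) once per StudyJob that reaches a terminal phase.

    `events` stands in for a watch stream on the StudyJob CR: each item is a
    (study_id, phase) pair. Watches can redeliver events, so finalized IDs
    are tracked to keep cleanup idempotent.
    """
    finalized: Set[str] = set()
    for study_id, phase in events:
        if phase in TERMINAL_PHASES and study_id not in finalized:
            finalize(study_id)
            finalized.add(study_id)
    return finalized


released = []
events = [
    ("study-a", "Running"),
    ("study-b", "Running"),
    ("study-a", "Completed"),
    ("study-a", "Completed"),  # redelivered event
    ("study-b", "Failed"),
]
finalize_on_terminal(events, released.append)
print(released)  # ['study-a', 'study-b']
```

Because the service reacts to observed state rather than to a one-off call, a restarted service can re-list StudyJobs and catch up on any it missed.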

johnugeorge · 2019-03-08T06:23:12Z

IMO, APIs should not be overloaded and must be simple. AFAIK, they are meant for the system's users. Adding stateful APIs in this case is not the ideal solution, and it doesn't fit the k8s philosophy either. For example: what happens if the suggestion service misses this API call because of some temporary issue? How do we scale suggestion pods in this case?

I feel each suggestion algorithm should handle its own requirements (e.g. whether it needs extra metadata storage, where to store it, how to access it) and should be independent.
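One way an algorithm could manage its own state without any end-of-lifecycle call, sketched here as an illustrative assumption, is idle eviction: drop per-study state that has not been touched for a configurable time. `IdleEvictingStore` is hypothetical; the injectable clock only exists to make the example deterministic.

```python
import time
from typing import Callable, Dict, List, Optional


class IdleEvictingStore:
    """Sketch: drop per-study state that has not been used for `ttl` seconds.

    No end-of-lifecycle call is needed, so a missed signal cannot leak memory;
    the trade-off is that a study idle longer than `ttl` loses its state.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._last_used: Dict[str, float] = {}
        self._state: Dict[str, object] = {}

    def touch(self, study_id: str, state: Optional[object] = None) -> None:
        # Record use on every GetSuggestion call; store state if provided.
        self._last_used[study_id] = self._clock()
        if state is not None:
            self._state[study_id] = state

    def evict_idle(self) -> List[str]:
        # Collect stale IDs first, then delete, so the dict is not mutated
        # while it is being iterated.
        now = self._clock()
        stale = [s for s, t in self._last_used.items() if now - t > self.ttl]
        for study_id in stale:
            del self._last_used[study_id]
            self._state.pop(study_id, None)
        return stale


fake_now = [0.0]
idle_store = IdleEvictingStore(ttl=60.0, clock=lambda: fake_now[0])
idle_store.touch("study-a", state="lstm-a")
fake_now[0] = 30.0
idle_store.touch("study-b", state="lstm-b")
fake_now[0] = 75.0
print(idle_store.evict_idle())  # ['study-a']
```

Each replica can run this independently, which sidesteps the question of which pod should receive a cleanup call.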

If we need any Suggestion API modifications, we can discuss them in #423.

andreyvelich · 2019-03-08T06:50:19Z

@johnugeorge I agree with you, but right now we have only two functions for calling the Suggestion service from the Controller: GetSuggestion and ValidateSuggestionParameters. I don't think these two functions can handle every case.

andreyvelich · 2019-03-14T22:43:22Z

@richardsliu @YujiOshima @hougangliu @johnugeorge
We thought about this issue a bit more. Johnu suggested a way to solve it without changing the API.
What if we send requestNumber as a parameter inside SuggestionParameters, as we do with SuggestionCount right now?
https://github.com/kubeflow/katib/blob/master/pkg/controller/studyjob/katib_api_util.go#L79
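A sketch of how the service side of the requestNumber idea might look, assuming suggestion parameters arrive as name/value string pairs (as SuggestionCount does today). The `requestCount` key, which tells the service how many requests to expect in total, is a hypothetical addition; so are `SuggestionParameter` (a local stand-in) and `CountingService`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SuggestionParameter:
    """Local stand-in for a name/value suggestion parameter pair."""
    name: str
    value: str


def params_to_dict(params: List[SuggestionParameter]) -> Dict[str, str]:
    return {p.name: p.value for p in params}


class CountingService:
    def __init__(self) -> None:
        self.models: Dict[str, str] = {}

    def get_suggestion(self, study_id: str, params: List[SuggestionParameter]) -> str:
        p = params_to_dict(params)
        # Assumed keys: the current request index and the (hypothetical)
        # total number of requests the controller will make for this study.
        request_number = int(p["requestNumber"])
        request_count = int(p["requestCount"])
        model = self.models.setdefault(study_id, f"lstm-{study_id}")
        suggestion = f"{model}-trial-{request_number}"
        if request_number >= request_count:
            # Last request: no further calls will come, so release state now.
            del self.models[study_id]
        return suggestion


svc = CountingService()
for n in (1, 2, 3):
    svc.get_suggestion("study-a", [
        SuggestionParameter("SuggestionCount", "1"),
        SuggestionParameter("requestNumber", str(n)),
        SuggestionParameter("requestCount", "3"),
    ])
print(svc.models)  # {}
```

The cleanup happens while serving the final request, which addresses the point that the Controller never calls GetSuggestion again after the last Trial is created.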

johnugeorge · 2019-03-21T17:56:00Z

Algorithms can set custom key-value parameters in AlgorithmSetting based on their implementation. See this field in v1alpha2: https://github.com/kubeflow/katib/blob/master/pkg/api/operators/apis/experiment/v1alpha2/types.go#L174
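A sketch of an algorithm validating and typing its own AlgorithmSetting entries. `AlgorithmSetting` here is a local Python stand-in assumed to mirror the name/value pair in the linked v1alpha2 type, and the two setting names with their defaults are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlgorithmSetting:
    """Local stand-in assumed to mirror v1alpha2's name/value AlgorithmSetting."""
    name: str
    value: str


# Hypothetical settings a NAS RL algorithm might read, with their defaults.
DEFAULTS: Dict[str, str] = {"lstm_num_cells": "64", "release_state_on_finish": "true"}


def parse_settings(settings: List[AlgorithmSetting]) -> Dict[str, object]:
    raw = dict(DEFAULTS)
    for s in settings:
        if s.name not in DEFAULTS:
            # Reject unknown keys early, as a validation step would.
            raise ValueError(f"unknown algorithm setting: {s.name}")
        raw[s.name] = s.value
    # Convert the string values into the types the algorithm actually uses.
    return {
        "lstm_num_cells": int(raw["lstm_num_cells"]),
        "release_state_on_finish": raw["release_state_on_finish"].lower() == "true",
    }


cfg = parse_settings([AlgorithmSetting("lstm_num_cells", "128")])
print(cfg)  # {'lstm_num_cells': 128, 'release_state_on_finish': True}
```

Keeping the key set and defaults inside the algorithm means the generic API stays a plain list of strings, in line with the suggestion that each algorithm handle its own requirements.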

andreyvelich · 2019-03-21T19:14:00Z

@johnugeorge What should we do in v1alpha1?

gaocegege · 2019-10-11T02:33:23Z

In the new design, we will have one suggestion for one experiment, thus we do not need the API now.

/close

k8s-ci-robot · 2019-10-11T02:33:24Z

@gaocegege: Closing this issue.

In response to this:

In the new design, we will have one suggestion for one experiment, thus we do not need the API now.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the area/nas label Feb 27, 2019

k8s-ci-robot closed this as completed Oct 11, 2019
