Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StopSuggestion service in Katib API #413

Closed
andreyvelich opened this issue Feb 27, 2019 · 16 comments
Closed

Add StopSuggestion service in Katib API #413

andreyvelich opened this issue Feb 27, 2019 · 16 comments
Labels

Comments

@andreyvelich
Copy link
Member

Right now, we don't have service to stop Suggestion. We should create new API, called StopSuggestion. For example, if after complete one StudyJob session we have to clear something inside Suggestion service. In NAS RL service we have to clear dictionary after StudyJob is finished.
See commentary: #404 (comment).
What do you think? @YujiOshima @richardsliu @hougangliu @johnugeorge
/area nas

@johnugeorge
Copy link
Member

API name is confusing. StopSuggestion term gives idea of stopping algorithm service

In general, are we moving to a stateful API design? Can this be solved in a different way?

@andreyvelich
Copy link
Member Author

@johnugeorge We can name it EndSuggestion or something else. The problem is, right now, Controller doesn't run GetSuggestion method after last Trial finished. It is not possible to make some changes in suggestion service after creating last Trial.

@andreyvelich
Copy link
Member Author

@hougangliu @YujiOshima @richardsliu
What do you think about this name?

@hougangliu
Copy link
Member

is EndSuggestionLifecycle better and more unambiguous?

@andreyvelich
Copy link
Member Author

I am good with this name.
@johnugeorge What do you think?

@richardsliu
Copy link
Contributor

I would also like to know if NAS requires us to move toward more stateful APIs. Also in the new API design, we avoided the term suggestion and instead used less ambiguous terms like algorithm and assignment.

What is the broader problem that we are solving here? As I understand from reading the above, the real objective is to modify the suggestion service after the last trial - is this correct?

@hougangliu
Copy link
Member

@richardsliu correct.
nas-rl suggestion service stores LSTM model for each suggestion. when studyjob (change to non Running state) stops to ask request from the service, it should send a signal to nas-rl suggestion service so that the service can make the LSTM model unload from memory. otherwise, nas-rl suggestion service will exhaust host memory

@andreyvelich
Copy link
Member Author

@richardsliu @hougangliu We can name it like EndAlgorithmProcess or EndAlgorithmLifecycle since we are moving to Algorithm from Suggestion

@YujiOshima
Copy link
Contributor

YujiOshima commented Mar 8, 2019

I prefer EndAlgorithmLifecycle .
Or as @johnugeorge said, we can consider another way to solve this.
One way, Watching studyjobCR in suggestion service. When the status of studyjob become complete or fail, the suggestion service finalizes a corresponding process.
WDYT? @andreyvelich @hougangliu @johnugeorge @richardsliu

@johnugeorge
Copy link
Member

IMO, APIs should not overloaded and must be simple. AFAIK, it is meant for the system users. Adding stateful apis in this case is not the ideal solution and doesn't go well with k8s philosophy too. eg: what happens if suggestion service missed this api call because of some temporary issue, how to scale suggestion pods in this case etc

I feel, each suggestion algorithm should handle its requirements by itself(eg: Does it need extra metadata storage, where to store it, how to access it ) and should be independent.

If we need any Suggestion API modifications, we can discuss in #423

@andreyvelich
Copy link
Member Author

@johnugeorge I agree with you, but right now we have only two functions to run Suggestion service from Controller. There are GetSuggestion and ValidateSuggestionParameters. I don't think that 2 functions can handle all problems.

@andreyvelich
Copy link
Member Author

@richardsliu @YujiOshima @hougangliu @johnugeorge
We think about that issue a bit. Johnu told us a way how that can be solved without changing API.
What if we send requestNumber as a parameter inside SuggestionParameters, like with SuggestionCount, right now?
https://github.com/kubeflow/katib/blob/master/pkg/controller/studyjob/katib_api_util.go#L79

@johnugeorge
Copy link
Member

Algorithms can set custom key-value parameters in AlgorithmSetting based on its implementation. See this field in v1alpha2 https://github.com/kubeflow/katib/blob/master/pkg/api/operators/apis/experiment/v1alpha2/types.go#L174

@andreyvelich
Copy link
Member Author

@johnugeorge What should we do in v1alpha1 ?

@gaocegege
Copy link
Member

In the new design, we will have one suggestion for one experiment, thus we do not need the API now.

/close

@k8s-ci-robot
Copy link

@gaocegege: Closing this issue.

In response to this:

In the new design, we will have one suggestion for one experiment, thus we do not need the API now.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants