Make low-level API for using katib flexibly #72

YujiOshima · 2018-04-24T04:00:35Z

Discussed in #66.
These are the APIs I'm going to refactor and add.

| API | Input | Process | Output |
|---|---|---|---|
| CreateStudy | StudyConfig | Save Study conf to DB; create StudyID | StudyID, error |
| GetSuggestions | StudyID, SuggestionAlgorithmName, RequestNum | Create Trials from Suggestion | []TrialID, error |
| RunTrials | StudyID, []TrialID, Worker | Request the worker to run Trials; set Trial status to running | error |
| StopTrials | StudyID, []TrialID, IsComplete | Stop Trial worker; set Trial status to Complete | []TrialID, error |
| ShouldStopTrial | StudyID, EarlyStopAlgorithm | Get ShouldStop Trials | []TrialID, error |
| SetSuggestionParameter | StudyID, SuggestionAlgorithmName, AlgorithmParam | Set Parameters | error |
| SetEarlyStoppingParameter | StudyID, EarlyStoppingAlgorithmName, EarlyStoppingParam | Set Parameters | error |
| GetMetrics | []TrialID | Get Metrics of Trials | []Metrics, error |
| SaveStudy | StudyID | Save StudyInfo to ModelDB | error |
| SaveModels | StudyID, []TrialID, []Metrics | Save Trial and Metrics Info to ModelDB | error |
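
For readers who prefer code to tables, the list above can be restated as a Go interface. This is only a sketch: the interface name `LowLevelAPI`, the placeholder types, and the parameter names are assumptions made for illustration, and the actual definitions in `api.proto` may differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// Placeholder types (assumptions); the real ones live in the project's api package.
type (
	StudyID     string
	TrialID     string
	StudyConfig struct{}
	Worker      struct{}
	Metrics     struct{}
	Param       struct{} // stands in for AlgorithmParam and EarlyStoppingParam
)

// LowLevelAPI mirrors the table row by row (hypothetical name).
type LowLevelAPI interface {
	CreateStudy(conf StudyConfig) (StudyID, error)
	GetSuggestions(study StudyID, algorithm string, requestNum int) ([]TrialID, error)
	RunTrials(study StudyID, trials []TrialID, worker Worker) error
	StopTrials(study StudyID, trials []TrialID, isComplete bool) ([]TrialID, error)
	ShouldStopTrial(study StudyID, earlyStopAlgorithm string) ([]TrialID, error)
	SetSuggestionParameter(study StudyID, algorithm string, params []Param) error
	SetEarlyStoppingParameter(study StudyID, algorithm string, params []Param) error
	GetMetrics(trials []TrialID) ([]Metrics, error)
	SaveStudy(study StudyID) error
	SaveModels(study StudyID, trials []TrialID, metrics []Metrics) error
}

// methodCount reports how many calls the interface declares.
func methodCount() int {
	return reflect.TypeOf((*LowLevelAPI)(nil)).Elem().NumMethod()
}

func main() {
	fmt.Println("API calls:", methodCount())
}
```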

Typical usage looks like this.

	studyId, _ := grpc.CreateStudy(studyConfig)
	grpc.SetSuggestionParameter(studyId, "random", suggestParam)
	grpc.SetEarlyStoppingParameter(studyId, "medianstopping", earlystopParam)
	grpc.SaveStudy(studyId)
	// Keep requesting suggestions until the study is complete.
	for !IsStudyCompleted() {
		trials, _ := grpc.GetSuggestions(studyId, "random", 10)
		grpc.RunTrials(studyId, trials)
		for {
			metrics, workerState, _ := grpc.GetMetrics(studyId, trials)
			if AllWorkerCompleted(workerState) {
				// All workers finished: mark the trials complete and save the results.
				grpc.CompleteTrial(studyId, trials, true)
				grpc.SaveModels(studyId, trials, metrics)
				break
			}
			// Stop trials flagged by early stopping and drop them from the active list.
			shouldStops, _ := grpc.ShouldStopTrial(studyId, trials)
			grpc.CompleteTrial(studyId, shouldStops, false)
			trials = deleteShouldStopsFromTrialList(trials, shouldStops)
		}
	}
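
The inner loop relies on a helper that drops early-stopped trials from the active list. A minimal sketch of such a helper follows, assuming trial IDs are plain strings and that the helper returns the remaining slice. The name `removeTrials` is hypothetical and not part of the proposed API.

```go
package main

import "fmt"

// removeTrials returns the IDs in trials that are not listed in stopped,
// preserving the original order. (Hypothetical helper for illustration.)
func removeTrials(trials, stopped []string) []string {
	skip := make(map[string]struct{}, len(stopped))
	for _, id := range stopped {
		skip[id] = struct{}{}
	}
	remaining := make([]string, 0, len(trials))
	for _, id := range trials {
		if _, ok := skip[id]; !ok {
			remaining = append(remaining, id)
		}
	}
	return remaining
}

func main() {
	trials := []string{"t1", "t2", "t3"}
	trials = removeTrials(trials, []string{"t2"})
	fmt.Println(trials)
}
```

Returning a new slice, rather than mutating the argument, keeps the caller's assignment explicit, which matters in Go because slice headers are passed by value.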

WDYT? @ddysher @gaocegege @libbyandhelen


YujiOshima · 2018-04-24T05:19:38Z

/area manager

ddysher · 2018-04-25T04:44:53Z

@YujiOshima thanks for putting this together! I was on business travel for the last two days and will take a look ASAP :)

gaocegege · 2018-04-25T05:11:49Z

Personally, LGTM

Thanks for your awesome work!

ddysher · 2018-04-26T06:45:32Z

Thanks @YujiOshima, I've listed some of my concerns:

For `GetSuggestions` API

Why do I have to pass SuggestionAlgorithmName and RequestNum to GetSuggestions if a study already has StudyConfig? There's also a SetSuggestionParameter API, which seems to do a similar task.
It seems GetSuggestions always runs synchronously? Is it necessary to support asynchronous trial generation?

For `RunTrials` API

What's the Worker parameter in RunTrials?
What about runtime configuration for running trials, e.g. how do I pass the script to run, the resource limits for a single trial, etc.? It is similar to the problem we discussed in [manager & worker] Migrate dlk into worker interface #66

For `SaveStudy` API

This is a little confusing since, from a user's perspective, a Study is already saved when it is created; SaveStudy is about saving StudyInfo to ModelDB, rather than saving it to core katib. It occurs to me we might need a separate API group for this?

YujiOshima · 2018-04-26T09:22:09Z

@ddysher Thank you for the comment!
I opened a PR for this: #74

For GetSuggestions API

I made StudyConfig simpler. You can specify the number of trials at each suggestion request.
I think it can run asynchronously, but I have not tested that.

For RunTrials API

Worker means the runtime, e.g. `kubernetes` or `TFoperator`. I renamed it in my PR. The runtime config is `api.WorkerConfig`.

For SaveStudy API

I agree. I changed CreateStudy to include SaveStudy.

I added a simple demo using minikube: https://github.com/YujiOshima/hp-tuning/blob/7a7086d3336f284d1ea67f2b06051d2c12d3922c/docs/MinikubeDemo/MinikubeDemo.md

Please take a look!

YujiOshima · 2018-04-26T09:22:59Z

A more concrete usage example is here: https://github.com/YujiOshima/hp-tuning/blob/7a7086d3336f284d1ea67f2b06051d2c12d3922c/docs/MinikubeDemo/radom-suggest-demo.go

ddysher · 2018-04-27T12:21:45Z

@YujiOshima thanks! I'll take a look at the PR later.

libbyandhelen · 2018-05-02T20:25:13Z

@YujiOshima
So based on your PR and some modifications for supporting CMA-ES and BO, I drew a diagram to illustrate the main workflow.

The modifications are as follows:

A new grpc function GetTrial in api.proto to get trial by trial_id.
Each intermediate result should correspond to one trial, so I store the trial_id in each intermediate result in SuggestionParameter, and I found it easier to have a function that gets a trial by trial_id instead of by study_id.
For example, a suggestion parameter for the intermediate result would look like this:

{
  "name": "population",
  "value": "{\"trial_id\": \"A8UwJqEmK9SpzyMO\", \"x\": \"[0.22899100143490736, 0.23124807755799998]\", \"y\": \"\", \"penalty\": 0}"
}
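
Note that the `value` field above is JSON nested inside a JSON string, and `x` is in turn an array encoded as a string, so reading it back takes three decoding steps. A sketch of that decoding follows, assuming the field types shown (for example, that `penalty` is numeric). The struct and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is the suggestion parameter shown above, on one line.
const sample = `{"name": "population", "value": "{\"trial_id\": \"A8UwJqEmK9SpzyMO\", \"x\": \"[0.22899100143490736, 0.23124807755799998]\", \"y\": \"\", \"penalty\": 0}"}`

// suggestionParameter is a stand-in for the name/value pair (assumption).
type suggestionParameter struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// populationEntry models the JSON stored inside Value (assumption).
type populationEntry struct {
	TrialID string  `json:"trial_id"`
	X       string  `json:"x"` // a JSON array serialized as a string
	Y       string  `json:"y"`
	Penalty float64 `json:"penalty"`
}

// decodePopulation unwraps the three layers: parameter, entry, then x.
func decodePopulation(raw []byte) (string, []float64, error) {
	var p suggestionParameter
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", nil, err
	}
	var e populationEntry
	if err := json.Unmarshal([]byte(p.Value), &e); err != nil {
		return "", nil, err
	}
	var x []float64
	if err := json.Unmarshal([]byte(e.X), &x); err != nil {
		return "", nil, err
	}
	return e.TrialID, x, nil
}

func main() {
	id, x, err := decodePopulation([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(id, len(x))
}
```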

It is the service that sends the create_trial request.
In your PR, after the manager sends a get_suggestions request to the service, it receives a list of trials. It then loops through the trials and saves each of them to the database. Again, there is a one-to-one relationship between an intermediate result and a trial, so it seems more natural to me to create trials and save the intermediate result (set_suggestion_parameters) at the same time, since the trial_id is needed when saving the intermediate result. Therefore, both are done on the service side.
A new grpc function UpdateTrial in api.proto to update status and objective value after evaluation.
A new grpc function GetSuggestionParameterList in api.proto.
In the original PR, one can only get suggestion parameters by param_id, but it may be more convenient to get a suggestion parameter pack containing all useful information by study_id.
So the request and reply protocols are:

message GetSuggestionParameterListRequest {
    string study_id = 1;
}

message GetSuggestionParameterListReply {
    message SuggestionParameterSet {
        string param_id = 1;
        string param_name = 2;
        repeated SuggestionParameter suggestion_parameters = 3;
    }
    repeated SuggestionParameterSet suggestion_parameter_set = 1;
}

A minor change in string join and split
In the original PR:
https://github.com/YujiOshima/hp-tuning/blob/cc2ddbea3a4ca672ba45da7af93c21d76ec3859b/pkg/db/interface.go#L285
I changed ",\n" to "&\n" in case a single parameter/tag contains ",\n".
For example, the ",\n" after "path_c":

{
  "name": "path_c",
  "value": "[[-0.39959364763308924], [0.010550492832451075]]"
}
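
The concern can be reproduced in a few lines of Go. This is a sketch under the assumption that each parameter is serialized as multi-line JSON before joining, as in the example above. The second parameter (`sigma`) and the helper `joinSplit` are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// joinSplit joins items with sep and splits the result on the same sep,
// which is the round trip a join/split based store performs.
func joinSplit(items []string, sep string) []string {
	return strings.Split(strings.Join(items, sep), sep)
}

// sampleParams returns two multi-line JSON parameters. Each one contains
// ",\n" after its "name" field. The "sigma" entry is hypothetical.
func sampleParams() []string {
	return []string{
		"{\n  \"name\": \"path_c\",\n  \"value\": \"[[-0.39959364763308924], [0.010550492832451075]]\"\n}",
		"{\n  \"name\": \"sigma\",\n  \"value\": \"0.5\"\n}",
	}
}

func main() {
	params := sampleParams()
	// With ",\n" the embedded separators split each parameter in two.
	fmt.Println("split on \",\\n\":", len(joinSplit(params, ",\n")), "pieces")
	// "&\n" does not occur inside the parameters, so the round trip is exact.
	fmt.Println("split on \"&\\n\":", len(joinSplit(params, "&\n")), "pieces")
}
```

A different separator only lowers the chance of a collision; escaping or encoding the items before joining removes it.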

Other notes:

Here is the whole commit: 3496694
I rewrote some relevant functions (simplified versions) in Python to test and illustrate the idea.
I use a function call to substitute for the steps in the yellow block in the diagram, because these are independent of the algorithm itself.
For the next step, how can we use the Go interface from Python?

YujiOshima · 2018-05-03T07:44:09Z

@libbyandhelen
Thank you! Cool!

> A new grpc function GetTrial in api.proto to get trial by trial_id.

In the new API, a Trial is only a parameter set, and a Worker is an instance of the evaluation process of a trial.
So an intermediate result corresponds to one worker,
and multiple workers can correspond to one trial.
So how about getting the worker_id list from a trial_id? You can then get an intermediate result by calling the `GetMetrics` rpc with `worker_id`.

message GetWorkersFromTrialRequest {
    string trial_id = 1;
}

> It is the service that sends the create_trial request

SGTM.
I agree it is more natural that suggestion services call CreateTrial.

> A new grpc function UpdateTrial in api.proto to update status and objective value after evaluation.

As in the first comment, the objective value corresponds to a Worker,
and you can update and get the value with the GetMetrics rpc.

> A new grpc function GetSuggestionParameterList in api.proto

SGTM.
I'm going to add the rpc.

> A minor change in string join and split

Parameters and Tags are encoded [here](https://github.com/YujiOshima/hp-tuning/blob/cc2ddbea3a4ca672ba45da7af93c21d76ec3859b/pkg/db/interface.go#L354),
so I think it is not a problem.
A minimal encoding and decoding example is at https://play.golang.org/p/wz-ML98fJW8 .

libbyandhelen · 2018-05-08T16:50:38Z

@YujiOshima
Thank you!
You said that multiple workers can correspond to one trial. Then what is the relationship between these workers? Are their metrics the same, or are they for different objectives? If the latter, since the current algorithms do not support multi-objective optimization, can I safely assume that one trial corresponds to only one worker?

YujiOshima · 2018-05-09T04:30:58Z

@libbyandhelen
My assumption is that when users want to train the same parameters with several initial values, or need the variance of the result, multiple workers are created from one Trial.
So the objective and metrics are the same among workers (their values may differ).

libbyandhelen · 2018-05-10T23:57:33Z

@YujiOshima
OK, cool. Then maybe I can use the mean of all the metric values as the objective value.
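
A minimal sketch of that aggregation, assuming each worker of a trial reports a single final objective value as a `float64`. The function name `meanObjective` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// meanObjective averages the objective values reported by the workers of
// one trial. It returns an error when no worker has reported a value.
func meanObjective(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errors.New("no worker values to average")
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values)), nil
}

func main() {
	// Three workers evaluated the same trial with different initial values.
	mean, err := meanObjective([]float64{0.90, 0.92, 0.94})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("objective for the trial: %.3f\n", mean)
}
```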

libbyandhelen · 2018-05-11T18:43:34Z

@YujiOshima
I still have a question about GetMetrics.
The structure of the getMetricsReply seems to be like this:

[
    {
        worker_id,
        [{name, [value1, value2, ...]}, {name, [value1, value2, ...]}]
    },
    ...
]

So the question is: what are the "name" and the list of values for? Doesn't a worker have only one value?

YujiOshima · 2018-05-14T13:46:11Z

Metrics are not only for the objective value.
For example, the objective value may be accuracy, but you may also want to collect loss, recall, etc.
The names of the metrics are defined in the study config,
and Katib collects the full log of values for each metric.
So when you want to get the latest objective value, set the name of the objective value in GetMetricsRequest.metrics_names and read getMetricsReply.metrics_log_sets.metrics_logs.values[-1]
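
In Go, which has no negative indexing, "take the last value" looks roughly like the sketch below. The `MetricsLogSet` and `MetricsLog` structs are hypothetical stand-ins that follow the reply shape described in this thread, with values assumed to be strings. The generated types in the api package may differ.

```go
package main

import "fmt"

// MetricsLog holds every logged value of one named metric (assumed shape).
type MetricsLog struct {
	Name   string
	Values []string
}

// MetricsLogSet groups the metric logs of one worker (assumed shape).
type MetricsLogSet struct {
	WorkerID    string
	MetricsLogs []MetricsLog
}

// latestValue returns the most recent value of the named metric for one
// worker, i.e. values[len(values)-1]. The bool is false when the worker or
// metric is missing or nothing has been logged yet.
func latestValue(sets []MetricsLogSet, workerID, name string) (string, bool) {
	for _, set := range sets {
		if set.WorkerID != workerID {
			continue
		}
		for _, ml := range set.MetricsLogs {
			if ml.Name == name && len(ml.Values) > 0 {
				return ml.Values[len(ml.Values)-1], true
			}
		}
	}
	return "", false
}

func main() {
	reply := []MetricsLogSet{{
		WorkerID: "w1",
		MetricsLogs: []MetricsLog{
			{Name: "accuracy", Values: []string{"0.71", "0.84", "0.90"}},
			{Name: "loss", Values: []string{"0.9", "0.5", "0.3"}},
		},
	}}
	if v, ok := latestValue(reply, "w1", "accuracy"); ok {
		fmt.Println("latest accuracy:", v)
	}
}
```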

libbyandhelen · 2018-05-24T00:22:12Z

@YujiOshima
I am trying to rewrite the cma-es algorithm using the new API, and I get this error:
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNKNOWN, sql: expected 4 destination arguments in Scan, not 3)>

Is this because of this line of code?
https://github.com/YujiOshima/hp-tuning/blob/80faafc4188557d0d1930b32abf0451fd553e0fa/pkg/db/interface.go#L848

lluunn · 2018-05-24T01:02:22Z

cc @lluunn

YujiOshima · 2018-05-24T01:44:30Z

@libbyandhelen Oh, I'm sorry for my mistake.
I will open PR to fix it.

jlewi · 2018-10-09T12:43:55Z

/area 0.4.0

Can we close this issue? Is there more work to be done?

YujiOshima · 2018-10-10T05:01:43Z

This is completed.
/close

k8s-ci-robot · 2018-10-10T05:01:44Z

@YujiOshima: Closing this issue.


k8s-ci-robot added the area/manager label Apr 24, 2018

YujiOshima mentioned this issue Apr 26, 2018

Refine API #74

Merged

YujiOshima mentioned this issue Apr 30, 2018

add CMA-ES algorithm #67

Closed

YujiOshima mentioned this issue May 24, 2018

fix get service-param-list bug #92

Merged

k8s-ci-robot added the area/0.4.0 label Oct 9, 2018

k8s-ci-robot closed this as completed Oct 10, 2018
