extracting metric value in multiple ways #1140
Conversation
@andreyvelich @gaocegege @johnugeorge Could you help to review this?
@@ -36,10 +36,21 @@ type TrialSpec struct {
    // Whether to retain the trial run object after completed.
    RetainRun bool `json:"retainRun,omitempty"`

    // Describes how objective value will be extracted from collected metrics
    ObjectiveExtract ObjectiveExtractType `json:"objectiveExtract,omitempty"`
Actually, we have an issue for this: #987. Maybe instead of setting up another env variable, in each run we can have min, max and latest in Objective.
/cc @gaocegege
@andreyvelich The idea of extracting the min, max and latest objective value in each run looks good to me. But I think we also need an additional field like ObjectiveExtractType to describe which kind of objective value is chosen to be compared with the goal.
Of course, the env variable DefaultObjectiveExtract is not necessary, since the default choice can be figured out from ObjectiveType.
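To illustrate that last point — deriving the default from ObjectiveType — here is a minimal sketch; the type and constant names are assumptions for discussion, not the merged API:

```go
// Sketch only: names below are illustrative, not the final Katib API.
type ObjectiveType string

const (
	ObjectiveTypeMinimize ObjectiveType = "minimize"
	ObjectiveTypeMaximize ObjectiveType = "maximize"
)

type ObjectiveExtractType string

const (
	ExtractMin    ObjectiveExtractType = "min"
	ExtractMax    ObjectiveExtractType = "max"
	ExtractLatest ObjectiveExtractType = "latest"
)

// defaultExtractType infers the extraction strategy from the objective type,
// so a separate DefaultObjectiveExtract env variable would not be needed.
func defaultExtractType(t ObjectiveType) ObjectiveExtractType {
	switch t {
	case ObjectiveTypeMinimize:
		return ExtractMin
	case ObjectiveTypeMaximize:
		return ExtractMax
	default:
		return ExtractLatest
	}
}
```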
Then, maybe it should be part of Experiment?
What do you think, @johnugeorge?
@andreyvelich @johnugeorge @gaocegege
After a detailed review, I found the modification to common_types.Observation would cause enormous changes, including the corresponding interfaces of all existing suggestion services.
Maybe we can keep suggestionapi.Observation (the pb spec) unchanged, and only change common_types.Observation to a style like:

```yaml
observation:
  metrics:
  - name: loss
    min: 0.0001
    max: 0.1234
    latest: 0.1111
```
/retest
/retest
@johnugeorge @andreyvelich @gaocegege I think the current PR is also ready.
@@ -51,6 +51,8 @@ type ObjectiveSpec struct {
    // This can be empty if we only care about the objective metric.
    // Note: If we adopt a push instead of pull mechanism, this can be omitted completely.
    AdditionalMetricNames []string `json:"additionalMetricNames,omitempty"`
    // This field is allowed to be missing; the experiment defaulter (webhook) will fill it.
    MetricStrategies map[string]MetricStrategy `json:"metricStrategies,omitempty"`
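The MetricStrategy type itself is not visible in this hunk; a minimal sketch of how it might look as a per-metric-name strategy enum (the names here are assumptions for illustration, not necessarily the merged definition):

```go
// Sketch only: the concrete MetricStrategy definition is not shown in this
// diff; the min/max/latest values follow the discussion above.
type MetricStrategy string

const (
	StrategyMin    MetricStrategy = "min"
	StrategyMax    MetricStrategy = "max"
	StrategyLatest MetricStrategy = "latest"
)

// Hypothetical usage: compare the objective metric by its latest recorded
// value and an additional accuracy metric by its maximum.
var exampleStrategies = map[string]MetricStrategy{
	"loss":     StrategyLatest,
	"accuracy": StrategyMax,
}
```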
After this change, I can see that you send metrics to the Suggestion according to MetricStrategies. Is that the right way?
Maybe some Suggestions need the min, max and latest Observation? Do we need to be consistent between the manager API (https://github.com/kubeflow/katib/blob/master/pkg/apis/manager/v1alpha3/api.proto#L177) and the Trial API?
/cc @johnugeorge @gaocegege
If we modify the definition of Observation in api.proto, we need to apply corresponding modifications to all existing suggestion services.
Yes, I understand that. Do you think we don't need it in the suggestions?
@andreyvelich Yes. For the suggestion service, as far as I know, the final reward of the training program is the only necessary value. It doesn't care about the historical records of metrics.
I think it is a huge change to the API. Can we place this change in the next release? Maybe v1beta1.
Agree with @gaocegege. Let's keep it for the next release.
@@ -51,6 +51,8 @@ type ObjectiveSpec struct {
    // This can be empty if we only care about the objective metric.
    // Note: If we adopt a push instead of pull mechanism, this can be omitted completely.
    AdditionalMetricNames []string `json:"additionalMetricNames,omitempty"`
    // This field is allowed to be missing; the experiment defaulter (webhook) will fill it.
    MetricStrategies map[string]MetricStrategy `json:"metricStrategies,omitempty"`
> I think it is a huge change to the API. Can we place this change in the next release? Maybe v1beta1.

I'm okay with that.
Force-pushed from f8ac41e to 62e09cf.
@gaocegege @andreyvelich I've migrated the code to:

```go
type Metric struct {
	Name   string  `json:"name,omitempty"`
	Min    float64 `json:"min,omitempty"`
	Max    float64 `json:"max,omitempty"`
	Latest string  `json:"latest,omitempty"`
}
```
Thanks! I love the feature. /lgtm
    }
    // fetch additional metrics if exists
    metricLogs := reply.ObservationLog.MetricLogs
    for _, metricName := range instance.Spec.Objective.AdditionalMetricNames {
Is it correct to add AdditionalMetricNames in the GetTrialObservationLog function?
I believe this function is needed to add the Observation to the Trial instance, right?
Currently, the Trial Observation is related only to the ObjectiveMetricName.
I think so.
@andreyvelich Yes, I am trying to append additional metrics into Trial.Status.Observation, so we can get the full metrics of each trial via the Kubernetes API. Are you concerned that this modification may confuse our users/developers?
OK, I was just worried whether the Suggestions would still work correctly.
I checked here: https://github.com/kubeflow/katib/blob/master/pkg/suggestion/v1beta1/internal/trial.py#L33 — we add only the objective metric as the target metric, so I think it should work fine.
@sperlingxx Can you submit one example with MetricStrategies?
        return nil, fmt.Errorf("failed to parse timestamps %s: %e", metricLog.TimeStamp, err)
    }
    timestamp, _ := timestamps[metricLog.Metric.Name]
    if timestamp == nil || !timestamp.After(currentTime) {
What if we have more than one metric with the same timestamp?
Maybe we can extract the Latest metric using the last element of metricLogs for each metric name — what do you think?
I agree. And I think that, with the current implementation, Katib will use the latest element if there are multiple metricLogs with the same timestamp.
The condition !timestamp.After(currentTime) ensures the latest metric value is updated as long as the current timestamp is not earlier than the historically latest timestamp. So, when they are equal, the update will still be applied.
Should we change it to !currentTime.After(timestamp)?
Since timestamp <= currentTime, because currentTime is the new value from metricLog and timestamp is the historical value that we have recorded?
@sperlingxx Have you tried to test it?
Hi @andreyvelich, I missed this comment several days ago.
I think !timestamp.After(currentTime) represents not (timestamp > currentTime), which is equivalent to timestamp <= currentTime.
And there is a test case on extracting the latest records when their values are the same. Here is the link.
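A small self-contained illustration of that point (not taken from the PR): with equal timestamps, !timestamp.After(currentTime) is still true, so the later log entry overwrites the recorded value.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	t, _ := time.Parse(time.RFC3339, "2020-04-01T10:00:00Z")
	timestamp := &t  // historically latest recorded timestamp
	currentTime := t // new log entry carrying the exact same timestamp

	// time.Time.After is a strict ">", so !After means "<=": equal
	// timestamps pass the check and the newer entry wins.
	if timestamp == nil || !timestamp.After(currentTime) {
		fmt.Println("update applied: latest value is overwritten")
	}
}
```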
@sperlingxx Got it. Thank you
Do you mean providing a full executable example like those in the examples folder?
Yes, just a simple one-YAML example; you can use the random algorithm.
Done.
@sperlingxx Thank you for doing this!
/lgtm
/cc @johnugeorge
Sorry for being late. @sperlingxx can you rebase?
Force-pushed from dee9fec to 108b06e.
It's done!
Thanks @sperlingxx!
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andreyvelich. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm
* migrate this PR to v1beta1
* fix
* fix
* fix
* add example for metric strategies
* modify MetricStrategy specification
* fix
Currently, the Katib trial controller simply takes the best metric value among the observations. This approach will produce the wrong objective value if the training program doesn't preserve the model (checkpoint) with the best performance and treat it as the final result; in fact, a lot of training programs simply save the model after the final step. So, it is necessary to support another objective-value extraction strategy: extracting the latest recorded value, which is the purpose of this PR.

After discussion, I found that the approach raised in #987 is better: we extract the min, max and latest metric values from the observation logs. When converting trial observations into api_pb.Observation for feeding the suggestion service, the value is chosen among the min, max and latest metrics according to MetricStrategies, a new field of commonv1alpha3.ObjectiveSpec.
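As a rough sketch of the selection step described in the last paragraph (the helper function and names are assumptions for illustration; the merged controller code may differ), building on the Metric struct shown in this PR:

```go
package main

import "fmt"

// Types repeated from the sketches above so this example compiles on its own.
type MetricStrategy string

const (
	StrategyMin    MetricStrategy = "min"
	StrategyMax    MetricStrategy = "max"
	StrategyLatest MetricStrategy = "latest"
)

type Metric struct {
	Name   string
	Min    float64
	Max    float64
	Latest string
}

// chooseValue picks which observed value of a metric is fed to the suggestion
// service, according to the strategy configured for that metric name.
func chooseValue(m Metric, strategy MetricStrategy) string {
	switch strategy {
	case StrategyMin:
		return fmt.Sprintf("%v", m.Min)
	case StrategyMax:
		return fmt.Sprintf("%v", m.Max)
	default: // latest, or any unrecognized strategy
		return m.Latest
	}
}

func main() {
	m := Metric{Name: "loss", Min: 0.0001, Max: 0.1234, Latest: "0.1111"}
	strategies := map[string]MetricStrategy{"loss": StrategyLatest}
	fmt.Println(chooseValue(m, strategies[m.Name])) // prints 0.1111
}
```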