What steps did you take and what happened:
The experiment controller does not emit any events when it fails to reconcile all trials.
For example, consider a case where a trial parameter reference is misconfigured. Suppose the parameter is named num-layers; if its reference in trialParameters is mistyped as num-layer, all trials fail to be created.
The reason for the failure can be found in the controller log. However, users who are not authorized to access the controller cannot find out why their trials are not created, because the experiment controller emits no events.
$ kubectl describe experiment random-experiment -n user
...
Status:
  Completion Time:  <nil>
  Conditions:
    Message:  Experiment is created
    Reason:   ExperimentCreated
    Status:   True
    Type:     Created
  Current Optimal Trial:
    Observation:
Events:  <none>
What did you expect to happen:
The experiment controller should emit events when it fails to reconcile all trials.
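The requested behavior can be modeled with a small sketch. It is not Katib code: the actual controller is written in Go, and the `EventRecorder` class, the `reconcile_trials` function, and the `ReconcileFailed` reason string below are all hypothetical names chosen for illustration. The sketch only shows the idea that a reconcile failure should be recorded as a Warning event on the experiment, in addition to being logged.

```python
from dataclasses import dataclass, field


@dataclass
class EventRecorder:
    """Hypothetical stand-in for an event recorder; it only collects events in memory."""
    events: list = field(default_factory=list)

    def warning(self, obj: str, reason: str, message: str) -> None:
        self.events.append(
            {"object": obj, "type": "Warning", "reason": reason, "message": message}
        )


def reconcile_trials(experiment: str, create_trials, recorder: EventRecorder) -> bool:
    """Try to create trials; on failure, record a Warning event instead of only logging."""
    try:
        create_trials()
    except Exception as err:
        # "ReconcileFailed" is a made-up reason string, not one defined by Katib.
        recorder.warning(experiment, "ReconcileFailed", f"Failed to reconcile trials: {err}")
        return False
    return True


def failing_create():
    # Simulates the failure reported in the controller log.
    raise ValueError("Unable to find parameter: num-layer in parameter assignment")


recorder = EventRecorder()
ok = reconcile_trials("user/random-experiment", failing_create, recorder)
```

With something along these lines, the failure reason would appear under Events in `kubectl describe experiment` and be visible to users who cannot read the controller log.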
Anything else you would like to add:
Relevant logs in Katib controller
Fail to get RunSpec from experiment","Experiment":"user/random-experiment","error":"Unable to find parameter: num-layer in parameter assignment map[lr:0.026271422193467404 num-layers:5 optimizer:sgd
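The error in this log line can be illustrated with a minimal sketch of the lookup that fails. This is an assumption about the general shape of the logic, not Katib's implementation: `resolve_trial_parameters` and the trial parameter names (`learningRate`, `numberLayers`, `optimizer`) are hypothetical, while the assignment values are taken from the log above.

```python
def resolve_trial_parameters(trial_parameters, assignment):
    """Map each trial parameter name to the value of the search parameter it references."""
    resolved = {}
    for param in trial_parameters:
        reference = param["reference"]
        if reference not in assignment:
            # Mirrors the wording of the error seen in the controller log.
            raise ValueError(
                f"Unable to find parameter: {reference} in parameter assignment {assignment}"
            )
        resolved[param["name"]] = assignment[reference]
    return resolved


# Parameter assignment as reported in the log.
assignment = {"lr": "0.026271422193467404", "num-layers": "5", "optimizer": "sgd"}

# Hypothetical trialParameters; the second reference contains the typo.
trial_parameters = [
    {"name": "learningRate", "reference": "lr"},
    {"name": "numberLayers", "reference": "num-layer"},
    {"name": "optimizer", "reference": "optimizer"},
]

try:
    resolve_trial_parameters(trial_parameters, assignment)
    error = None
except ValueError as err:
    error = err
```

Because the same typo applies to every suggested assignment, every trial fails in the same way, which is why no trial is ever created.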
/kind bug
Environment:
kfctl version: v1.3
kubectl version: v1.18.10
/etc/os-release: CentOS 7.9

If it's okay, I'd like to contribute to solving the issue.