"Metrics not available" problem with basic v1alpha3 deployment #1082
Comments
/help
@kunalyogenshah: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@kunalyogenshah Can you describe one of your training pods, please?
Hey @andreyvelich. Sorry for the confusion, this one was resolved on our other thread #981. This one can be closed. Thank you for all the help!
@kunalyogenshah Sure, I will close this. /close
@andreyvelich: Closing this issue. In response to this:
/kind bug
What steps did you take and what happened:
I have followed the steps in the README to create a Katib deployment using the kustomization in the manifests repo. (I do not have Kubeflow and/or PyTorch and TFJob manifests applied, only the Katib ones.) The deploy went through successfully. But once I create the random-experiment example, it fails to finish the Trials with the following error:
The logs from the job pod for this trial show a complete run, with the logs containing the required information.
How do I debug why this happened?
What did you expect to happen:
The metrics collector to run successfully and end the experiment.
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
Kubernetes version (use kubectl version): v1.17
OS (e.g. from /etc/os-release):