UI could help in troubleshooting any pods stuck in ContainerCreating state #1711
Labels
area/frontend
area/pipelines
area/troubleshoot
help wanted
The community is welcome to contribute.
kind/feature
priority/p1
status/triaged
Whether the issue has been explicitly triaged
What happened:
If a pipeline is run where the pod for one of the steps is stuck in ContainerCreating or any other non-running state, the Pipeline UI shows what state the pod is in, but not why, or what the user running the experiment should do to resolve it. Additionally, an error message is shown saying that the logs cannot be viewed. This happens because a pod that has not started yet has no logs to display, but the phrasing is somewhat confusing.
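A UI could surface more of the pod status than the phase alone. The sketch below is a minimal, hypothetical illustration, not Kubeflow Pipelines code: it assumes the pod is available as a dict in the Kubernetes API's JSON shape (for example from `kubectl get pod <name> -o json`), and the function names are made up. It reads the `status.containerStatuses[].state.waiting` fields and turns them into a clearer "no logs yet" message. For ContainerCreating the waiting reason often carries no message, so the underlying cause still has to come from the pod's events.

```python
from typing import Any, Dict, List, Optional


def describe_waiting_containers(pod: Dict[str, Any]) -> List[str]:
    """Summarize why each container in the pod is still waiting to start."""
    status = pod.get("status", {})
    # Init containers run first, so report them before the main containers.
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    summaries = []
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting")
        if not waiting:
            continue  # running or terminated containers are not the blocker here
        line = f"{cs['name']}: {waiting.get('reason', 'Unknown')}"
        if waiting.get("message"):
            line += f" ({waiting['message']})"
        summaries.append(line)
    return summaries


def logs_unavailable_message(pod: Dict[str, Any]) -> Optional[str]:
    """Hypothetical replacement for the generic 'cannot view logs' error.

    Returns None once the pod has left the Pending phase, i.e. when logs may exist.
    """
    if pod.get("status", {}).get("phase") != "Pending":
        return None
    waiting = describe_waiting_containers(pod)
    detail = "; ".join(waiting) if waiting else "no container status reported yet"
    return f"No logs yet: this step's pod has not started ({detail})."
```

Showing the waiting reason next to the phase tells the user which container is blocked, while the friendlier message explains that missing logs are expected rather than an error in their pipeline.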
What did you expect to happen:
It would be really helpful to ML engineers running an experiment if the UI attempted to diagnose the problem, or at least displayed the events for the Kubernetes pod. In the case shown in the screenshot above, those events would show that the pod was trying to mount a Secret that does not actually exist.
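Displaying the pod's events is what would expose the missing Secret. The sketch below is again hypothetical: it assumes the events are already available as dicts in the core/v1 Event JSON shape (for example the `items` of `kubectl get events -n <namespace> -o json`). From the command line, `kubectl describe pod <name>` or `kubectl get events -n <namespace> --field-selector involvedObject.name=<name>` shows the same information. The function keeps only events about the given pod and lists warnings first, newest first. The sample events, including the FailedMount message, are illustrative and not captured from a real cluster.

```python
from typing import Any, Dict, List


def pod_event_lines(events: List[Dict[str, Any]], namespace: str, pod_name: str) -> List[str]:
    """Format the events that refer to one pod: warnings first, then newest first."""
    matching = [
        e for e in events
        if e.get("involvedObject", {}).get("kind") == "Pod"
        and e["involvedObject"].get("namespace") == namespace
        and e["involvedObject"].get("name") == pod_name
    ]
    # lastTimestamp is an RFC 3339 UTC string (e.g. "2019-07-30T10:00:05Z"), so string
    # comparison orders it chronologically; within each group, events without one come last.
    matching.sort(
        key=lambda e: (e.get("type") == "Warning", e.get("lastTimestamp") or ""),
        reverse=True,
    )
    return [f"{e.get('type', 'Normal')} {e.get('reason', '')}: {e.get('message', '')}" for e in matching]


# Illustrative events only; the names and messages are made up for this example.
SAMPLE_EVENTS = [
    {"type": "Normal", "reason": "Scheduled", "lastTimestamp": "2019-07-30T10:00:00Z",
     "message": "Successfully assigned kubeflow/step-1 to node-a",
     "involvedObject": {"kind": "Pod", "namespace": "kubeflow", "name": "step-1"}},
    {"type": "Warning", "reason": "FailedMount", "lastTimestamp": "2019-07-30T10:02:00Z",
     "message": 'MountVolume.SetUp failed for volume "creds" : secret "my-creds" not found',
     "involvedObject": {"kind": "Pod", "namespace": "kubeflow", "name": "step-1"}},
    {"type": "Normal", "reason": "Pulled", "lastTimestamp": "2019-07-30T10:01:00Z",
     "message": "Container image already present on machine",
     "involvedObject": {"kind": "Pod", "namespace": "kubeflow", "name": "step-2"}},
]

if __name__ == "__main__":
    for line in pod_event_lines(SAMPLE_EVENTS, "kubeflow", "step-1"):
        print(line)
```

Putting warnings at the top means the line that names the missing Secret is the first thing the engineer sees, even if the pod has accumulated many routine events.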
As it stands, the user running the experiment has to inspect the state of the Kubernetes cluster to troubleshoot the problem, and we have often found that the engineer running the experiment does not have the experience or background to do so effectively.
Anything else you would like to add:
For what it is worth, this is less of a bug than feedback and a feature request.