Better Error Messages - When Pod Creating #1474

Closed
aronchick opened this issue Jun 8, 2019 · 5 comments
Labels: area/troubleshoot, lifecycle/stale

Comments

@aronchick

I ran several pipelines serially, and the problem was that the containers in each one needed to bind a PVC. The PVCs were read only one, so subsequent containers sat around pending, waiting for the PVC to be released. It would be great if there were a better error message here than.

@elikatsis
Member

Hello David.

That's weird. First of all, a PVC's access mode can be one of the following (per the official documentation):

  • ReadWriteOnce (the volume can be mounted as read-write by a single node)
  • ReadOnlyMany (the volume can be mounted read-only by many nodes)
  • ReadWriteMany (the volume can be mounted as read-write by many nodes)

You probably mean that your PVCs were RWO (ReadWriteOnce), right?
If so, there shouldn't be any pods pending because of that. Instead, they should just be scheduled on the same node (adding load to that node).

On the other hand, if you mean that the access mode was ROX (ReadOnlyMany), that would allow pods to be scheduled on any node.
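
To make the distinction above concrete, here is a minimal sketch that models the three access modes as plain data and checks whether two pods could mount the same volume. It is an illustration only: `ACCESS_MODES` and `pods_can_share_volume` are hypothetical names, not part of Kubernetes or KFP, and the model ignores everything except the access mode and node placement.

```python
# Hypothetical model of the three access modes listed above.
# "multi_node" records whether the volume may be mounted by more than one node.
ACCESS_MODES = {
    "ReadWriteOnce": {"short": "RWO", "writable": True, "multi_node": False},
    "ReadOnlyMany": {"short": "ROX", "writable": False, "multi_node": True},
    "ReadWriteMany": {"short": "RWX", "writable": True, "multi_node": True},
}


def pods_can_share_volume(access_mode: str, node_a: str, node_b: str) -> bool:
    """Return True if pods on node_a and node_b could mount the volume together.

    Simplifying assumption: the access mode restricts mounting per node,
    so two pods on the same node can always share the volume.
    """
    if node_a == node_b:
        return True
    return ACCESS_MODES[access_mode]["multi_node"]
```

Under this model, two pods using an RWO volume only conflict when they land on different nodes, which matches the expectation that they would be scheduled on the same node rather than stay pending.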

@aronchick
Author

The PVC is RWO, and the second pod that tried to start (both were TFJobs) would fail and crash because the PVC was not mountable. It kept doing so for many minutes, going into CrashLoopBackOff every time, until the first job finished and released the PVC, and the second pod (eventually) restarted and picked it up.
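
As a rough idea of what a better message could look like, here is a small sketch that scans a pod's events for volume-related reasons and turns the first match into a readable explanation. The helper name `explain_volume_problem`, the simplified event dictionaries, and the list of reasons are assumptions made for illustration; this is not how KFP or Kubernetes reports errors today, and the sample event message is only an example of the kind of text such an event might carry.

```python
from typing import Iterable, Mapping, Optional

# Event reasons that usually point at a volume problem.
# Assumed list for this sketch, not exhaustive.
VOLUME_REASONS = {"FailedAttachVolume", "FailedMount"}


def explain_volume_problem(
    pod_name: str, events: Iterable[Mapping[str, str]]
) -> Optional[str]:
    """Return a readable explanation for the first volume-related event, if any.

    Each event is a simplified mapping with "reason" and "message" keys
    (a hypothetical shape, not the full Kubernetes Event object).
    """
    for event in events:
        reason = event.get("reason")
        if reason in VOLUME_REASONS:
            details = event.get("message", "no details")
            return (
                f"Pod {pod_name} cannot start because its volume is unavailable "
                f"({reason}): {details}. If the PVC is ReadWriteOnce, another pod "
                "on a different node may still be holding it."
            )
    # No volume-related event found; the caller can fall back to the raw status.
    return None


# Usage with made-up event data:
sample_events = [
    {"reason": "Scheduled", "message": "Successfully assigned pod to node-2"},
    {"reason": "FailedAttachVolume", "message": "Multi-Attach error for volume"},
]
explanation = explain_volume_problem("tfjob-worker-0", sample_events)
```

The function returns `None` when no matching event exists, so a caller can fall back to whatever status it would have shown otherwise.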

@jessiezcc
Contributor

@IronPan, should this error handling happen at component level or pipeline level?

@stale

stale bot commented Jun 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Jun 25, 2020
@Bobgy
Contributor

Bobgy commented Jun 30, 2020

It's now possible to check the pod YAML and events directly from the KFP UI: #3304

@Bobgy Bobgy closed this as completed Jun 30, 2020