Improve observability of universal operator #1340

Closed
Jeffwan opened this issue Aug 6, 2021 · 7 comments

Comments

@Jeffwan
Member

Jeffwan commented Aug 6, 2021

Part of #1318

  1. We currently use the default liveness/readiness probes from kubebuilder. It would be good to double-check whether they are sufficient for our case.

  2. Job metrics are not unified. TensorFlow has the best support; the same Prometheus metrics should be exposed for the rest of the frameworks.

/help

@google-oss-robot

@Jeffwan:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.


Jeffwan changed the title from "Improve observability for universal operator" to "Improve observability of universal operator" on Aug 6, 2021
@Jeffwan
Member Author

Jeffwan commented Aug 6, 2021

/good-first-issue

@google-oss-robot

@Jeffwan:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.


@deepak-muley
Contributor

I am working on this issue and will have a PR early next week.

@Jeffwan
Member Author

Jeffwan commented Aug 13, 2021

@deepak-muley Thanks! I've assigned it to you.

@Jeffwan
Member Author

Jeffwan commented Aug 13, 2021

/priority p1

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Mar 2, 2022
@stale stale bot closed this as completed Apr 16, 2022
Development

No branches or pull requests

3 participants