Prometheus http and kernel startup/shutdown metrics #1377

Status: Open (1 commit, base: main)

MaicoTimmerman

This PR addresses #731.

jupyter_server already ships with a PrometheusMetricsHandler; however, that handler inherits from JupyterHandler, which requires authentication. The Enterprise Gateway project doesn't integrate with that authentication method, so I added a separate handler to serve the metrics.

I've included three metrics in the initial implementation:

  • All HTTP response timings per handler, status_code and method
  • Kernel startup duration in seconds
  • Kernel shutdown duration in seconds
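
To make the shape of these metrics concrete, here is a minimal standard-library sketch of a labelled duration histogram with cumulative buckets. It is not the code from this PR, which presumably relies on a Prometheus client library; the class name `DurationHistogram` and its methods are hypothetical, and the bucket bounds are simply copied from the `le` values in the sample output.

```python
import time
from bisect import bisect_left
from collections import defaultdict
from contextlib import contextmanager

# Upper bounds in seconds, copied from the `le` labels in the sample output.
BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75,
           1.0, 2.5, 5.0, 7.5, 10.0, float("inf"))


class DurationHistogram:
    """Hypothetical stand-in for a labelled Prometheus-style histogram."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = tuple(buckets)
        # One list of cumulative bucket counts per label tuple.
        self.counts = defaultdict(lambda: [0] * len(self.buckets))
        self.sums = defaultdict(float)

    def observe(self, labels, seconds):
        # Buckets are cumulative: an observation counts toward every
        # bucket whose upper bound is >= the observed value.
        first = bisect_left(self.buckets, seconds)
        for i in range(first, len(self.buckets)):
            self.counts[labels][i] += 1
        self.sums[labels] += seconds

    @contextmanager
    def time(self, labels):
        # Time the wrapped block and record it even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.observe(labels, time.perf_counter() - start)


# Labels mirror the (handler, method, status_code) triple from the list above.
http_durations = DurationHistogram()
labels = ("KernelActionHandler", "POST", "200")
http_durations.observe(labels, 19.83746862411499)

# A kernel startup or shutdown could be timed the same way.
kernel_startup = DurationHistogram()
with kernel_startup.time(("startup",)):
    pass  # placeholder for the actual kernel launch
```

With a single observation of about 19.8 seconds, only the last (`+Inf`) bucket is incremented, which is the pattern visible in the sample response.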

In terms of configuration, I've included the EG_METRICS_PREFIX environment variable for now, similar to how other configurations for the process proxies are set.
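
As a rough illustration of how such a prefix could be applied, the sketch below reads EG_METRICS_PREFIX and prepends it to a base metric name. The helper `metric_name` is hypothetical, and the default value `enterprise_gateway` is an assumption inferred from the metric names in the sample response, not something confirmed by the PR description.

```python
import os

# Assumption: inferred from the metric names in the sample output.
DEFAULT_PREFIX = "enterprise_gateway"


def metric_name(base, environ=None):
    """Join the configured prefix and a base metric name with an underscore."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("EG_METRICS_PREFIX", DEFAULT_PREFIX).strip("_")
    return f"{prefix}_{base}" if prefix else base


# With the variable unset, the assumed default prefix is used.
default_name = metric_name("http_request_duration_seconds", environ={})

# With the variable set, the custom prefix replaces it.
custom_name = metric_name("http_request_duration_seconds",
                          environ={"EG_METRICS_PREFIX": "eg"})
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.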

Sample response from the /metrics endpoint:
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.005",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.01",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.025",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.05",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.075",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.1",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.25",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.5",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="0.75",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="1.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="2.5",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="5.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="7.5",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="10.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="+Inf",method="POST",status_code="200"} 1.0
enterprise_gateway_http_request_duration_seconds_count{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",method="POST",status_code="200"} 1.0
enterprise_gateway_http_request_duration_seconds_sum{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",method="POST",status_code="200"} 19.83746862411499
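
One way to read the sample above: it records a single POST that took about 19.8 seconds, which is longer than the largest finite bucket bound (10.0), so only the `+Inf` bucket is non-zero. The following standard-library sketch parses a few of the lines verbatim to check this; the regular expressions are a simplified assumption that covers only lines of this shape, not the full Prometheus text format.

```python
import re

# Four lines copied verbatim from the sample response above.
SAMPLE = """\
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="10.0",method="POST",status_code="200"} 0.0
enterprise_gateway_http_request_duration_seconds_bucket{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",le="+Inf",method="POST",status_code="200"} 1.0
enterprise_gateway_http_request_duration_seconds_count{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",method="POST",status_code="200"} 1.0
enterprise_gateway_http_request_duration_seconds_sum{handler="enterprise_gateway.services.kernels.handlers.KernelActionHandler",method="POST",status_code="200"} 19.83746862411499
"""

# Simplified patterns: metric name, {labels}, numeric value.
LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Return a list of (metric name, labels dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, raw_labels, value = match.groups()
            samples.append((name, dict(LABEL.findall(raw_labels)), float(value)))
    return samples


samples = parse(SAMPLE)
total = next(v for n, _, v in samples if n.endswith("_sum"))
count = next(v for n, _, v in samples if n.endswith("_count"))
mean_seconds = total / count

# Requests that completed within the largest finite bound (10 seconds).
within_10s = next(v for n, labels, v in samples
                  if n.endswith("_bucket") and labels["le"] == "10.0")
```

If slow requests such as kernel starts are common, wider bucket bounds may give more useful percentiles than a histogram in which most observations land in `+Inf`.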

welcome bot commented Mar 27, 2024

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@MaicoTimmerman (Author)

The Read the Docs build failed due to a timeout:

    RuntimeError: Download error (28) Timeout was reached [https://conda.anaconda.org/free/noarch/repodata.json]
    Operation too slow. Less than 30 bytes/sec transferred the last 60 seconds
