Actual
The Driver pod was removed in the Completed state.
However, the dashboard metrics still show the job using resources:
Memory Metrics (Driver) - JVM Executor Memory
Executors Metrics - Memory
In Kubernetes the Pod no longer exists, so no resources are actually in use.
Expected
After the job completes, the dashboards report 0 used resources.
Versions
Operating system: Ubuntu 22.04.3 LTS
Juju CLI: 3.3.1
Juju agent: 3.3.1
Charm revision:
$ juju status --relations
Model  Controller     Cloud/Region          Version  SLA          Timestamp
spark  aws-eu-west-1  demo-spark/localhost  3.3.1    unsupported  08:45:20Z

SAAS         Status  Store          URL
cos-traefik  active  aws-eu-west-1  admin/cos.traefik

App                       Version  Status   Scale  Charm                     Channel     Rev  Address         Exposed  Message
s3-integrator                      active   1      s3-integrator             edge        14   10.152.183.19   no
spark-history-server-k8s           waiting  1      spark-history-server-k8s  3.4/stable  15   10.152.183.249  no       waiting for units to settle down

Unit                         Workload  Agent  Address      Ports  Message
s3-integrator/0*             active    idle   10.1.45.200
spark-history-server-k8s/0*  blocked   idle   10.1.45.255         Missing S3 relation

Integration provider               Requirer                                 Interface            Type     Message
cos-traefik:ingress                spark-history-server-k8s:ingress         ingress              regular
s3-integrator:s3-credentials       spark-history-server-k8s:s3-credentials  s3                   regular
s3-integrator:s3-integrator-peers  s3-integrator:s3-integrator-peers        s3-integrator-peers  peer
microk8s:
$ juju status
Model     Controller     Cloud/Region   Version  SLA          Timestamp
microk8s  aws-eu-west-1  aws/eu-west-1  3.3.1    unsupported  09:18:19Z

SAAS              Status  Store          URL
cos-alertmanager  active  aws-eu-west-1  admin/cos.alertmanager-karma-dashboard
cos-grafana       active  aws-eu-west-1  admin/cos.grafana-dashboards
cos-loki          active  aws-eu-west-1  admin/cos.loki-logging
cos-prometheus    active  aws-eu-west-1  admin/cos.prometheus-receive-remote-write

App                Version  Status  Scale  Charm          Channel      Rev  Exposed  Message
grafana-agent-cos           active  1      grafana-agent  latest/edge  28   no
microk8s           1.29.1   active  1      microk8s       latest/edge  232  yes      node is ready

Unit                  Workload  Agent  Machine  Public address  Ports      Message
microk8s/0*           active    idle   0        3.252.197.189   16443/tcp  node is ready
grafana-agent-cos/0*  active    idle            3.252.197.189

Machine  State    Address        Inst id              Base          AZ          Message
0        started  3.252.197.189  i-014cd20da6c22599a  ubuntu@22.04  eu-west-1b  running
COS:
$ juju status --relations
Model  Controller     Cloud/Region          Version  SLA          Timestamp
cos    aws-eu-west-1  demo-spark/localhost  3.3.1    unsupported  09:43:46Z

App                         Version  Status  Scale  Charm                       Channel  Rev  Address         Exposed  Message
alertmanager                0.25.0   active  1      alertmanager-k8s            stable   96   10.152.183.187  no
catalogue                            active  1      catalogue-k8s               stable   33   10.152.183.133  no
cos-configuration-k8s       3.5.0    active  1      cos-configuration-k8s       stable   42   10.152.183.234  no
grafana                     9.2.1    active  1      grafana-k8s                 stable   93   10.152.183.51   no
loki                        2.7.4    active  1      loki-k8s                    stable   105  10.152.183.236  no
prometheus                  2.47.2   active  1      prometheus-k8s              stable   159  10.152.183.199  no
prometheus-pushgateway-k8s  1.6.2    active  1      prometheus-pushgateway-k8s  edge     7    10.152.183.241  no
traefik                     2.10.4   active  1      traefik-k8s                 stable   166  150.0.0.1       no

Unit                           Workload  Agent  Address      Ports  Message
alertmanager/0*                active    idle   10.1.45.248
catalogue/0*                   active    idle   10.1.45.221
cos-configuration-k8s/0*       active    idle   10.1.45.218
grafana/0*                     active    idle   10.1.45.204
loki/0*                        active    idle   10.1.45.208
prometheus-pushgateway-k8s/0*  active    idle   10.1.45.195
prometheus/0*                  active    idle   10.1.45.222
traefik/0*                     active    idle   10.1.45.215

Offer                            Application   Charm             Rev  Connected  Endpoint              Interface                Role
alertmanager-karma-dashboard     alertmanager  alertmanager-k8s  96   0/0        karma-dashboard       karma_dashboard          provider
grafana-dashboards               grafana       grafana-k8s       93   1/1        grafana-dashboard     grafana_dashboard        requirer
loki-logging                     loki          loki-k8s          105  1/1        logging               loki_push_api            provider
prometheus-receive-remote-write  prometheus    prometheus-k8s    159  1/1        receive-remote-write  prometheus_remote_write  provider
prometheus-scrape                prometheus    prometheus-k8s    159  0/0        metrics-endpoint      prometheus_scrape        requirer
traefik                          traefik       traefik-k8s       166  1/1        ingress               ingress                  provider

Integration provider                          Requirer                                      Interface                  Type     Message
alertmanager:alerting                         loki:alertmanager                             alertmanager_dispatch      regular
alertmanager:alerting                         prometheus:alertmanager                       alertmanager_dispatch      regular
alertmanager:grafana-dashboard                grafana:grafana-dashboard                     grafana_dashboard          regular
alertmanager:grafana-source                   grafana:grafana-source                        grafana_datasource         regular
alertmanager:replicas                         alertmanager:replicas                         alertmanager_replica       peer
alertmanager:self-metrics-endpoint            prometheus:metrics-endpoint                   prometheus_scrape          regular
catalogue:catalogue                           alertmanager:catalogue                        catalogue                  regular
catalogue:catalogue                           grafana:catalogue                             catalogue                  regular
catalogue:catalogue                           prometheus:catalogue                          catalogue                  regular
catalogue:replicas                            catalogue:replicas                            catalogue_replica          peer
cos-configuration-k8s:grafana-dashboards      grafana:grafana-dashboard                     grafana_dashboard          regular
cos-configuration-k8s:replicas                cos-configuration-k8s:replicas                cos_configuration_replica  peer
grafana:grafana                               grafana:grafana                               grafana_peers              peer
grafana:metrics-endpoint                      prometheus:metrics-endpoint                   prometheus_scrape          regular
grafana:replicas                              grafana:replicas                              grafana_replicas           peer
loki:grafana-dashboard                        grafana:grafana-dashboard                     grafana_dashboard          regular
loki:grafana-source                           grafana:grafana-source                        grafana_datasource         regular
loki:metrics-endpoint                         prometheus:metrics-endpoint                   prometheus_scrape          regular
loki:replicas                                 loki:replicas                                 loki_replica               peer
prometheus-pushgateway-k8s:metrics-endpoint   prometheus:metrics-endpoint                   prometheus_scrape          regular
prometheus-pushgateway-k8s:pushgateway-peers  prometheus-pushgateway-k8s:pushgateway-peers  pushgateway_peers          peer
prometheus:grafana-dashboard                  grafana:grafana-dashboard                     grafana_dashboard          regular
prometheus:grafana-source                     grafana:grafana-source                        grafana_datasource         regular
prometheus:prometheus-peers                   prometheus:prometheus-peers                   prometheus_peers           peer
traefik:ingress                               alertmanager:ingress                          ingress                    regular
traefik:ingress                               catalogue:ingress                             ingress                    regular
traefik:ingress-per-unit                      loki:ingress                                  ingress_per_unit           regular
traefik:ingress-per-unit                      prometheus:ingress                            ingress_per_unit           regular
traefik:metrics-endpoint                      prometheus:metrics-endpoint                   prometheus_scrape          regular
traefik:peers                                 traefik:peers                                 traefik_peers              peer
traefik:traefik-route                         grafana:ingress                               traefik_route              regular
cos-configuration-k8s config:
$ juju config cos-configuration-k8s
application: cos-configuration-k8s
application-config:
  juju-application-path:
    default: /
    description: the relative http path used to access an application
    source: default
    type: string
    value: /
  juju-external-hostname:
    description: the external hostname of an exposed application
    source: unset
    type: string
  kubernetes-ingress-allow-http:
    default: false
    description: whether to allow HTTP traffic to the ingress controller
    source: default
    type: bool
    value: false
  kubernetes-ingress-class:
    default: nginx
    description: the class of the ingress controller to be used by the ingress resource
    source: default
    type: string
    value: nginx
  kubernetes-ingress-ssl-passthrough:
    default: false
    description: whether to passthrough SSL traffic to the ingress controller
    source: default
    type: bool
    value: false
  kubernetes-ingress-ssl-redirect:
    default: false
    description: whether to redirect SSL traffic to the ingress controller
    source: default
    type: bool
    value: false
  kubernetes-service-annotations:
    description: a space separated set of annotations to add to the service
    source: unset
    type: attrs
  kubernetes-service-external-ips:
    description: list of IP addresses for which nodes in the cluster will also accept
      traffic
    source: unset
    type: string
  kubernetes-service-externalname:
    description: external reference that kubedns or equivalent will return as a CNAME
      record
    source: unset
    type: string
  kubernetes-service-loadbalancer-ip:
    description: LoadBalancer will get created with the IP specified in this field
    source: unset
    type: string
  kubernetes-service-loadbalancer-sourceranges:
    description: traffic through the load-balancer will be restricted to the specified
      client IPs
    source: unset
    type: string
  kubernetes-service-target-port:
    description: name or number of the port to access on the pods targeted by the
      service
    source: unset
    type: string
  kubernetes-service-type:
    description: determines how the Service is exposed
    source: unset
    type: string
  trust:
    default: false
    description: Does this application have access to trusted credentials
    source: default
    type: bool
    value: false
charm: cos-configuration-k8s
settings:
  git_branch:
    default: master
    description: The git branch to check out.
    source: user
    type: string
    value: dashboard
  git_depth:
    default: 1
    description: |
      Cloning depth, to truncate commit history to the specified number of commits. Zero means no truncating.
    source: default
    type: int
    value: 1
  git_repo:
    description: URL to repo to clone and sync against.
    source: user
    type: string
    value: https://github.com/canonical/charmed-spark-rock
  git_rev:
    default: HEAD
    description: The git revision (tag or hash) to check out
    source: default
    type: string
    value: HEAD
  git_ssh_key:
    description: |
      An optional SSH private key to use when cloning the repository.
    source: unset
    type: string
  grafana_dashboards_path:
    default: grafana_dashboards
    description: Relative path in repo to grafana dashboards.
    source: user
    type: string
    value: dashboards/prod/grafana/
  loki_alert_rules_path:
    default: loki_alert_rules
    description: Relative path in repo to loki rules.
    source: default
    type: string
    value: loki_alert_rules
  prometheus_alert_rules_path:
    default: prometheus_alert_rules
    description: Relative path in repo to prometheus rules.
    source: default
    type: string
    value: prometheus_alert_rules
Yes, I noticed this as well, and we have been discussing it with the Observability team in Vancouver. The issue is that the metrics in prometheus-pushgateway are not removed at the end of the Spark job, so Prometheus keeps scraping them.
We are discussing the best way to remove them from the pushgateway: either have the Spark job process do it when it finishes (at the risk of removing them before Prometheus has scraped them), or have a separate process do it.
I'll update this thread as soon as we decide on the way forward.
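For illustration only, the "separate process" option could look roughly like the sketch below. It relies on the Pushgateway HTTP API, which removes a metric group on a `DELETE` to `/metrics/job/<job>/<label>/<value>`. The function names, the `spark-job` job name, the `instance` label, the base URL and the 60-second grace period are all hypothetical; the grouping key that Spark jobs actually push under in this setup is not stated in this thread and would need to be checked first.

```python
import time
import urllib.request
from typing import Mapping, Optional
from urllib.parse import quote


def pushgateway_group_url(
    base_url: str, job: str, grouping: Optional[Mapping[str, str]] = None
) -> str:
    """Build the URL of one Pushgateway metric group.

    Pushgateway addresses a group as /metrics/job/<job>/<label>/<value>/...
    Assumption: label values contain no "/" (those need Pushgateway's
    base64 path form, which this sketch does not handle).
    """
    parts = [base_url.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for label, value in (grouping or {}).items():
        parts.extend([quote(label, safe=""), quote(value, safe="")])
    return "/".join(parts)


def delete_group_after_grace(
    base_url: str,
    job: str,
    grouping: Optional[Mapping[str, str]] = None,
    grace_seconds: float = 60.0,
) -> int:
    """Wait so Prometheus can scrape the final values, then delete the group.

    grace_seconds is a hypothetical knob; it should exceed the Prometheus
    scrape interval configured for the pushgateway target.
    Returns the HTTP status code of the DELETE request.
    """
    time.sleep(grace_seconds)
    request = urllib.request.Request(
        pushgateway_group_url(base_url, job, grouping), method="DELETE"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A cleanup process could call, for example, `delete_group_after_grace("http://pushgateway:9091", "spark-job", {"instance": "driver-1"})` once the driver pod reaches Completed. Waiting before the delete is one way to reduce the risk of removing metrics before Prometheus has scraped them; it does not eliminate that risk if a scrape is delayed. The delete only affects the group whose grouping key matches exactly.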