
grafana dashboards lost after a K8S upgrade #207

Closed
tkalanick opened this issue Nov 30, 2018 · 7 comments
Labels
type/question Further information is requested

Comments

tkalanick commented Nov 30, 2018

My dashboard was lost after an upgrade of GKE to 1.11.3-gke.18. I am still able to issue DB queries, and the database data is still there.

I don't see errors in either the prometheus or grafana containers. There was apparently a rebuild of the services, but where did the dashboard go?

Can you keep the grafana dashboard template JSON file in the git repo so that I can re-import it when this happens again?

t=2018-11-30T23:42:10+0000 lvl=info msg="Request Completed" logger=context userId=1 orgId=1 uname=admin method=GET path=/api/dashboards/db/tidb-cluster-overview status=404 remote_addr=127.0.0.1 time_ms=1 size=33 referer="http://localhost:3000/dashboard/db/tidb-cluster-overview?refresh=1m&orgId=1"

The grafana container log is attached:
grafana.log

gregwebs (Contributor) commented Dec 1, 2018

So the pod got rescheduled during the upgrade? I think the grafana settings are not being stored on a persistent disk. The grafana setup is a bit of a hack; I think addressing it as per #133 would prevent this issue from happening.

The dashboard is set up by this job: https://github.com/pingcap/tidb-operator/blob/master/charts/tidb-cluster/templates/monitor-job.yaml
Perhaps re-running it will do the trick?
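
A Job runs to completion and won't re-run on its own, so it's worth checking first whether the configurator job actually finished; a minimal sketch (the job name and namespace are placeholders, check kubectl get jobs for the actual name):

# List jobs and check the COMPLETIONS column
kubectl get jobs -n <namespace>

# Inspect the configurator pod's logs for dashboard-import errors
kubectl logs job/<release-name>-monitor-configurator -n <namespace>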

@tkalanick (Author)

Do I re-run this from helm? Can you give me the command? Thanks.

tennix (Member) commented Dec 1, 2018

@tkalanick The monitor data uses an emptyDir volume by default, so a pod rebuild loses the monitor data, including the dashboard templates. It can be persisted in a PV by configuring these three variables in the chart values (set persistent to true):

persistent: false
storageClassName: local-storage
storage: 10Gi

If you lost the dashboard templates, you can just delete the monitor-configurator job and then run helm upgrade <release-name> charts/tidb-cluster.
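
Concretely, a sketch of that sequence (the job name is an assumption; confirm it with kubectl get jobs):

# Delete the completed configurator job so Helm can recreate it
kubectl delete job <release-name>-monitor-configurator -n <namespace>

# Re-rendering the chart recreates the job from monitor-job.yaml,
# which re-imports the dashboards into grafana
helm upgrade <release-name> charts/tidb-cluster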

We plan to use Grafana 5.x static provisioning so that no configurator job is needed, and the dashboard JSON files will be stored in a ConfigMap.
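
For reference, Grafana 5.x static provisioning works by mounting a provider config under /etc/grafana/provisioning/dashboards that points at a directory of dashboard JSON files; a minimal sketch (names and paths here are illustrative, not necessarily what the chart will use):

# dashboards.yaml — mounted from a ConfigMap into
# /etc/grafana/provisioning/dashboards/ in the grafana container
apiVersion: 1
providers:
  - name: tidb-dashboards        # illustrative provider name
    orgId: 1
    folder: ''
    type: file
    options:
      # directory where the dashboard JSON ConfigMap is mounted
      path: /grafana-dashboard-definitions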

@tkalanick (Author)

Thanks. I was able to do a helm upgrade and I can now see the dashboard.

However, merely changing persistent to true on line 159 didn't work for me; the pod became unschedulable after I ran helm upgrade. Do I need to prepare a persistent volume?

tennix (Member) commented Dec 1, 2018

The tidb-cluster chart only creates a PVC for the monitor if you enable monitor data persistence, so you need either a storage class that supports dynamic volume provisioning or a pre-created PV that the PVC can bind to.
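
Since the default storageClassName in the values is local-storage, which has no dynamic provisioner, a pod stuck unschedulable is expected unless a matching PV already exists. A minimal sketch of a manually created local PV (name, path, and node are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: monitor-pv              # placeholder name
spec:
  capacity:
    storage: 10Gi               # must cover the chart's storage request
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/monitor    # directory pre-created on the node
  nodeAffinity:                 # local volumes must be pinned to a node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1        # placeholder node name

On GKE it is probably simpler to point storageClassName at the built-in standard class, which provisions GCE persistent disks dynamically, instead of creating PVs by hand.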

tennix (Member) commented Dec 4, 2018

Hi @tkalanick, does the persistent volume for the monitor data work for you?

tennix added the type/question label on Jan 24, 2019
tennix (Member) commented May 5, 2019

This should already be fixed with static provisioning.

tennix closed this as completed on May 5, 2019