This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[grafana] Update (seems to) delete all data #4665

Closed
nrmitchi opened this issue Apr 3, 2018 · 17 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@nrmitchi
Contributor

nrmitchi commented Apr 3, 2018

Which chart:
stable/grafana

What happened:
helm upgrade -f chart-values/grafana.yaml grafana stable/grafana

The upgrade was simply bumping the version of the image to v5.0.4

All grafana data (data sources, dashboards, etc) are gone.

What you expected to happen:
It would update the chart and not delete all of the data.

Any default path that deletes existing data without warning is absolutely insane.

@nrmitchi nrmitchi changed the title [grafana] Update deletes all data [grafana] Update (seems to) delete all data Apr 3, 2018
@amnk

amnk commented Apr 5, 2018

Grafana 5 has deprecated the dashboards/json importer. Since the [stable/grafana] chart recently migrated to 5.0.4, we no longer have those dashboards.

@alexmbird

Just happened to me. Are you saying the default behaviour on helm upgrade ... is to delete all user-created dashboards?

@nrmitchi
Contributor Author

@alexmbird It appears to be, though I don't believe it's actually intentional. IIRC, from the digging I did at the time (I was more focused on recreating the dashboards that were deleted), the problem was that the helm chart generated a new PVC rather than reusing the existing one. That caused the old volume to be recycled, and the default reclaim policy in many clusters is 'Delete'.

If that's not the case in your cluster, the volume may still exist, and you can likely recover it.

If not, and it's actually been deleted, I think you're out of luck.

After this happened to me I did the following:

  • Manually modified the Reclaim Policy on the new grafana volume to 'Retain' (just in case)
  • Made sure that all grafana configuration was backed up outside of kubernetes

Hoping this works out for you....
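The reclaim-policy change in the first bullet can be sketched like this (the PV name is a placeholder; find the real volume bound to the grafana PVC with `kubectl get pv` first):

```shell
# Sketch of the reclaim-policy fix above. "pvc-1234" is a placeholder;
# locate the actual PV name with `kubectl get pv`.
PV_NAME="pvc-1234"
PATCH='{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Against a live cluster you would then run:
#   kubectl patch pv "$PV_NAME" -p "$PATCH"
echo "$PATCH"
```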

@alexmbird

alexmbird commented May 10, 2018

@nrmitchi thanks for replying so fast. It appears that's indeed what happened here. I look forward to a day rebuilding my dashboards :( At least the dataset is safe elsewhere.

Advice to anyone else before doing a helm upgrade ... grafana - back up your dashboards first.

@ptrus

ptrus commented Jul 24, 2018

Hey @alexmbird @nrmitchi. I think the problem is that grafana uses a Deployment with a PVC, which means volumes don't get reused when upgrading. There is an open issue mentioning this: #1863. (If a StatefulSet were used instead of a Deployment, the volume would be re-used even when deleting & recreating grafana.)
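For illustration, a StatefulSet-based deployment could look roughly like this (a sketch only, with illustrative names, not the chart's actual template). The point is `volumeClaimTemplates`: the controller creates the PVC once and re-attaches the same volume across upgrades and pod recreations.

```yaml
# Sketch -- names and sizes are illustrative, not taken from the chart.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: grafana
spec:
  serviceName: grafana
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:5.0.4
          volumeMounts:
            - name: grafana-data
              mountPath: /var/lib/grafana
  volumeClaimTemplates:      # PVC is created once and re-attached on upgrade
    - metadata:
        name: grafana-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```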

@stale

stale bot commented Aug 23, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2018
@nrmitchi
Contributor Author

I am personally against this issue closing, even though it is a bit stale. It's a pretty serious problem for anyone making use of this chart.

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2018
@deltasquare4

So, I have kinda circumvented this issue by modifying the chart in my deployment, and that made the problem go away. The changes are not that hard to implement in this chart, but what stopped me is that they are not backward compatible. Here's what I did:

  • I was using grafana as part of prometheus-operator, which seeded the default dashboards through a template that stored them in a ConfigMap. It relied on a "reloader" sidecar that watched that ConfigMap for changes and (forcefully) re-seeded the dashboards, removing all subsequent changes made from the GUI as well as any new dashboards. I removed the reloader sidecar and replaced it with built-in provisioning by overriding GF_PATHS_PROVISIONING, restricting dashboard provisioning to a directory (instead of the default General folder). This limits the seeding to a single grafana "folder", which I don't need to override through the GUI.
    apiVersion: 1
    providers:
    - name: 'kube-prometheus'
      orgId: 1
      folder: 'kube-prometheus-readonly'
      type: file
      disableDeletion: false
      updateIntervalSeconds: 3
      options:
        path: /tmp/grafana/dashboards
  • I also switched to an external mysql database instead of the default sqlite one. I think this also played a part in why my dashboards are no longer deleted.
  • Switched to a StatefulSet from a Deployment, as suggested above. This doesn't directly solve the problem, but I do think a StatefulSet is a better structure for deploying grafana.

I'll be happy to send a pull request. Suggestions on the best way to do that without breaking other people's existing deployments are welcome.

@stale

stale bot commented Sep 23, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 23, 2018
@eskp

eskp commented Sep 24, 2018

@deltasquare4 I'd be keen to see a pull request for StatefulSet and also an example of an external database configuration.

@stale stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 24, 2018
@deltasquare4

I'll see what I can do about the PR. External database config relies on overriding config through environment variables (http://docs.grafana.org/installation/configuration/#using-environment-variables), which is also something I added in my StatefulSet implementation.
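A hedged sketch of what those environment-variable overrides might look like in the chart's values file, for an external MySQL database (host, database name, and secret name are placeholders; the chart's env and envFromSecret settings are mentioned later in this thread):

```yaml
# Sketch of chart values wiring Grafana to an external MySQL instance
# via Grafana's GF_DATABASE_* environment variables.
env:
  GF_DATABASE_TYPE: mysql
  GF_DATABASE_HOST: mysql.example.com:3306   # placeholder host
  GF_DATABASE_NAME: grafana
  GF_DATABASE_USER: grafana
# Keep the password out of plain values; envFromSecret injects every
# key of the named Secret as an environment variable.
envFromSecret: grafana-db-credentials        # should contain GF_DATABASE_PASSWORD
```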

@deltasquare4

Looks like datasource imports have already been made available through #5369. Restricting automatically imported dashboards to a directory means only the dashboards there get "refreshed" on a chart upgrade. You have to override dashboardProviders from the chart to look something like this:

dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
    - name: 'default'
      orgId: 1
      folder: 'default-dashboards'  # <- Provider will be restricted to this folder
      type: file
      disableDeletion: false
      options:
        path: /var/lib/grafana/dashboards/default

Combined with dashboards or dashboardsConfigMaps entries with a matching name, this should automatically seed and refresh the dashboards.

Keep the dashboards you'd like to change from the UI out of these "seeded" folders so they don't get deleted. I haven't verified this with the default sqlite database, but it should work there as well. If you'd like to use MySQL or PostgreSQL instead, override the database settings with custom environment variables through the env or envFromSecret config.

Needless to say, please do back up your existing dashboards before upgrading. I use grafana's REST API to back up and restore the dashboards; it's predictable and easier to script.
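A minimal sketch of that API-based backup, using Grafana's documented /api/search and /api/dashboards/uid/:uid endpoints (the URL and API key are placeholders for your own install):

```shell
#!/bin/sh
# Sketch: back up every dashboard through Grafana's HTTP API.
# GRAFANA_URL and GRAFANA_API_KEY are placeholders for your install.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

dashboard_url() {
  # Export URL for a single dashboard UID.
  echo "$GRAFANA_URL/api/dashboards/uid/$1"
}

backup_all() {
  # List dashboard UIDs via /api/search, then save each one as JSON.
  curl -s -H "Authorization: Bearer $GRAFANA_API_KEY" \
      "$GRAFANA_URL/api/search?type=dash-db" \
    | jq -r '.[].uid' \
    | while read -r uid; do
        curl -s -H "Authorization: Bearer $GRAFANA_API_KEY" \
          "$(dashboard_url "$uid")" > "dashboard-$uid.json"
      done
}
```

Restoring is roughly a POST of each saved dashboard object back to /api/dashboards/db (with the id nulled out to create it fresh); check the API docs for the exact payload shape.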

I don't think having grafana running in a StatefulSet adds any more value.

@nodox

nodox commented Oct 9, 2018

I'm using Terraform to configure my dashboards and datasources and have run into the same issue when using helm upgrade. What is the status of this issue? The only solution I see is to automate dashboard/datasource creation every time I do an upgrade.

@stale

stale bot commented Nov 8, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2018
@stale

stale bot commented Nov 22, 2018

This issue is being automatically closed due to inactivity.

@stale stale bot closed this as completed Nov 22, 2018
@bekriebel

bekriebel commented Dec 31, 2018

This appears to still be an issue; at least it was for an upgrade I just performed. How to avoid it should at least be documented.

Edit: Perhaps not on recent versions. I was upgrading from a fairly old version and lost my data. I did some testing of upgrades between more recent chart versions and data was retained. A note about upgrades from older versions would be nice, if there is a known break-point where this changed.

@alex88

alex88 commented Feb 26, 2019

This actually seems to still be an issue. I've just tried to upgrade from chart grafana-1.16.0 (grafana v5.2.4) to grafana-2.2.0 (grafana v6.0) with a helm upgrade, and luckily kubernetes stopped the upgrade: the new pod is stuck with the error persistentvolumeclaim "grafana" is being deleted. Is there anything that needs to be done to not lose all the config on every upgrade?

Update: just to clarify, my values file has persistence.existingClaim set to the name of an existing PVC, yet for some reason the upgrade is trying to delete it?
