[0.9.1] restarting process irrevocably BREAKS measurements with spaces #3319
This doesn't appear to be an issue with newly created measurements on 0.9.1:
@greglook I don't know if it will help with the Grafana issue, but you can try selecting with a regex. For instance, does [example query missing from the source]
Agreed, this sounds like a terrible regression. I'll see if I can repro with a 0.9.0 to 0.9.1 upgrade.
Selecting via regex does not return any results; most of our graphs already use regex-based queries. Sounds like the path forward here is to move our old DB and recreate it so the measurements get recreated.
@greglook that's very odd that there's no way to get the original data back out. I don't quite follow what you're saying, but creating a new database and new measurements should solve the immediate issue. As for recovering the old data, we'll need help from the core team. @otoolep or @dgnorton did any of your recent changes possibly contribute to this?
It's not the end of the world if we lose the old data, but this is definitely not what I expected out of a minor version upgrade being touted for stability improvements.
Further investigation. Spun up bare Ubuntu 14.04 box on DO. Installed 0.9.0.
From the CLI:
Upgraded to 0.9.1:
Now back to the CLI:
@greglook My apologies for this bad regression issue on upgrade. Clearly there's a gap in our automated testing and we'll take a look. @toddboom Let's make sure the soak/load tester uses some identifiers with spaces. @otoolep, @jwilder, @dgnorton seems like this regression is probably somewhere in one of your PRs, since @corylanou was on vacation for 0.9.1. Any ideas if @greglook can recover the data?
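As a rough illustration of the "identifiers with spaces" idea for the soak/load tester, here is a minimal Python sketch. It assumes the line-protocol convention of putting a backslash before commas and spaces in measurement names; the helper names (`escape_name`, `unescape_name`, `sample_names`, `make_point`) and the word list are hypothetical and not taken from any InfluxDB tooling.

```python
import random


def escape_name(name):
    """Escape a measurement name (assumed rule: backslash before ',' and ' ')."""
    return name.replace(",", "\\,").replace(" ", "\\ ")


def unescape_name(escaped):
    """Reverse escape_name so a test can check the name survives a round trip."""
    out = []
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\" and i + 1 < len(escaped) and escaped[i + 1] in ", ":
            out.append(escaped[i + 1])
            i += 2
        else:
            out.append(escaped[i])
            i += 1
    return "".join(out)


def sample_names(count, seed=0):
    """Return test names; always includes the names reported in this issue."""
    rng = random.Random(seed)
    words = ["disk", "/boot", "riemann", "streams", "rate", "cpu", "load"]
    names = ["cpu", "disk /boot", "riemann streams rate"]
    while len(names) < count:
        names.append(" ".join(rng.sample(words, rng.randint(1, 3))))
    return names[:count]


def make_point(name, value, timestamp_ns):
    """Build one line-protocol-style point with a single 'value' field."""
    return f"{escape_name(name)} value={value} {timestamp_ns}"
```

A tester built along these lines would write such points, restart the process, and then confirm that every name from `sample_names` still returns data.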
For the moment I've just yanked the database data directory out from under InfluxDB and I can confirm we're seeing the newly-created series show up and query fine. If it turns out there's an easy way to recover the data in the next day or two, I'm all ears! If it takes longer than that it may not be worth it unless we can merge the old data into the new database, since the fresh data is more useful than older history with a gap up to the present.
So it turns out that just restarting the service is enough to break all series with spaces in them. We lost our data again. Downgrading to 0.9.0 until this is sorted out... not cool.
@greglook I just confirmed that restarting the process somehow corrupts the metastore indices for measurements with spaces. I've got core devs looking at it already, no estimate yet on when this will get fixed but it's a top priority as it involves data loss.
I understand. Sorry for the agitated posts, but as you can imagine this is not a great experience for our core monitoring system to be built on. :(
Any progress on this issue? A restart while running 0.9.0 causes the same behavior (using the metadata store from the 0.9.1 version). I cleaned out the whole data folder this time and hopefully won't see a recurrence while we're on the old version.
@greglook I've certainly raised awareness on the core team. I suspect this fix will get cherry-picked back into the 0.9.2 RCs once it's been solved. I don't have any more of an update than that.
@greglook A fix for this is in the 0.9.2 branch and current master.
Verified with @jwilder that the fix will restore prior functionality, so if you had a 0.9.0 datastore with measurements that had spaces, upgrading to 0.9.2 will make those measurements available for queries again.
Since upgrading from 0.9.0 to 0.9.1 earlier, all measurements with spaces in their names have been broken, including all historical data. The package was upgraded by downloading the .deb and installing in-place over 0.9.0. Once done, the server booted up without any issues, but I immediately noticed that most (but not all) of the graphs in Grafana were broken.
A bit more investigating revealed that all metrics with a single-word name (e.g. "cpu", "load", etc.) were working fine, but anything with spaces (e.g. "disk /boot", "riemann streams rate") was not returning any results to queries. The queries hadn't changed, and look like this:
When I logged in using the influx CLI, I confirmed that querying for data from these measurements did not return any data. However, after running SHOW MEASUREMENTS I noticed that every measurement with spaces shows up twice - once as usual, and once with the spaces escaped:
I note that SHOW SERIES does not return any measurements without escaped names, which is bizarre. As I said, querying for data from the original (unescaped) measurement returns no data; querying for data from the escaped version (by double-escaping) complains that there are no fields in that measurement:
This seems to have changed something at the data storage level, because downgrading InfluxDB to 0.9.0 has not fixed the issue. So, the current state of things is that the majority of our data cannot be accessed right now, despite no errors of any kind from Riemann, InfluxDB, or Grafana. Needless to say, this is a HUGE REGRESSION and we're currently facing loss of all our old data to try to get this back into a working state.
I posted in IRC but there was little to no response.
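To make the "unescaped" versus "double-escaped" queries concrete, here is a minimal Python sketch that builds both query strings. It assumes InfluxQL identifiers are wrapped in double quotes with backslashes and embedded quotes escaped by a backslash, and that the duplicated measurement is stored with a literal backslash before each space; both points are assumptions, and `quote_identifier` and `select_all` are hypothetical helpers, not part of any InfluxDB client.

```python
def quote_identifier(name):
    """Double-quote a name for a query, escaping backslashes and quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def select_all(measurement, limit=5):
    """Build a simple SELECT against one measurement (illustrative only)."""
    return f"SELECT * FROM {quote_identifier(measurement)} LIMIT {limit}"


# The name as originally written, and the assumed escaped duplicate
# (a literal backslash before the space).
PLAIN = "disk /boot"
ESCAPED = "disk\\ /boot"

if __name__ == "__main__":
    print(select_all(PLAIN))    # query against the original name
    print(select_all(ESCAPED))  # query against the escaped duplicate
```

Under these assumptions the second query is the "double-escaped" form: the stored backslash has to be escaped again inside the quoted identifier.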