-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check error when opening shard DBs. #3521
Conversation
shard.Open() | ||
err = shard.Open() | ||
if err != nil { | ||
s.Logger.Printf("failed to open shard %d: %s", shardID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a real error here, I think we should actually return the error, not log it and continue. if you can't load a shard something bad happened, and likely is going to have something else bad happen if we let it continue. @pauldix thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
went for the same error condition as https://github.com/DanielMorsing/influxdb/blob/sharderrcheck/tsdb/store.go#L233
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not clear which is the correct way to go either. It does feel like if a general shard.Open()
fails something bad happened, and I would like to have more context on if we really should continue or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check here https://github.com/DanielMorsing/influxdb/blob/sharderrcheck/tsdb/store.go#L233, is making sure the file in the directory is actually named how shards we created are named. It's to prevent trying to open something like 1.bak
which is not a file we would create.
L242 should return the error because we should only ever be attempting to open valid shards at this point. A failure would be problem.
This should catch the case where someone messed up the permisssions for a database that they moved from one machine to another.
c2d3195
to
65ea56a
Compare
I have a suspicion that #3603 is a result of this error not being checked. |
+1, I will disable the failing test and open a bug. |
check error when opening shard DBs.
Doing the initial work on #3516, I suspect that it's an issue where
we're throwing away an opening error and then accessing the database
to cause the panic. Since we threw away this error before, let's log
it so that we can see what's happening.
It doesn't fix the error proper, since the shard will still not be
accessible, but it will give us a better idea what's going on.