Assert failure in block.DecrRef #1350
When running flock for several days with
@darkn3rd do you know how long flock was running for?
@jarifibrahim I'm unsure about the last run; I think a few hours. Recently I tried to run with the existing data directories but an upgraded server, and I ran into the problem. It took 20 minutes to reproduce. Logs
The crash originates from the cache eviction handler (lines 187 to 192 in 158d927).

Maybe the iterator is closing the same block twice somehow: once at lines 43 to 46 in 158d927, and a second time at lines 111 to 113 in 158d927.

I don't know how that could happen, but that's the only way.
Looking at the code, I think there are at least two issues.

Issue #1: It seems to me that ristretto doesn't currently guarantee that the block has been added to the cache if and only if [...]. More specifically, in the case of updates, the item is first updated in the cache, then it is put on the [...]. From there, see lines 588 to 591 in 158d927.
The scenario where it can happen is a race between two goroutines loading the same block that is not yet in the cache:
Issue #2: It seems like the current code assumes that the evict handler is always called when an item is removed from the cache. But that does not seem to be true when items are updated: the evict handler will not be called on the removed version. As a consequence, blocks will leak when two goroutines race, like above, to load an item that is not in the cache. Luckily, the only effect of this is more memory garbage.
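A minimal sketch of the leak described in Issue #2, using a hypothetical cache type (not ristretto's real API): the Set path overwrites an existing key without ever invoking the eviction callback for the old value, so a caller that releases resources in that callback leaks the overwritten entry.

```go
package main

import "fmt"

// resource stands in for a cached block whose cleanup happens
// in the cache's eviction callback.
type resource struct{ released bool }

// cache is a hypothetical sketch: onEvict is meant to fire whenever
// a value leaves the cache, but Set (below) forgets to call it for
// values displaced by an update.
type cache struct {
	m       map[string]*resource
	onEvict func(*resource)
}

// Set illustrates the bug: any existing value under key is silently
// dropped; onEvict is never invoked for it.
func (c *cache) Set(key string, r *resource) {
	c.m[key] = r
}

func main() {
	released := 0
	c := &cache{
		m:       map[string]*resource{},
		onEvict: func(r *resource) { r.released = true; released++ },
	}

	// Two goroutines race to load the same missing block;
	// both end up calling Set for the same key.
	first := &resource{}
	second := &resource{}
	c.Set("block-42", first)
	c.Set("block-42", second) // update: first is dropped, no callback

	fmt.Println("released:", released, "first leaked:", !first.released)
	// released: 0 first leaked: true
}
```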
I wondered if that was plausible or not, and it is:
Lines 176 to 179 in e013bfd
Line 140 in e013bfd
Lines 168 to 175 in e013bfd
At the default table size (64 MB) and block size (4 kB), there are 16384 blocks per table. At the default level multiplier, you expect a typical compaction to delete 11 tables (10 at the bottom layer, one at the top), so each compaction generates 180224 cache deletion requests in a relatively tight loop (only relatively, since there are a bunch of syscalls in [...]). The setBuf channel in ristretto has 32768 entries, so it sounds highly likely to me that a real-life application could end up filling the set buffer, triggering the issue.
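The arithmetic above can be checked directly (the sizes are the defaults quoted in the comment, not read from badger's source):

```go
package main

import "fmt"

func main() {
	const tableSize = 64 << 20 // default table size: 64 MB
	const blockSize = 4 << 10  // default block size: 4 kB

	blocksPerTable := tableSize / blockSize

	// A typical compaction at the default level multiplier touches
	// ~11 tables: 10 at the bottom layer plus 1 at the top.
	const tablesPerCompaction = 11
	deletions := blocksPerTable * tablesPerCompaction

	fmt.Println(blocksPerTable, deletions) // 16384 180224
}
```

At 180224 deletion requests per compaction against a 32768-entry buffer, a single compaction alone can overrun the channel several times over.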
This is fixed in ristretto via dgraph-io/ristretto#167, and ristretto was updated in badger via #1391. Thank you for the help @damz!
What version of Go are you using (go version)?
go 1.13
What version of Badger are you using?
6eaa500
Steps to Reproduce the issue
What did you do?
Run queries and mutation on dgraph.
What did you expect to see?
I expected no crash.
What did you see instead?