-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash when online write #5456
Comments
When i shutdown and restart alpha with v1.2.4, the problem is still there. But with v1.2.3, there is no problem. |
I've seen similar failures in badger on PR dgraph-io/badger#1308 . This is an issue in badger. |
we are on heavy load.the server crash every 10 minutes . how we can handle it? |
my dgraph version is 20.03.2 |
@h4midr Did you try downgrading your version? Dgraph v20.03.1 should work fine. |
@h4midr Have you tried using Dgraph v20.03.1 and are still seeing this issue? We have reached out to you over email as well. |
i'll do it but we are not in situation that have options for test.as soon as exams end i'm going to simulate the process to see how it happens again. |
it sounds like when the mutations request increase, it happens.the duration between two failures depends on request rates. |
This is a bug in badger and not in dgraph. It can be reproduced by the following steps
|
so finally. do version 1.2.3 has this problem too? |
@h4midr No, this affects only version 1.2.4 and v20.03.2 . |
Fixes dgraph-io/dgraph#5456 . This PR fixes the crash that could occur when a block was read from the cache. There was a logical race condition. The following sequence of events could occur which would cause the crash. 1. An iterator makes `t.Block(idx)` call 2. The `t.Block` function finds the block in the cache. The newly found block has `ref=1` which means it was held only by the cache. 3. The `t.Block` function is holding the block and at the same time the block gets evicted from the cache. The existing ref of the block was `1` so the cache eviction would decrement the ref and make it `0`. When the ref becomes `0`, the block is added to the `sync.Pool` and is ready to be reused. 4. While the block got evicted from the cache, the iterator received the block and it incremented the ref from `0` to `1` and starts using this. Since the block was inserted into the syncPool in the 3rd event, it could be modified by anyone while the iterator is using it.
This is not yet fixed in Dgraph. I'll close this issue once badger is updated in Dgraph. |
This has been fixed by updating badger in #5404 |
Hey @JimWen and @h4midr, we've released a new patch release candidate with the fix for this issue https://github.com/dgraph-io/dgraph/releases v20.03.3-rc1 and v1.2.5-rc1 . I would really appreciate it if you guys can help us test it. We believe it is fixed but it would be very helpful if someone running dgraph can also confirm it. |
thank you very much. |
i test with v20.03.3, online write has worked fine until now for 2 days. @jarifibrahim |
Fixes dgraph-io/dgraph#5456 . This PR fixes the crash that could occur when a block was read from the cache. There was a logical race condition. The following sequence of events could occur which would cause the crash. 1. An iterator makes `t.Block(idx)` call 2. The `t.Block` function finds the block in the cache. The newly found block has `ref=1` which means it was held only by the cache. 3. The `t.Block` function is holding the block and at the same time the block gets evicted from the cache. The existing ref of the block was `1` so the cache eviction would decrement the ref and make it `0`. When the ref becomes `0`, the block is added to the `sync.Pool` and is ready to be reused. 4. While the block got evicted from the cache, the iterator received the block and it incremented the ref from `0` to `1` and starts using this. Since the block was inserted into the syncPool in the 3rd event, it could be modified by anyone while the iterator is using it.
Fixes dgraph-io/dgraph#5456 . This PR fixes the crash that could occur when a block was read from the cache. There was a logical race condition. The following sequence of events could occur which would cause the crash. 1. An iterator makes `t.Block(idx)` call 2. The `t.Block` function finds the block in the cache. The newly found block has `ref=1` which means it was held only by the cache. 3. The `t.Block` function is holding the block and at the same time the block gets evicted from the cache. The existing ref of the block was `1` so the cache eviction would decrement the ref and make it `0`. When the ref becomes `0`, the block is added to the `sync.Pool` and is ready to be reused. 4. While the block got evicted from the cache, the iterator received the block and it incremented the ref from `0` to `1` and starts using this. Since the block was inserted into the syncPool in the 3rd event, it could be modified by anyone while the iterator is using it.
What version of Dgraph are you using?
Dgraph version : v1.2.4
Dgraph SHA-256 : 38ca7ecc19103d21bdd1eddf0d2695e872883b37d4c270916606b0b51f1d0a1b
Commit SHA-1 : b51171c
Commit timestamp : 2020-05-15 15:42:10 -0700
Branch : HEAD
Go version : go1.13.6
Have you tried reproducing the issue with the latest release?
v1.2.4
What is the hardware spec (RAM, OS)?
128G mem & 1.8T SSD
Linux version 3.10.0-1062.9.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019
Steps to reproduce the issue (command/config used to run Dgraph).
Just online write with java api, its wierd.
Expected behaviour and actual result.
Alpha crah and cluster is blocked. The log is as followings
`
panic: runtime error: slice bounds out of range [:135467521] with capacity 7968
goroutine 10106572 [running]:
github.com/dgraph-io/badger/v2/table.(*blockIterator).setIdx(0xc001b41120, 0x0)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/iterator.go:89 +0x52c
github.com/dgraph-io/badger/v2/table.(*blockIterator).seekToFirst(...)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/iterator.go:147
github.com/dgraph-io/badger/v2/table.(*Iterator).next(0xc001b41110)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/iterator.go:314 +0x12b
github.com/dgraph-io/badger/v2/table.(*Iterator).next(0xc001b41110)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/iterator.go:323 +0x1b3
github.com/dgraph-io/badger/v2/table.(*Iterator).Next(0xc001b41110)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/iterator.go:379 +0x47
github.com/dgraph-io/badger/v2/table.(*ConcatIterator).Next(0xc682d840a0)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/iterator.go:497 +0x33
github.com/dgraph-io/badger/v2/table.(*node).next(0xc112aec210)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/merge_iterator.go:82 +0x3d
github.com/dgraph-io/badger/v2/table.(*MergeIterator).Next(0xc112aec210)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/merge_iterator.go:158 +0x38
github.com/dgraph-io/badger/v2/table.(*node).next(0xc112aec300)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/merge_iterator.go:80 +0x72
github.com/dgraph-io/badger/v2/table.(*MergeIterator).Next(0xc112aec2c0)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/table/merge_iterator.go:158 +0x38
github.com/dgraph-io/badger/v2.(*Iterator).parseItem(0xc22baf0000, 0x1)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/iterator.go:621 +0x17f
github.com/dgraph-io/badger/v2.(*Iterator).Next(0xc22baf0000)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/iterator.go:566 +0x10b
github.com/dgraph-io/badger/v2.(*Stream).produceKVs.func1(0xc83b156a40, 0x17, 0x20, 0xc83b156a60, 0x17, 0x20, 0x0, 0x0, 0x0)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/stream.go:176 +0xa48
github.com/dgraph-io/badger/v2.(*Stream).produceKVs(0xc5d2ba5ea0, 0x1a971a0, 0xc000038118, 0x0, 0x0)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/stream.go:234 +0x2c6
github.com/dgraph-io/badger/v2.(*Stream).Orchestrate.func1(0xc5d4a7daa0, 0xc5d2ba5ea0, 0x1a971a0, 0xc000038118, 0xc5d4a53ec0)
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/stream.go:337 +0x7f
created by github.com/dgraph-io/badger/v2.(*Stream).Orchestrate
/root/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200515210839-ef28ef36b592/stream.go:334 +0x175
`
The text was updated successfully, but these errors were encountered: