RocksDB: segfault in org.rocksdb.WriteBatch::delete called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers #3734
Hi @dlg99, thanks for raising this issue. From the information you provided, the segfault seems to be caused by the range deletion (`deleteRange`) usage introduced in #3653.
RocksDB PR facebook/rocksdb#10739 does not look like it fixes the issue we encountered: the bug it addresses was only introduced in RocksDB 7.7 (see facebook/rocksdb#10739 (comment)). We need the core dump file to investigate the root cause of the RocksDB segfault. I suggest reverting PR #3653 on branch-4.14 and branch-4.15. For the master branch, we keep the PR and try to upgrade the RocksDB version to 7.8+ to see if the segfault issue is resolved. @merlimat @eolivelli Do you have any ideas?
I tested the deletion performance; the number of deleted keys is 1/2 of the total keys.
This is the test code. You can put it in a test class (e.g. `EntryLocationIndexTest`):

```java
@Test
public void deleteBatchLedgersTest() throws Exception {
    File tmpDir = File.createTempFile("bkTest", ".dir");
    tmpDir.delete();
    tmpDir.mkdir();
    tmpDir.deleteOnExit();

    EntryLocationIndex idx = new EntryLocationIndex(serverConfiguration, KeyValueStorageRocksDB.factory,
            tmpDir.getAbsolutePath(), NullStatsLogger.INSTANCE);

    int numLedgers = 10000;
    int numEntriesPerLedger = 10000;
    int location = 0;
    KeyValueStorage.Batch batch = idx.newBatch();
    for (int entryId = 0; entryId < numEntriesPerLedger; ++entryId) {
        for (int ledgerId = 0; ledgerId < numLedgers; ++ledgerId) {
            idx.addLocation(batch, ledgerId, entryId, location);
            location++;
        }
    }
    batch.flush();
    batch.close();

    // Mark half of the ledgers as deleted
    for (int ledgerId = 0; ledgerId < numLedgers; ++ledgerId) {
        if (ledgerId % 2 == 0) {
            idx.delete(ledgerId);
        }
    }
    idx.removeOffsetFromDeletedLedgers();
    idx.close();
}
```
I tested upgrading the RocksDB version from 6.10.2 to 7.9.2 and then rolling back to 6.10.2. The rollback failed with:

```
2023-01-29T17:09:27,794+0800 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
java.io.IOException: Error open RocksDB database
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:200) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:63) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:170) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:819) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.server.Main.doMain(Main.java:226) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
at org.apache.bookkeeper.server.Main.main(Main.java:208) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
Caused by: org.rocksdb.RocksDBException: unknown checksum type 4 in data/bookkeeper/ledgers/current/ledgers/000025.sst offset 1078 size 33
at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:197) ~[org.apache.bookkeeper-bookkeeper-server-4.14.6.jar:4.14.6]
	... 13 more
```

The root cause of this exception is that RocksDB 7.9.2 writes SST files with `kXXH3`, its new default checksum type, which is only supported since RocksDB 6.27, so RocksDB 6.10.2 cannot open them. For the BookKeeper master branch, we have upgraded RocksDB to 7.9.2. To keep rollback to RocksDB < 6.27 possible, we can push a fix to ensure RocksDB 7.9.2 does not use the latest checksum type `kXXH3`. In a word, I'd send an email to discuss the RocksDB upgrade.
@hangc0276 Is it possible to set the old checksum format and make it configurable, in order to allow an easy rollback?
@eolivelli Yes, I will make the checksum type configurable.
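For reference, a minimal sketch of what pinning the checksum type can look like with the RocksDB Java API (`org.rocksdb`). This illustrates the options API only and is not the actual BookKeeper patch:

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ChecksumType;
import org.rocksdb.Options;

// Sketch: pin the SST checksum type to kxxHash so that files written by
// RocksDB 7.x remain readable after rolling back to 6.x. (kXXH3, the new
// 7.x default, is only understood by RocksDB >= 6.27.)
Options options = new Options().setCreateIfMissing(true);
BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setChecksumType(ChecksumType.kxxHash);
options.setTableFormatConfig(tableConfig);
```

New SST files are then written with `kxxHash` checksums, which both 6.10.2 and 7.9.2 understand; existing files keep the checksum type they were written with.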
@hangc0276 Thank you for looking at this problem!
This means that the time to confirm the fix goes into the remote future; Pulsar 2.10/2.11 use BK 4.15 IIRC. I think we should still try to upgrade RocksDB. I'd be OK with an upgraded DB backported to 4.14/4.15 if we can guarantee a safe downgrade.

Currently we've downgraded BK on prod, so this problem is no longer happening; unfortunately, that means I don't have any logs/dumps, and it really happened only one time.

I've spent some time experimenting with code / injecting errors. With this:

```diff
diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/EntryLocationIndex.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/EntryLocationIndex.java
index 3f6d1ae55b..03acfecc87 100644
--- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/EntryLocationIndex.java
+++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/EntryLocationIndex.java
@@ -26,6 +26,8 @@ import java.io.IOException;
 import java.util.Map.Entry;
 import java.util.Set;
 import java.util.concurrent.TimeUnit;
+
+import lombok.SneakyThrows;
 import org.apache.bookkeeper.bookie.Bookie;
 import org.apache.bookkeeper.bookie.EntryLocation;
 import org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorage.Batch;
@@ -189,6 +191,7 @@ public class EntryLocationIndex implements Closeable {
         deletedLedgers.add(ledgerId);
     }
 
+    @SneakyThrows
     public void removeOffsetFromDeletedLedgers() throws IOException {
         LongPairWrapper firstKeyWrapper = LongPairWrapper.get(-1, -1);
         LongPairWrapper lastKeyWrapper = LongPairWrapper.get(-1, -1);
@@ -202,6 +205,7 @@ public class EntryLocationIndex implements Closeable {
         log.info("Deleting indexes for ledgers: {}", ledgersToDelete);
 
         long startTime = System.nanoTime();
+        locationsDb.close();
         try (Batch batch = locationsDb.newBatch()) {
             for (long ledgerId : ledgersToDelete) {
                 if (log.isDebugEnabled()) {
@@ -213,7 +217,6 @@ public class EntryLocationIndex implements Closeable {
                 batch.deleteRange(firstKeyWrapper.array, lastKeyWrapper.array);
             }
-
             batch.flush();
 
             for (long ledgerId : ledgersToDelete) {
                 deletedLedgers.remove(ledgerId);
```

I got a RocksDB segfault with this dump. This does not look exactly like the original case and is more similar to #3043, but the question is: is it possible that some other RocksDB calls should not run concurrently, like an index update on a deleted range?
I agree with upgrading the RocksDB version, but I'm not sure it is OK to backport it to the patch release branches branch-4.14 / branch-4.15.
In your test code, you closed the `locationsDb` before creating the new batch, so the subsequent batch operations run against a closed DB.
Yeah, closing the DB is not the best way to introduce an error.
@merlimat @eolivelli @dlg99 Any thoughts about this suggestion? #3734 (comment)
If there are no objections, I will revert PR #3653 on branch-4.14 and branch-4.15 and trigger a new release for 4.14.7.
### Motivations
This PR resolves issue #3734 in branch-4.14 by following this suggestion: #3734 (comment)

### Modifications
1. Revert #3653
2. Bring #3646 to branch-4.14 to make the delete entries batch size configurable.
Which version of Pulsar would incorporate the 4.14.7 release?
@muni-chada It will be introduced in Pulsar 2.8, 2.9, and 2.10.
### Motivation
Related to #3734

### Modification
Upgrade RocksDB version to 7.9.2.
### Motivation
Fix #3734 (comment)

We have two RocksDB tables, one for the ledger index and another for the entry log location:
- Ledger index RocksDB table: uses the default table options; the checksum is `kCRC32c`.
- Entry log location RocksDB table: uses the configured table options; the checksum is `kxxHash`.

When we upgrade RocksDB from 6.10.2 to 7.9.2, the new RocksDB version's default table checksum changes from `kCRC32c` to `kXXH3`, and `kXXH3` is only supported since RocksDB 6.27. Rolling the RocksDB version back to 6.10.2 then fails, because RocksDB 6.10.2 doesn't support the `kXXH3` checksum type.

### Modifications
In this PR, I make the RocksDB checksum type configurable. There is one behavior change: the ledger index RocksDB table's checksum type changes from the default `kCRC32c` to `kxxHash`. I have tested the compatibility of the two checksum types in and between multiple RocksDB versions, and it works fine. After setting both RocksDB tables' checksum type to `kxxHash`, upgrading RocksDB from 6.10.2 to 7.9.2 and rolling back to 6.10.2 works fine.

### More to discuss
When I wrote a unit test to read the table checksum type from RocksDB configuration files, it failed. I found the related issue on RocksDB (facebook/rocksdb#5297) and the related PR (facebook/rocksdb#10826). It means we still can't load RocksDB table options from configuration files; maybe I missed some parts about reading RocksDB table options from the configuration file. If this issue exists, we do **NOT** recommend users configure RocksDB table options through configuration files.

@merlimat @eolivelli @dlg99 Please help take a look, thanks.
…ocksDB dependency (#20072)

### Motivation
BookKeeper has upgraded its RocksDB dependency to 7.9.2; related discussion: https://lists.apache.org/thread/8j90y4vrvgz1nvt5pb0xdjjy3o8z57z7 and apache/bookkeeper#3734. However, Pulsar also declares a RocksDB dependency, which overrides BookKeeper's RocksDB dependency version. This leads to the release package still using the old RocksDB version (6.29.4.1).

### Modifications
Upgrade Pulsar's RocksDB dependency to 7.9.2 to keep it in sync with BookKeeper's RocksDB dependency.
Descriptions of the changes in this PR:

### Motivation
Upgrade the RocksDB version to 6.29.4.1 to make sure BookKeeper 4.16.0 can roll back to 4.14.x. Refer to: #3734 (comment)
BUG REPORT
Describe the bug
A prod server crashed because of a segfault in RocksDB.
Unfortunately, the crash dump is lost. Logs point to `org.rocksdb.WriteBatch::delete` called from `org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers`.
It is hard to pinpoint the issue or match it to a specific RocksDB bug without the crash dump. I cannot reproduce the problem in a unit test, and even if I could, I wouldn't know whether it is the exact same problem.
So far the crash has happened only one time. Roughly, the timing and code correlate with the upgrade to an (internal) version (BK 4.14.x uses RocksDB 6.16.4) that includes the change introducing the use of RocksDB range deletion (#3653).
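For context, the change in #3653 boils down to issuing one range delete per deleted ledger through a RocksDB `WriteBatch`. Below is a standalone sketch against the plain RocksDB Java API; the 16-byte (ledgerId, entryId) key layout mirrors the entry location index, while the path and ledger IDs are placeholders:

```java
import java.nio.ByteBuffer;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

// Standalone sketch of the per-ledger deleteRange pattern. The key layout
// (two big-endian longs: ledgerId, entryId) mirrors the entry location
// index; this is a simplified illustration, not BookKeeper's actual code.
public class DeleteRangeSketch {
    static byte[] key(long ledgerId, long entryId) {
        return ByteBuffer.allocate(16).putLong(ledgerId).putLong(entryId).array();
    }

    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/delete-range-demo");
             WriteBatch batch = new WriteBatch();
             WriteOptions writeOptions = new WriteOptions()) {
            long[] deletedLedgers = {0L, 2L, 4L};
            for (long ledgerId : deletedLedgers) {
                // Deletes all entries of ledgerId: the end key is exclusive,
                // so [(ledgerId, 0), (ledgerId + 1, 0)) covers every entryId.
                batch.deleteRange(key(ledgerId, 0), key(ledgerId + 1, 0));
            }
            db.write(writeOptions, batch);
        }
    }
}
```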
After some research, I have a gut feeling that the problem is related to the fix for "a bug in iterator refresh which could segfault for DeleteRange users": facebook/rocksdb#10739
This fix should be included in RocksDB 7.8.0; I do not see it in the 6.x versions. Instead, I see that 6.29.0 "Added API warning against using Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and have always risked causing the refreshed iterator to return incorrect results."
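To illustrate the incompatibility that the warning describes, here is a minimal sketch of the risky pattern with the RocksDB Java API; the path and keys are placeholders, and this is not BookKeeper's actual call sequence:

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

// Minimal sketch of the Iterator::Refresh() + DeleteRange() combination
// that the RocksDB 6.29.0 release notes warn about.
public class RefreshAfterDeleteRange {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/refresh-demo")) {
            db.put("key-1".getBytes(), "v".getBytes());
            db.put("key-9".getBytes(), "v".getBytes());

            try (RocksIterator it = db.newIterator()) {
                it.seekToFirst();
                // Delete a range while the iterator is still open...
                db.deleteRange("key-0".getBytes(), "key-5".getBytes());
                // ...then refresh it. Before the fix in facebook/rocksdb#10739
                // (RocksDB 7.8), this combination risked incorrect results or
                // a segfault for DeleteRange users.
                it.refresh();
                for (it.seekToFirst(); it.isValid(); it.next()) {
                    System.out.println(new String(it.key()));
                }
            }
        }
    }
}
```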
With that said, we have the following options:
To Reproduce
Cannot repro.
Expected behavior
No segfault.