[rocksdb] increase write stopping threshold for L0 files (MystenLabs#18872)

## Description 

Currently, for column families with a high write rate, writes start
stalling once there are 24 pending L0 files and can eventually stop, for
example in the consensus DB during consensus catchup. This significantly
hurts throughput and reduces the stability of the system. This change
reduces the number of L0 files that triggers compaction back to the
default (4), to speed up L0 compactions. In addition, for DBs that
optimize for write throughput, the thresholds to stall and stop writes
are increased further.
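
To make the arithmetic concrete, here is a minimal sketch of how the stall and stop thresholds follow from the compaction trigger. The multipliers (4 and 5 before, 12 and 16 after) are taken from the diff in this commit; the `L0Thresholds` struct and `thresholds` function are illustrative names, not part of `typed-store`.

```rust
/// L0 file-count thresholds derived from the compaction trigger.
/// Illustrative type, not part of the typed-store crate.
#[derive(Debug, PartialEq)]
struct L0Thresholds {
    compaction_trigger: usize,
    slowdown_writes: usize,
    stop_writes: usize,
}

/// Derive the slowdown and stop thresholds as multiples of the trigger.
fn thresholds(trigger: usize, slowdown_mult: usize, stop_mult: usize) -> L0Thresholds {
    L0Thresholds {
        compaction_trigger: trigger,
        slowdown_writes: trigger * slowdown_mult,
        stop_writes: trigger * stop_mult,
    }
}

fn main() {
    // Before this change: trigger 6, slowdown at 6 * 4, stop at 6 * 5.
    let before = thresholds(6, 4, 5);
    // After this change: trigger 4, slowdown at 4 * 12, stop at 4 * 16.
    let after = thresholds(4, 12, 16);
    println!("before: {:?}", before);
    println!("after:  {:?}", after);
}
```

With the old values, writes slow down at 24 L0 files and stop at 30; with the new values, compaction starts earlier (at 4 files) while slowdown and stop move out to 48 and 64 files.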

Logs observed from one validator:
```
2024/07/31-03:59:57.695047 2702658 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 24 level-0 files rate 16777216
2024/07/31-04:00:13.393421 2702607 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 24 level-0 files rate 13421772
2024/07/31-04:00:13.393593 2702607 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 24 level-0 files rate 10737417
2024/07/31-04:00:14.418687 2702901 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 25 level-0 files rate 8589933
2024/07/31-04:00:43.754068 2702656 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 25 level-0 files rate 10737416
2024/07/31-04:00:52.471606 2702597 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 25 level-0 files rate 8589932
2024/07/31-04:00:52.471784 2702597 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 25 level-0 files rate 9620723
2024/07/31-04:00:53.677837 2702901 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 26 level-0 files rate 7696578
2024/07/31-04:01:26.237337 2702597 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 26 level-0 files rate 8620167
2024/07/31-04:01:26.237494 2702597 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 26 level-0 files rate 6896133
2024/07/31-04:01:27.389744 2702901 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 27 level-0 files rate 5516906
2024/07/31-04:02:21.401986 2702597 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 27 level-0 files rate 4413524
2024/07/31-04:02:21.402179 2702597 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 27 level-0 files rate 3530819
2024/07/31-04:02:22.441728 2702901 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 28 level-0 files rate 2118491
2024/07/31-04:03:18.346778 2702614 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 28 level-0 files rate 10066329
2024/07/31-04:03:18.346980 2702614 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 28 level-0 files rate 6039797
2024/07/31-04:03:19.198853 2702901 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 29 level-0 files rate 3623878
```
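
For anyone wanting to track how the L0 count and the write rate evolve in such logs, the following is a rough sketch of a parser for the warning lines above. It assumes the line shape shown in this log excerpt only; `StallEvent` and `parse_stall_line` are hypothetical names, and the `rate` field is kept as a plain number because its unit is not stated in the log (presumably bytes per second).

```rust
/// One parsed "Stalling writes" warning. Hypothetical helper type.
#[derive(Debug, PartialEq)]
struct StallEvent {
    column_family: String,
    l0_files: u32,
    /// Value after "rate"; the unit is an assumption (likely bytes/s).
    rate: u64,
}

/// Parse a line such as:
/// "... [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 24 level-0 files rate 16777216"
fn parse_stall_line(line: &str) -> Option<StallEvent> {
    const MARKER: &str = "Stalling writes because we have ";
    let idx = line.find(MARKER)?;
    let head = &line[..idx];
    let tail = &line[idx + MARKER.len()..];

    // The column family is the last "[...]" group before the message.
    let cf_end = head.rfind(']')?;
    let cf_start = head[..cf_end].rfind('[')?;
    let column_family = head[cf_start + 1..cf_end].to_string();

    // The tail looks like "24 level-0 files rate 16777216".
    let mut parts = tail.split_whitespace();
    let l0_files = parts.next()?.parse().ok()?;
    let rate = parts.next_back()?.parse().ok()?;

    Some(StallEvent { column_family, l0_files, rate })
}

fn main() {
    let lines = [
        "2024/07/31-03:59:57.695047 2702658 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 24 level-0 files rate 16777216",
        "2024/07/31-04:03:19.198853 2702901 [WARN] [db/column_family.cc:991] [blocks] Stalling writes because we have 29 level-0 files rate 3623878",
    ];
    for line in lines {
        if let Some(ev) = parse_stall_line(line) {
            println!("cf={} l0_files={} rate={}", ev.column_family, ev.l0_files, ev.rate);
        }
    }
}
```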

There are no logs for stopping writes at 30 level-0 files.

## Test plan 

CI. Private testnet.

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:
mwtian authored Aug 1, 2024
1 parent 5a0febd commit 32347c2
Showing 1 changed file with 4 additions and 4 deletions.
crates/typed-store/src/rocks/mod.rs
@@ -60,7 +60,7 @@ const DEFAULT_DB_WAL_SIZE: usize = 1024;

 // Environment variable to control behavior of write throughput optimized tables.
 const ENV_VAR_L0_NUM_FILES_COMPACTION_TRIGGER: &str = "L0_NUM_FILES_COMPACTION_TRIGGER";
-const DEFAULT_L0_NUM_FILES_COMPACTION_TRIGGER: usize = 6;
+const DEFAULT_L0_NUM_FILES_COMPACTION_TRIGGER: usize = 4;
 const ENV_VAR_MAX_WRITE_BUFFER_SIZE_MB: &str = "MAX_WRITE_BUFFER_SIZE_MB";
 const DEFAULT_MAX_WRITE_BUFFER_SIZE_MB: usize = 256;
 const ENV_VAR_MAX_WRITE_BUFFER_NUMBER: &str = "MAX_WRITE_BUFFER_NUMBER";
@@ -2346,7 +2346,7 @@ impl DBOptions {
         let target_file_size_base = 64 << 20;
         self.options
             .set_target_file_size_base(target_file_size_base);
-        // Level 1 default to 64MiB * 6 ~ 384MiB.
+        // Level 1 default to 64MiB * 4 ~ 256MiB.
         let max_level_zero_file_num = read_size_from_env(ENV_VAR_L0_NUM_FILES_COMPACTION_TRIGGER)
             .unwrap_or(DEFAULT_L0_NUM_FILES_COMPACTION_TRIGGER);
         self.options
@@ -2395,10 +2395,10 @@ impl DBOptions {
             max_level_zero_file_num.try_into().unwrap(),
         );
         self.options.set_level_zero_slowdown_writes_trigger(
-            (max_level_zero_file_num * 4).try_into().unwrap(),
+            (max_level_zero_file_num * 12).try_into().unwrap(),
         );
         self.options
-            .set_level_zero_stop_writes_trigger((max_level_zero_file_num * 5).try_into().unwrap());
+            .set_level_zero_stop_writes_trigger((max_level_zero_file_num * 16).try_into().unwrap());

         // Increase sst file size to 128MiB.
         self.options.set_target_file_size_base(
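
The diff reads the trigger from `L0_NUM_FILES_COMPACTION_TRIGGER` and falls back to the default of 4, then converts the derived thresholds with `try_into().unwrap()`. As a hedged illustration of that flow, the sketch below uses hypothetical helpers (`parse_trigger`, `l0_triggers`) in place of `read_size_from_env`, whose implementation is not shown in this commit, and uses checked conversions instead of `unwrap()` so that an oversized override is reported rather than causing a panic.

```rust
use std::env;

const ENV_VAR_L0_NUM_FILES_COMPACTION_TRIGGER: &str = "L0_NUM_FILES_COMPACTION_TRIGGER";
const DEFAULT_L0_NUM_FILES_COMPACTION_TRIGGER: usize = 4;

/// Hypothetical stand-in for the env lookup: parse the raw value if present
/// and valid, otherwise fall back to the default trigger.
fn parse_trigger(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_L0_NUM_FILES_COMPACTION_TRIGGER)
}

/// Compute (compaction trigger, slowdown trigger, stop trigger) as i32 values,
/// using the 12x and 16x multipliers from this commit.
/// Returns None if any value overflows or does not fit in an i32.
fn l0_triggers(trigger: usize) -> Option<(i32, i32, i32)> {
    let compaction = i32::try_from(trigger).ok()?;
    let slowdown = i32::try_from(trigger.checked_mul(12)?).ok()?;
    let stop = i32::try_from(trigger.checked_mul(16)?).ok()?;
    Some((compaction, slowdown, stop))
}

fn main() {
    let raw = env::var(ENV_VAR_L0_NUM_FILES_COMPACTION_TRIGGER).ok();
    let trigger = parse_trigger(raw.as_deref());
    match l0_triggers(trigger) {
        Some((compaction, slowdown, stop)) => {
            println!("compaction={compaction} slowdown={slowdown} stop={stop}");
        }
        None => eprintln!("trigger {trigger} is too large for the RocksDB options"),
    }
}
```

Note that raising the environment variable scales all three values together: an override of 6, for example, would move the slowdown and stop thresholds to 72 and 96 files under the new multipliers.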