feat(storage): separate storage databases experimentally #410

ijsong · 2023-04-11T03:46:31Z

What this PR does

This PR implements the idea of separating the storage database, proposed by @hungryjang.
It splits the storage database into two parts: a data database and a commit database. The data
database stores records whose keys are LLSNs and whose values are the log data that users append.
The commit database stores records whose keys are GLSNs and whose values are LLSNs.
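
The key/value layout described above can be pictured with a minimal in-memory sketch. This is not the code in this PR: `splitStore`, its methods, and the integer types used for `LLSN` and `GLSN` are hypothetical names chosen for illustration, plain Go maps stand in for the two pebble databases, and treating write and commit as two separate calls is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// LLSN and GLSN are sequence-number keys. Representing them as
// uint64 is an assumption made for this sketch.
type LLSN uint64
type GLSN uint64

var errNotFound = errors.New("not found")

// splitStore is a hypothetical stand-in for the two databases.
type splitStore struct {
	data   map[LLSN][]byte // data database: LLSN -> appended log data
	commit map[GLSN]LLSN   // commit database: GLSN -> LLSN
}

func newSplitStore() *splitStore {
	return &splitStore{
		data:   make(map[LLSN][]byte),
		commit: make(map[GLSN]LLSN),
	}
}

// Write stores the payload in the data part under its LLSN.
func (s *splitStore) Write(llsn LLSN, payload []byte) {
	s.data[llsn] = payload
}

// Commit records the GLSN -> LLSN mapping in the commit part.
func (s *splitStore) Commit(glsn GLSN, llsn LLSN) {
	s.commit[glsn] = llsn
}

// ReadByGLSN resolves the GLSN to an LLSN through the commit part,
// then fetches the payload from the data part.
func (s *splitStore) ReadByGLSN(glsn GLSN) ([]byte, error) {
	llsn, ok := s.commit[glsn]
	if !ok {
		return nil, errNotFound
	}
	payload, ok := s.data[llsn]
	if !ok {
		return nil, errNotFound
	}
	return payload, nil
}

func main() {
	s := newSplitStore()
	s.Write(1, []byte("hello"))
	s.Commit(10, 1)
	payload, err := s.ReadByGLSN(10)
	fmt.Println(string(payload), err)
}
```

In this sketch a read by GLSN costs two lookups, while the data part holds only LLSN-keyed payloads and the commit part holds only small fixed-size mappings.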

This approach performs better than the previous one. In particular, the data database can take
much greater advantage of move compaction. Empirically, throughput increased by about 10-20% and
append duration decreased by about 10-20%.
However, it doubles the number of pebble instances, so we should configure the storage databases
carefully to offset that cost.

ijsong · 2023-04-11T03:46:40Z

Current dependencies on/for this PR:

main
- PR ci(make): increase test timeout and reduce parallelism #409
  - PR perf(storagenode): check log level #411
    - PR perf(storagenode): remove backup from append request #412
      - PR refactor(storagenode): simplify unary RPC logging #413
        
        PR perf(storagenode): estimate the number of batchlets #414
  - PR feat(storage): separate storage databases experimentally #410 👈

This comment was auto-generated by Graphite.

codecov-commenter · 2023-04-11T04:14:22Z

Codecov Report

Patch coverage: 50.72% and project coverage change: +0.16 🎉

Comparison is base (6ccf9ec) 63.60% compared to head (e3fa2f4) 63.76%.

Additional details and impacted files

@@               Coverage Diff                @@
##           test_options     #410      +/-   ##
================================================
+ Coverage         63.60%   63.76%   +0.16%     
================================================
  Files               131      131              
  Lines             17801    17918     +117     
================================================
+ Hits              11322    11426     +104     
- Misses             5937     5949      +12     
- Partials            542      543       +1

Impacted Files	Coverage Δ
internal/storage/testing.go	`18.60% <0.00%> (-0.91%)`	⬇️
internal/storage/config.go	`51.33% <18.66%> (-12.96%)`	⬇️
internal/storage/append_batch.go	`76.92% <61.53%> (-5.43%)`	⬇️
internal/storage/storage.go	`87.45% <74.74%> (+9.52%)`	⬆️
internal/storage/recovery_points.go	`90.69% <100.00%> (ø)`
internal/storage/scanner.go	`98.71% <100.00%> (ø)`
internal/storagenode/storagenode.go	`73.78% <100.00%> (+0.84%)`	⬆️
internal/storagenode/volume/volume.go	`94.38% <100.00%> (ø)`

... and 7 files with indirect coverage changes

ijsong requested a review from hungryjang as a code owner April 11, 2023 03:46

ijsong mentioned this pull request Apr 11, 2023

ci(make): increase test timeout and reduce parallelism #409

Merged

ijsong self-assigned this Apr 11, 2023

ijsong force-pushed the separate_db branch from 20c3557 to e3fa2f4 Compare April 11, 2023 03:59

This was referenced Apr 11, 2023

perf(storagenode): check log level #411

Merged

perf(storagenode): remove backup from append request #412

Merged

refactor(storagenode): simplify unary RPC logging #413

Merged

perf(storagenode): estimate the number of batchlets #414

Merged

ijsong force-pushed the test_options branch from 6ccf9ec to 7c81800 Compare April 13, 2023 13:22

ijsong force-pushed the separate_db branch from e3fa2f4 to 4788da6 Compare April 13, 2023 13:22

Base automatically changed from test_options to main April 13, 2023 13:41

ijsong force-pushed the separate_db branch from 4788da6 to b3845f5 Compare April 13, 2023 13:44

hungryjang reviewed Apr 14, 2023

internal/storage/append_batch.go (conversation resolved)

hungryjang approved these changes Apr 14, 2023

ijsong merged commit 9f64785 into main Apr 14, 2023

ijsong deleted the separate_db branch April 14, 2023 05:28

ijsong mentioned this pull request Apr 14, 2023

chore(main): release 0.13.0 #404

Merged
