
refactor: switch table meta encoding scheme to msgpack #11592


dantengsky (Member) commented May 26, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

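bincode is fast and compact, but it does not support schema evolution, which is essential for us. bincode encodes struct fields positionally, without field names, so adding, removing, or reordering a field breaks decoding of data written with the old layout.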
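The schema-evolution problem can be seen in a small, std-only toy model. It is not bincode or msgpack: the positional encoding is modelled (loosely like bincode) as fixed-width values in declaration order with no names, and the named encoding is modelled as a map from field name to value, which is roughly what a named msgpack encoding stores. `MetaNew` and all function names are hypothetical, chosen for this sketch only.

```rust
use std::collections::BTreeMap;

// Hypothetical "new" version of a meta struct: `compressed_size` was added
// after data with the old two-field layout had already been written.
#[derive(Debug, PartialEq)]
struct MetaNew {
    row_count: u64,
    block_size: u64,
    compressed_size: u64,
}

// Positional (bincode-like) toy encoding: values only, in declaration order,
// each as 8 little-endian bytes. No field names are stored.
fn encode_positional(values: &[u64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

// The new positional decoder expects exactly three 8-byte fields.
fn decode_new_positional(bytes: &[u8]) -> Result<MetaNew, String> {
    if bytes.len() != 3 * 8 {
        return Err(format!("expected 24 bytes, found {}", bytes.len()));
    }
    let field = |i: usize| u64::from_le_bytes(bytes[i * 8..(i + 1) * 8].try_into().unwrap());
    Ok(MetaNew { row_count: field(0), block_size: field(1), compressed_size: field(2) })
}

// Named (msgpack-map-like) toy encoding, modelled as field name -> value.
fn encode_named(fields: &[(&str, u64)]) -> BTreeMap<String, u64> {
    fields.iter().map(|(k, v)| (k.to_string(), *v)).collect()
}

// The new named decoder looks fields up by name and defaults the missing one,
// similar in spirit to serde's `#[serde(default)]`.
fn decode_new_named(map: &BTreeMap<String, u64>) -> Result<MetaNew, String> {
    let required = |k: &str| map.get(k).copied().ok_or_else(|| format!("missing field `{k}`"));
    Ok(MetaNew {
        row_count: required("row_count")?,
        block_size: required("block_size")?,
        compressed_size: map.get("compressed_size").copied().unwrap_or(0),
    })
}

fn main() {
    // Data written by the "old" two-field schema, in both encodings.
    let old_positional = encode_positional(&[100, 4096]);
    let old_named = encode_named(&[("row_count", 100), ("block_size", 4096)]);

    // Positional: the new reader cannot decode the old bytes.
    println!("{:?}", decode_new_positional(&old_positional));
    // Named: the new reader fills in a default for the added field.
    println!("{:?}", decode_new_named(&old_named));
}
```

The positional decode returns an `Err`, while the named decode succeeds with `compressed_size: 0`. The price of the named form is that every record carries its field names.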

  • switch table meta encoding format from bincode to msgpack

  • introduce table meta version 4

    • freeze table-meta related types in table meta v3

      e.g. TableSchema, BlockMeta, Statistics, ColumnMeta, etc. are frozen in mod v3::frozen, to make upcoming meta type evolutions easier. The corresponding non-frozen types can then be "upgraded" without bumping the table meta version, as long as serde can handle the backward compatibility.

      However, not all types are (or can easily be) frozen in v3, e.g. Scalar.

  • backward compatibility tests against releases 1.1.30, 1.1.38, 1.1.39, and 1.1.46

    The test scripts still need refinement (to avoid duplicated test cases); this will be addressed in a dedicated PR.
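The freezing idea can be sketched as a pair of types: a frozen snapshot of the v3 layout that is never edited again, and a current type that is free to evolve, connected by a `From` conversion. This is a minimal illustration under assumptions: the fields are heavily simplified stand-ins for the real `BlockMeta`, and `bloom_filter_size` is a hypothetical field added only to show an evolution.

```rust
mod v3 {
    pub mod frozen {
        // Snapshot of the v3 layout (simplified). Once frozen, this
        // definition stays unchanged, so v3 bytes always decode into it.
        #[derive(Debug, Clone, PartialEq)]
        pub struct BlockMeta {
            pub row_count: u64,
            pub block_size: u64,
        }
    }
}

// The current, non-frozen type. It can evolve (here: a hypothetical optional
// field) without touching the frozen v3 definition.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockMeta {
    pub row_count: u64,
    pub block_size: u64,
    pub bloom_filter_size: Option<u64>,
}

// Upgrade path: old data is decoded into the frozen type, then converted.
impl From<v3::frozen::BlockMeta> for BlockMeta {
    fn from(old: v3::frozen::BlockMeta) -> Self {
        BlockMeta {
            row_count: old.row_count,
            block_size: old.block_size,
            // Information that v3 never recorded.
            bloom_filter_size: None,
        }
    }
}

fn main() {
    let old = v3::frozen::BlockMeta { row_count: 100, block_size: 4096 };
    let current: BlockMeta = old.into();
    println!("{current:?}");
}
```

Keeping the conversion in one `From` impl means readers of old data only need to know how to decode the frozen type; the evolving type never has to stay byte-compatible with v3.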

for reviewers:

tasks:

  • evaluate pot and msgpack (named)
    decision: switch to msgpack
  • bump table meta version to V4
  • add fuse-compat test
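Bumping the table meta version matters because the reader must pick a decoder from the version stored with the data, not from the version of the binary doing the reading. The sketch below assumes a hypothetical layout (an 8-byte little-endian version number followed by the payload), which is not databend's actual on-disk format, and maps version 3 to bincode and version 4 to msgpack as described in this PR; older versions are out of scope here. `Encoding`, `encoding_for`, `write_with_version`, and `read_header` are hypothetical names.

```rust
// Which encoding a payload uses, as implied by its stored version.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Encoding {
    Bincode,
    MessagePack,
}

// Version written by this (hypothetical) binary.
const CURRENT_VERSION: u64 = 4;

// The decoder is chosen from the version recorded with the data.
fn encoding_for(version: u64) -> Result<Encoding, String> {
    match version {
        3 => Ok(Encoding::Bincode),
        4 => Ok(Encoding::MessagePack),
        v => Err(format!("unsupported table meta version {v}")),
    }
}

// Hypothetical layout: 8-byte little-endian version, then the payload.
fn write_with_version(payload: &[u8]) -> Vec<u8> {
    let mut out = CURRENT_VERSION.to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

// Splits off the header and reports which decoder the payload needs.
fn read_header(bytes: &[u8]) -> Result<(Encoding, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("truncated header".to_string());
    }
    let (head, payload) = bytes.split_at(8);
    let version = u64::from_le_bytes(head.try_into().unwrap());
    Ok((encoding_for(version)?, payload))
}

fn main() {
    let bytes = write_with_version(b"payload");
    let (encoding, payload) = read_header(&bytes).unwrap();
    println!("{encoding:?}, {} payload bytes", payload.len());
}
```

Because the version travels with the bytes, a new binary can keep reading v3 data with the old decoder while writing v4 with msgpack.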

evaluating msgpack (named)

benchmark:

https://github.com/datafuselabs/databend/blob/61ba1a9ff8a819d2cc0f71da60f131794c2e147e/src/query/storages/common/table-meta/benches/bench.rs

local result (cargo bench -p storages-common-table-meta):

Benchmarking decoding/bincode-decode-block-metas: Collecting 100 samples in estimated 5.6910 s (700 iterations)
decoding/bincode-decode-block-metas
                        time:   [8.2057 ms 8.2859 ms 8.3715 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Benchmarking decoding/msg-pack-decode-block-metas: Collecting 100 samples in estimated 6.5603 s (400 iterations)
decoding/msg-pack-decode-block-metas
                        time:   [17.628 ms 18.579 ms 19.624 ms]
Found 18 outliers among 100 measurements (18.00%)
  18 (18.00%) high severe
Benchmarking decoding/bincode-segment-deserialization: Collecting 100 samples in estimated 5.1036 s (600 iterations)
decoding/bincode-segment-deserialization
                        time:   [8.5979 ms 8.6988 ms 8.8083 ms]
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe
Benchmarking decoding/msg-pack-segment-deserialization: Collecting 100 samples in estimated 6.5793 s (400 iterations)
decoding/msg-pack-segment-deserialization
                        time:   [16.770 ms 16.923 ms 17.087 ms]
Found 15 outliers among 100 measurements (15.00%)
  14 (14.00%) high mild
  1 (1.00%) high severe
----------------------------------
segment_bincode_bytes: 742
segment_msgpack_bytes: 1121
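segment_msgpack_bytes / segment_bincode_bytes: 1.5107816711590296

In short: msgpack (named) decoding takes roughly 2.2x as long as bincode for block metas (18.579 ms vs 8.2859 ms) and roughly 1.9x as long for segment deserialization (16.923 ms vs 8.6988 ms), and the encoded segment is about 1.5x larger.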
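Part of the size gap comes from the field names that a named encoding stores with every struct. The sketch below counts bytes for a msgpack map keyed by field names (fixmap header, fixstr/str8 keys, the most compact unsigned-integer form) against a positional layout that spends a fixed 8 bytes per field. The field names and values are hypothetical, and this does not reproduce the 742 vs 1121 byte figures, which depend on the real segment types.

```rust
// Bytes for a msgpack string: fixstr (<= 31 bytes) or str8 (<= 255 bytes).
// Longer strings are out of scope for this sketch.
fn msgpack_str_len(s: &str) -> usize {
    match s.len() {
        n @ 0..=31 => 1 + n,
        n @ 32..=255 => 2 + n,
        n => panic!("string of {n} bytes not handled in this sketch"),
    }
}

// Bytes for the most compact msgpack encoding of an unsigned integer.
fn msgpack_uint_len(v: u64) -> usize {
    match v {
        0..=0x7f => 1,               // positive fixint
        0x80..=0xff => 2,            // uint 8
        0x100..=0xffff => 3,         // uint 16
        0x1_0000..=0xffff_ffff => 5, // uint 32
        _ => 9,                      // uint 64
    }
}

// A struct encoded as a msgpack map keyed by field name (fixmap: <= 15 entries).
fn named_size(fields: &[(&str, u64)]) -> usize {
    assert!(fields.len() <= 15, "only fixmap is handled in this sketch");
    1 + fields
        .iter()
        .map(|(k, v)| msgpack_str_len(k) + msgpack_uint_len(*v))
        .sum::<usize>()
}

// Positional, fixed-width layout: 8 bytes per u64 field, no names.
fn positional_size(fields: &[(&str, u64)]) -> usize {
    fields.len() * 8
}

fn main() {
    // Hypothetical column-level metadata, not databend's actual ColumnMeta.
    let fields: [(&str, u64); 3] = [("offset", 4096), ("len", 1_048_576), ("num_values", 8192)];
    println!(
        "positional: {} bytes, named: {} bytes",
        positional_size(&fields),
        named_size(&fields)
    );
}
```

For these three fields it reports 24 bytes positional vs 34 bytes named: msgpack shrinks the integers, but the keys cost more than that saves.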

Closes #issue

vercel bot commented May 26, 2023

The latest updates on your projects.

1 Ignored Deployment
Name       Status       Updated (UTC)
databend   ⬜️ Ignored   Jun 2, 2023 2:32am

mergify bot added the pr-refactor label (this PR changes the code base without new features or bugfix)
dantengsky force-pushed the refactor-switch-table-meta-encoding-scheme branch 3 times, most recently from 06a439b to 0c3b9f2, May 31, 2023 07:32
dantengsky changed the title from "refactor(WIP): switch table meta encoding scheme" to "refactor: switch table meta encoding scheme to msgpack" May 31, 2023
dantengsky added the ci-benchmark label (Benchmark: run all test) May 31, 2023
dantengsky marked this pull request as ready for review June 1, 2023 00:36
dantengsky force-pushed the refactor-switch-table-meta-encoding-scheme branch from b6021c7 to 2155ab0, June 1, 2023 08:14
sundy-li removed the ci-benchmark label Jun 1, 2023
BohuTANG merged commit abb5120 into databendlabs:main Jun 2, 2023
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
…s#11592)

* switch table meta encoding to msgpack

* make `v4::Segment::to_bytes_with_encoding` private

* refactor: gate the benchmark with feature `dev`