
Storages: Make DMFile ready for new column indexes/types #8756

Merged: 15 commits merged into pingcap:master from refactor_dmfile_1 on Feb 22, 2024


@JaySon-Huang (Contributor) commented Feb 5, 2024

What problem does this PR solve?

Issue Number: ref #6233, close #8768

Problem Summary:

In the near future, we may add new column types (such as a vector type) or new indexes. However, the current ColumnStat is not flexible enough to be extended with new fields such as index size or array size.

What is changed and how it works?

  • Add a new ExtendColumnStat meta block type under DMFile meta v2, which holds a protobuf-based ColumnStat
  • Add array_sizes_bytes/array_sizes_mark_bytes to ColumnStat to support the vector type
  • Try to merge the array size file (.size0.dat) into the X.merged file
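The benefit of a protobuf-based ColumnStat is that every field is tagged, so a reader can skip fields it does not recognize, and new fields (such as the array sizes) can be added without inventing a new layout. Below is a minimal sketch of that property using a simplified tag-length-value encoding (protobuf's real wire format uses varints). The struct, tag numbers, and functions are hypothetical and are not TiFlash's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical field tags, for illustration only.
enum FieldTag : uint32_t
{
    kDataBytes = 1,
    kMarkBytes = 2,
    kArraySizesBytes = 3,     // new: size of the array-size stream (vector type)
    kArraySizesMarkBytes = 4, // new: size of its mark file
};

struct ColumnStatSketch
{
    uint64_t data_bytes = 0;
    uint64_t mark_bytes = 0;
    uint64_t array_sizes_bytes = 0;
    uint64_t array_sizes_mark_bytes = 0;
};

// Append raw bytes; host byte order is used for brevity.
void appendRaw(std::vector<uint8_t> & buf, const void * p, size_t n)
{
    const auto * bytes = static_cast<const uint8_t *>(p);
    buf.insert(buf.end(), bytes, bytes + n);
}

// Each field is encoded as [tag:u32][len:u32][payload:len bytes].
void putField(std::vector<uint8_t> & buf, uint32_t tag, uint64_t value)
{
    const uint32_t len = sizeof(value);
    appendRaw(buf, &tag, sizeof(tag));
    appendRaw(buf, &len, sizeof(len));
    appendRaw(buf, &value, sizeof(value));
}

std::vector<uint8_t> serialize(const ColumnStatSketch & s)
{
    std::vector<uint8_t> buf;
    putField(buf, kDataBytes, s.data_bytes);
    putField(buf, kMarkBytes, s.mark_bytes);
    putField(buf, kArraySizesBytes, s.array_sizes_bytes);
    putField(buf, kArraySizesMarkBytes, s.array_sizes_mark_bytes);
    return buf;
}

// `max_known_tag` simulates a reader built before newer fields existed.
// Fields with unknown tags are skipped by their length instead of failing.
ColumnStatSketch deserialize(const std::vector<uint8_t> & buf, uint32_t max_known_tag)
{
    ColumnStatSketch s;
    size_t pos = 0;
    while (pos < buf.size())
    {
        if (buf.size() - pos < 8)
            throw std::runtime_error("truncated field header");
        uint32_t tag = 0;
        uint32_t len = 0;
        std::memcpy(&tag, buf.data() + pos, sizeof(tag));
        std::memcpy(&len, buf.data() + pos + 4, sizeof(len));
        pos += 8;
        if (buf.size() - pos < len)
            throw std::runtime_error("truncated field payload");
        if (tag <= max_known_tag && len == sizeof(uint64_t))
        {
            uint64_t v = 0;
            std::memcpy(&v, buf.data() + pos, sizeof(v));
            switch (tag)
            {
            case kDataBytes: s.data_bytes = v; break;
            case kMarkBytes: s.mark_bytes = v; break;
            case kArraySizesBytes: s.array_sizes_bytes = v; break;
            case kArraySizesMarkBytes: s.array_sizes_mark_bytes = v; break;
            default: break; // tag not defined in this sketch
            }
        }
        pos += len; // known or not, move on to the next field
    }
    return s;
}
```

Because every field carries its own length, calling deserialize(buf, kMarkBytes) on a buffer written with all four fields returns the two legacy values and leaves the array-size fields at zero instead of failing.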

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot added the do-not-merge/needs-linked-issue, release-note-none, needs-rebase, and size/XXL labels Feb 5, 2024
@JaySon-Huang force-pushed the refactor_dmfile_1 branch 2 times, most recently from 5d665a7 to 683de5a, February 5, 2024 08:47
@ti-chi-bot removed the needs-rebase label Feb 5, 2024
@ti-chi-bot added the needs-rebase label Feb 6, 2024
ti-chi-bot (bot) commented Feb 6, 2024

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JaySon-Huang force-pushed the refactor_dmfile_1 branch 2 times, most recently from e19a5ea to 7e3b18c, February 6, 2024 10:50
@JaySon-Huang changed the title from "[DNM] Storages: Ready for new column indexes/types" to "Storages: Make DMFile ready for new column indexes/types" Feb 6, 2024
@JaySon-Huang (Contributor Author) commented

/run-all-tests

@JaySon-Huang removed the needs-rebase label Feb 6, 2024
@JaySon-Huang (Contributor Author) commented

/run-unit-test

#else
// ExtendColumnStat is not enabled yet because it breaks downgrade compatibility; wait
// to release it together with other binary format changes.
writeExtendColumnStatToBuffer(tmp_buffer),
@breezewish (Member) commented Feb 7, 2024

How about writing the new-format data at the end (while still writing the old format in its old place)? That way we may be able to keep downgrade compatibility, because older versions would treat the new-format data as extra data and discard it.

@JaySon-Huang (Contributor Author) replied Feb 7, 2024

Good suggestion! ColumnStat is small enough that writing it twice has a negligible impact on performance and data size. We can keep writing the old format until another component actually requires an incompatible storage format change.
I'll change it.

@JaySon-Huang (Contributor Author) replied

After some testing, I found that downgrade compatibility still cannot be satisfied, because the reader has a default branch that throws on unrecognized meta block types 😂

default:
    throw Exception(
        ErrorCodes::INCORRECT_DATA,
        "MetaBlockType {} is not recognized",
        magic_enum::enum_name(handle->type));
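A small model of why writing both formats was not enough: assume the meta section is a list of block handles (type, offset, size). A reader built before ExtendColumnStat existed reaches its default branch and throws, even though the legacy ColumnStat block it needs is still present. MetaBlockType's values, MetaBlockHandle, and readMetaOld are hypothetical stand-ins, not TiFlash code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical block types; the real enum and its values may differ.
enum class MetaBlockType : uint32_t
{
    ColumnStat = 0,       // legacy stats, still written in the old place
    ExtendColumnStat = 1, // new protobuf-based stats, unknown to old binaries
};

struct MetaBlockHandle
{
    MetaBlockType type;
    uint64_t offset;
    uint64_t size;
};

// Models a reader compiled before ExtendColumnStat existed: it only
// handles ColumnStat, and its default branch rejects anything else.
size_t readMetaOld(const std::vector<MetaBlockHandle> & handles)
{
    size_t blocks_read = 0;
    for (const auto & h : handles)
    {
        switch (h.type)
        {
        case MetaBlockType::ColumnStat:
            ++blocks_read;
            break;
        default:
            throw std::runtime_error(
                "MetaBlockType " + std::to_string(static_cast<uint32_t>(h.type)) + " is not recognized");
        }
    }
    return blocks_read;
}
```

With only the legacy block, readMetaOld succeeds; once an ExtendColumnStat handle is appended, the same loop throws, so appending new data at the end does not by itself preserve downgrade compatibility.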

A contributor replied

Maybe we can just log a message instead of throwing an exception here.

@breezewish (Member) replied Feb 21, 2024

Good idea. What level of downgrade compatibility do we need? How about we first land code that does not throw an error on unknown meta blocks (and does not change the format at all), keep it for a few major versions, and only then land the compatible format? That way we would have downgrade compatibility for one major version. I don't expect Vector to land in pingcap/tiflash within 6 months, so there should be enough time for us to make the change.

@JaySon-Huang (Contributor Author) replied Feb 21, 2024

Currently, downgrade compatibility is only required between patch versions within the same minor version; there is no requirement across minor or major versions. Still, it is good to have.

I've removed the exception-throwing default branch in this PR, because the correctness of the bytes read is already protected by a checksum. I think silently ignoring unknown meta block types (no exception, no logging) is acceptable behavior.

@JinheLin @breezewish
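The resolution can be sketched as: verify the checksum over the meta buffer first, then walk the block handles and silently skip any type this binary does not know. FNV-1a stands in for whatever checksum DMFile actually uses, and the types and readMetaTolerant are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical block types known to this binary.
enum class MetaBlockType : uint32_t
{
    ColumnStat = 0,
    ExtendColumnStat = 1,
};

struct MetaBlockHandle
{
    MetaBlockType type;
    uint64_t offset;
    uint64_t size;
};

// FNV-1a (64-bit), used here only as a stand-in checksum.
uint64_t fnv1a64(const std::vector<uint8_t> & data)
{
    uint64_t h = 14695981039346656037ULL;
    for (uint8_t b : data)
    {
        h ^= b;
        h *= 1099511628211ULL;
    }
    return h;
}

struct ParsedMeta
{
    size_t known_blocks = 0;
    size_t skipped_blocks = 0;
};

ParsedMeta readMetaTolerant(
    const std::vector<uint8_t> & data,
    uint64_t stored_checksum,
    const std::vector<MetaBlockHandle> & handles)
{
    // Integrity is checked once, up front (in this sketch only the block
    // bytes are covered, not the handles), so an unrecognized block type
    // no longer has to be treated as a sign of corruption.
    if (fnv1a64(data) != stored_checksum)
        throw std::runtime_error("meta checksum mismatch");

    ParsedMeta result;
    for (const auto & h : handles)
    {
        // Handles must still point inside the buffer.
        if (h.offset > data.size() || h.size > data.size() - h.offset)
            throw std::runtime_error("meta block out of range");
        switch (h.type)
        {
        case MetaBlockType::ColumnStat:
        case MetaBlockType::ExtendColumnStat:
            ++result.known_blocks; // real code would parse data[offset, offset + size)
            break;
        default:
            ++result.skipped_blocks; // written by a newer version: ignore silently
            break;
        }
    }
    return result;
}
```

Replacing the silent skip with a log call, as suggested earlier in the thread, would only add a log line; either way, corrupted bytes are still rejected by the checksum check rather than by the type switch.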

@breezewish self-assigned this Feb 13, 2024
@JaySon-Huang (Contributor Author) commented

/run-all-tests

@CalvinNeo (Member) left a comment

lgtm

@ti-chi-bot added the needs-1-more-lgtm and approved labels Feb 18, 2024
@ti-chi-bot added the lgtm label and removed the needs-1-more-lgtm label Feb 18, 2024
ti-chi-bot (bot) commented Feb 18, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-02-18 02:34:42.321961049 +0000 UTC m=+151771.069584160: ☑️ agreed by CalvinNeo.
  • 2024-02-18 02:46:10.763367893 +0000 UTC m=+152459.510991004: ☑️ agreed by Lloyd-Pottiger.

@Lloyd-Pottiger (Contributor) commented

/hold

@ti-chi-bot added the do-not-merge/hold label Feb 18, 2024
ti-chi-bot (bot) commented Feb 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JinheLin, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,JinheLin,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JaySon-Huang (Contributor Author) commented

/unhold

@ti-chi-bot removed the do-not-merge/hold label Feb 22, 2024
@JaySon-Huang (Contributor Author) commented

/run-all-tests

@JaySon-Huang (Contributor Author) commented

/run-integration-test

@JaySon-Huang (Contributor Author) commented

/run-unit-test

ti-chi-bot (bot) commented Feb 22, 2024

@JaySon-Huang: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

This also triggers some heavy tests that do not always run when the PR is updated.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot merged commit b0c365c into pingcap:master Feb 22, 2024
6 checks passed
@JaySon-Huang deleted the refactor_dmfile_1 branch February 23, 2024 03:08
Labels: approved, lgtm, release-note-none, size/XXL
Projects: None yet
Development: Successfully merging this pull request may close these issues:
  • failure tests of DeltaMergeStore
5 participants