Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: apply column-aware row encoding & enable schema change #8394

Merged
merged 14 commits into from
Mar 9, 2023

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Mar 7, 2023

Signed-off-by: Bugen Zhao i@bugenzhao.comI hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Enable column-aware row encoding for MaterializeExecutor and all StorageTable usages.

  • StateTable will decide which encoding format is used according to whether version field is set. Note that this is only set for regular tables.
  • Add a versioned field to TableDesc. StorageTable will similarly decide the format by this.
  • Enable ALTER TABLE [ADD|DROP] COLUMN in production.

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • SQL commands, functions, and operators
    • ALTER TABLE [ADD|DROP] COLUMN

Release note

Initial support for ALTER TABLE [ADD|DROP] COLUMN on regular tables (without connectors).

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao changed the title feat: apply column-aware row encoding for table feat: apply column-aware row encoding for table & enable schema change Mar 7, 2023
@BugenZhao BugenZhao changed the title feat: apply column-aware row encoding for table & enable schema change feat: apply column-aware row encoding & enable schema change Mar 7, 2023
@BugenZhao BugenZhao added the user-facing-changes Contains changes that are visible to users label Mar 7, 2023
…ncoding-for-table

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao BugenZhao requested review from wcy-fdu, y-wei and st1page March 7, 2023 10:36
Comment on lines +84 to +90
// TODO: If we do some `Delete` after schema change, we cannot ensure the encoded value
// with the new version of serializer is the same as the old one, even if they can be
// decoded into the same value. The table is now performing consistency check on the raw
// bytes, so we need to turn off the check here. We may turn it on if we can compare the
// decoded row.
let state_table =
StateTableInner::from_table_catalog_inconsistent_op(table_catalog, store, vnodes).await;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @hzxa21 😰

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc. @hzxa21 Sanity check is disabled here

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@y-wei
Copy link
Contributor

y-wei commented Mar 7, 2023

Do existing e2e/unit tests cover schema change cases?

@BugenZhao
Copy link
Member Author

Do existing e2e/unit tests cover schema change cases?

Oops, I forgot. I'll add some.

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@codecov
Copy link

codecov bot commented Mar 8, 2023

Codecov Report

Merging #8394 (5924af6) into main (e6f2561) will decrease coverage by 0.02%.
The diff coverage is 56.63%.

@@            Coverage Diff             @@
##             main    #8394      +/-   ##
==========================================
- Coverage   71.62%   71.61%   -0.02%     
==========================================
  Files        1135     1135              
  Lines      185458   185541      +83     
==========================================
+ Hits       132835   132875      +40     
- Misses      52623    52666      +43     
Flag Coverage Δ
rust 71.61% <56.63%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...batch/src/executor/join/distributed_lookup_join.rs 0.00% <0.00%> (ø)
src/batch/src/executor/row_seq_scan.rs 21.80% <0.00%> (-0.21%) ⬇️
src/common/src/row/compacted_row.rs 64.28% <0.00%> (-35.72%) ⬇️
...c/util/value_encoding/column_aware_row_encoding.rs 78.02% <0.00%> (-3.29%) ⬇️
src/ctl/src/cmd_impl/table/scan.rs 0.00% <0.00%> (ø)
src/java_binding/src/iterator.rs 0.00% <0.00%> (ø)
src/meta/src/manager/streaming_job.rs 20.40% <0.00%> (-1.10%) ⬇️
src/stream/src/from_proto/batch_query.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/chain.rs 0.00% <0.00%> (ø)
src/stream/src/from_proto/lookup.rs 0.00% <0.00%> (ø)
... and 18 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@BugenZhao BugenZhao requested a review from chenzl25 March 8, 2023 08:29
@BugenZhao
Copy link
Member Author

Do existing e2e/unit tests cover schema change cases?

Done. PTAL 🥺

}

pub async fn handlle_conflict<'a, S: StateStore>(
&mut self,
buffer: MaterializeBuffer,
table: &StateTable<S>,
table: &StateTableInner<S, SD>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract a trait for the state table such as StateTableRead and change the type to impl StateTableRead to erase the S and SD here 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea. We can do this refactor in the future.

src/stream/src/common/table/state_table.rs Outdated Show resolved Hide resolved
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

pub mod column_aware_row_encoding;

pub type Result<T> = std::result::Result<T, ValueEncodingError>;

/// The kind of all possible `ValueRowSerde`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we run fast enough, perhaps we could remove the Basic encoding in 0.18. (Because the storage's side work has break the compatibility anyway. Of course just for this time 🥵)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of still using the basic value-encoding for the tables that (temporarily) not supports schema change, like internal tables or normal materialized views (except for Index). The reason is that "column ID" does not make much sense for these tables with immutable schema, and we can save more storage costs.

@@ -41,6 +41,9 @@ message StorageTableDesc {
uint32 retention_seconds = 5;
repeated uint32 value_indices = 6;
uint32 read_prefix_len_hint = 7;
// Whether the table is versioned. If `true`, column-aware row encoding will be used
// to be compatible with schema changes.
bool versioned = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that a table contains both basic encoded and column-aware encoded rows?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't consider backward compatibility, then nope. Currently, we follow the version field of TableCatalog to decide the encoding.

Copy link
Contributor

@y-wei y-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any concerns, but I may be better to have another senior's approval 😀

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot removed the user-facing-changes Contains changes that are visible to users label Mar 9, 2023
@BugenZhao BugenZhao added this pull request to the merge queue Mar 9, 2023
@github-actions github-actions bot added the user-facing-changes Contains changes that are visible to users label Mar 9, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 9, 2023
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Comment on lines +182 to +187
// Decide which serializer to use based on whether the table is versioned or not.
let row_serde = if versioned {
ColumnAwareSerde::new(&column_ids, schema.into()).into()
} else {
BasicSerde::new(&column_ids, schema.into()).into()
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found that there's something else we need to update in the storage java binding. cc @wenym1 Is it possible to reuse some code of StorageTable here?

@BugenZhao BugenZhao requested a review from wenym1 March 9, 2023 06:10
@BugenZhao BugenZhao added this pull request to the merge queue Mar 9, 2023
Merged via the queue into main with commit 630685f Mar 9, 2023
@BugenZhao BugenZhao deleted the bz/column-aware-row-encoding-for-table branch March 9, 2023 08:32
@CharlieSYH CharlieSYH added 📖✓ Covered or will be covered in the user docs. 📖✗ No user documentation is needed. and removed 📖✓ Covered or will be covered in the user docs. labels Apr 17, 2023
@CharlieSYH CharlieSYH added 📖✓ Covered or will be covered in the user docs. and removed 📖✗ No user documentation is needed. labels May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change type/feature user-facing-changes Contains changes that are visible to users 📖✓ Covered or will be covered in the user docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants