Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(streaming): source executor use table_id for keyspace prefix #3010

Merged
merged 1 commit into from
Jun 7, 2022

Conversation

Li0k
Copy link
Contributor

@Li0k Li0k commented Jun 6, 2022

What's changed and what's your intention?

source executor use table_id for keyspace prefix

  • use keyspace::table_root to replace keyspace::executor_root same as materialized_executor

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@hzxa21
Copy link
Collaborator

hzxa21 commented Jun 6, 2022

Good job! We can finally remove executor_root and shared_executor_root in Keyspace.

@codecov
Copy link

codecov bot commented Jun 6, 2022

Codecov Report

Merging #3010 (7d76625) into main (043beb7) will not change coverage.
The diff coverage is 0.00%.

@@           Coverage Diff           @@
##             main    #3010   +/-   ##
=======================================
  Coverage   73.11%   73.11%           
=======================================
  Files         724      724           
  Lines       97174    97174           
=======================================
  Hits        71049    71049           
  Misses      26125    26125           
Flag Coverage Δ
rust 73.11% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/stream/src/from_proto/source.rs 0.00% <0.00%> (ø)

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to hold this PR for a while. Previously, each source has its own keyspace. But now, we make them share one keyspace, which would cause problems.

The correct implementation should be like

let keyspace = Keyspace::table_root(store, &source_id).append(&executor_id.to_le_bytes());

@hzxa21
Copy link
Collaborator

hzxa21 commented Jun 6, 2022

Need to hold this PR for a while. Previously, each source has its own keyspace. But now, we make them share one keyspace, which would cause problems.

The correct implementation should be like

let keyspace = Keyspace::table_root(store, &source_id).append(&executor_id.to_le_bytes());

Why would it cause problems? I thought each source split of the same MV should use different user keys to store their state.

@skyzh
Copy link
Contributor

skyzh commented Jun 6, 2022

Why would it cause problems? I thought each source split of the same MV should use different user keys to store their state.

Did we ensure that explicitly? Previously source executor is using executor_root, which is per-executor instead of per-operator (like hash join and hash agg).

@hzxa21
Copy link
Collaborator

hzxa21 commented Jun 6, 2022

Why would it cause problems? I thought each source split of the same MV should use different user keys to store their state.

Did we ensure that explicitly? Previously source executor is using executor_root, which is per-executor instead of per-operator (like hash join and hash agg).

@tabVersion can confirm.

I skim through the current implementation and it seems that the split implementations of the supported source already ensure that:

SplitImpl will put split id or partition id in the return value of SplitMetaData::id(), which is used as the source state key. (I originally thought PulsarSplit has a wrong implementation since it only put topic in SplitMetaData::id() but I realized that the topic name is actually a sub-topic name with partition id as the suffix if we use partitioned topic in pulsar.)

@skyzh
Copy link
Contributor

skyzh commented Jun 6, 2022

That's cool. So we can use shared_executor_root for source already? Then I'd approve this PR.

@Li0k Li0k merged commit 87b9397 into main Jun 7, 2022
@Li0k Li0k deleted the li0k/source_exector_keyspace_use_table_id branch June 7, 2022 02:43
@Li0k Li0k restored the li0k/source_exector_keyspace_use_table_id branch June 7, 2022 02:47
@tabVersion
Copy link
Contributor

Did we ensure that explicitly? Previously source executor is using executor_root, which is per-executor instead of per-operator (like hash join and hash agg).

thanks for fixing the misuse of executor_id. source split's assignment mechanism guarantees one split can only be assigned to one executor so the prefix is unique in parallel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants