
@ericm-db
Contributor

What changes were proposed in this pull request?

Introducing the OperatorStateMetadataV2 format, which integrates with the TransformWithStateExec operator. It records information about the TransformWithState (TWS) operator and will be used to enforce invariants between query runs. Each OperatorStateMetadataV2 has a pointer to the StateSchemaV3 file for the corresponding operator.
Purging will be introduced in a separate PR: #47286
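To make the "pointer to the schema file" relationship concrete, here is a minimal Scala sketch. It is a simplified, hypothetical model: `StoreMetadata`, `OperatorMetadata`, and `schemaPathFor` are illustrative names and are not Spark's actual OperatorStateMetadataV2 classes or fields.

```scala
// Hypothetical, simplified model of operator state metadata.
// These are NOT Spark's actual OperatorStateMetadataV2 classes; names are illustrative.
object MetadataSketch {
  // One entry per state store used by the operator. `schemaFilePath` plays the role
  // of the pointer to the operator's schema file (StateSchemaV3 in this PR).
  final case class StoreMetadata(
      storeName: String,
      numPartitions: Int,
      schemaFilePath: String)

  // One entry per stateful operator, e.g. a TransformWithState operator.
  final case class OperatorMetadata(
      operatorId: Long,
      operatorName: String,
      stores: Seq[StoreMetadata])

  // Follow the pointer: look up the schema file path recorded for a given store.
  def schemaPathFor(meta: OperatorMetadata, storeName: String): Option[String] =
    meta.stores.find(_.storeName == storeName).map(_.schemaFilePath)
}
```

In this sketch the metadata only records where the schema lives; a reader follows `schemaFilePath` to load the schema when it needs to check invariants against a later run. The path values you pass in are placeholders, not Spark's real checkpoint layout.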

Why are the changes needed?

This is needed for State Metadata integration with the TransformWithState operator.

Does this PR introduce any user-facing change?

How was this patch tested?

Added unit tests to StateStoreSuite and TransformWithStateSuite

Was this patch authored or co-authored using generative AI tooling?

No

@ericm-db marked this pull request as ready for review on July 22, 2024 16:01
@ericm-db requested a review from anishshri-db on July 23, 2024 03:42
@ericm-db
Contributor Author

ericm-db commented Jul 23, 2024

@HeartSaVioR PTAL, thanks!

Contributor

@HeartSaVioR left a comment


First pass. Looks good in general.

* @param extraOptions - any extra options to be passed for StateStoreConf creation
* @param storeName - optional state store name
* @param schemaFilePath - optional schema file path
* @param oldSchemaFilePath - optional path to the old schema file. If not provided, will default
Contributor


Is it valid for schema version 3 to have None? If not, let's mention it here as well, the same as below: "Needed for schema version 3."

Contributor Author


Yeah, we don't expect there to be a schema file if we are in the first run of a new query, right?
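The reasoning in this reply is that no old schema file is expected on the first run of a new query, so an optional path is natural. A minimal Scala sketch of that branching, under stated assumptions: `SchemaCheckSketch`, `check`, and the result types are hypothetical, schemas are modeled as plain column-name lists, and the "old columns must still exist" rule is an illustrative stand-in, not Spark's actual schema compatibility check.

```scala
// Hypothetical sketch: how an Option-typed "old schema" input might be handled.
// Schemas are simplified to lists of column names; the compatibility rule is illustrative only.
object SchemaCheckSketch {
  type Schema = Seq[String]

  sealed trait CheckResult
  case object FirstRun extends CheckResult   // no previous schema file exists
  case object Compatible extends CheckResult // every old column is still present
  final case class Incompatible(missing: Seq[String]) extends CheckResult

  // `oldSchema` is None on the first run of a new query: there is nothing to compare against.
  def check(oldSchema: Option[Schema], newSchema: Schema): CheckResult =
    oldSchema match {
      case None => FirstRun
      case Some(old) =>
        val missing = old.filterNot(c => newSchema.contains(c))
        if (missing.isEmpty) Compatible else Incompatible(missing)
    }
}
```

Returning a distinct `FirstRun` result, rather than treating a missing old schema file as an error, matches the expectation described in the reply.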

@ericm-db requested review from HeartSaVioR on July 24, 2024 21:27
@ericm-db
Contributor Author

@HeartSaVioR addressed feedback, please take another look when you get a chance

Contributor

@HeartSaVioR left a comment


+1

@HeartSaVioR
Contributor

Thanks! Merging to master.

ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
…hStateExec operator

Closes apache#47445 from ericm-db/metadata-v2.

Authored-by: Eric Marnadi <eric.marnadi@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>