Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49656][SS] Add support for state variables with value state collection types and read change feed options #48148

Closed
wants to merge 11 commits into from

Conversation

anishshri-db
Copy link
Contributor

@anishshri-db anishshri-db commented Sep 18, 2024

What changes were proposed in this pull request?

Add support for state variables with value state collection types and read change feed options

Why are the changes needed?

Without this, we cannot support reading per key changes for state variables used with stateful processors.

Does this PR introduce any user-facing change?

Yes

Users can now query value state variables with the following query:

        val changeFeedDf = spark.read
            .format("statestore")
            .option(StateSourceOptions.PATH, <checkpoint_loc>)
            .option(StateSourceOptions.STATE_VAR_NAME, <state_var_name>)
            .option(StateSourceOptions.READ_CHANGE_FEED, true)
            .option(StateSourceOptions.CHANGE_START_BATCH_ID, 0)
            .load()

How was this patch tested?

Added unit tests

[info] Run completed in 17 seconds, 318 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db anishshri-db changed the title [SPARK-49656] Add support for state variables with value state collection types and read change feed options [SPARK-49656][SS] Add support for state variables with value state collection types and read change feed options Sep 18, 2024
@anishshri-db anishshri-db marked this pull request as draft September 18, 2024 18:01
@anishshri-db anishshri-db marked this pull request as ready for review September 18, 2024 23:16
@anishshri-db
Copy link
Contributor Author

cc - @jingz-db @HeartSaVioR - PTAL, thx !

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. Mostly style and suggestion for refactor.

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@HeartSaVioR
Copy link
Contributor

Thanks! Merging to master.

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…llection types and read change feed options

### What changes were proposed in this pull request?
Add support for state variables with value state collection types and read change feed options

### Why are the changes needed?
Without this, we cannot support reading per key changes for state variables used with stateful processors.

### Does this PR introduce _any_ user-facing change?
Yes

Users can now query value state variables with the following query:

```
        val changeFeedDf = spark.read
            .format("statestore")
            .option(StateSourceOptions.PATH, <checkpoint_loc>)
            .option(StateSourceOptions.STATE_VAR_NAME, <state_var_name>)
            .option(StateSourceOptions.READ_CHANGE_FEED, true)
            .option(StateSourceOptions.CHANGE_START_BATCH_ID, 0)
            .load()
```

### How was this patch tested?
Added unit tests

```
[info] Run completed in 17 seconds, 318 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48148 from anishshri-db/task/SPARK-49656.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…llection types and read change feed options

### What changes were proposed in this pull request?
Add support for state variables with value state collection types and read change feed options

### Why are the changes needed?
Without this, we cannot support reading per key changes for state variables used with stateful processors.

### Does this PR introduce _any_ user-facing change?
Yes

Users can now query value state variables with the following query:

```
        val changeFeedDf = spark.read
            .format("statestore")
            .option(StateSourceOptions.PATH, <checkpoint_loc>)
            .option(StateSourceOptions.STATE_VAR_NAME, <state_var_name>)
            .option(StateSourceOptions.READ_CHANGE_FEED, true)
            .option(StateSourceOptions.CHANGE_START_BATCH_ID, 0)
            .load()
```

### How was this patch tested?
Added unit tests

```
[info] Run completed in 17 seconds, 318 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48148 from anishshri-db/task/SPARK-49656.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants