tdas (Contributor) commented Jun 23, 2017

What changes were proposed in this pull request?

If the SQL conf for the StateStore provider class is changed between restarts (i.e. the query was started with providerClass1 and a restart is attempted with providerClass2), then the query will fail in an unpredictable way, because files saved by one provider class cannot be used by the other.

Ideally, the provider class used to start the query should be used to restart the query, and the configuration in the session where it is being restarted should be ignored.

This PR saves the provider class config to OffsetSeqLog, in the same way the number of shuffle partitions is saved and recovered.
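To illustrate the intended behavior, here is a minimal sketch in plain Python. It is not the Spark implementation: the `effective_conf` function, the `PINNED_KEYS` tuple, and the `offset_metadata.json` file name are hypothetical stand-ins for the metadata that OffsetSeqLog stores, and the two config keys are assumed to be the ones Spark uses for shuffle partitions and the state store provider class.

```python
import json
import os

# Assumed config keys; the values of these are pinned when the query first starts.
SHUFFLE_KEY = "spark.sql.shuffle.partitions"
PROVIDER_KEY = "spark.sql.streaming.stateStore.providerClass"
PINNED_KEYS = (SHUFFLE_KEY, PROVIDER_KEY)


def effective_conf(checkpoint_dir, session_conf):
    """Return the conf a (re)started query should run with.

    On the first start, the pinned keys are copied from the session conf
    into a metadata file in the checkpoint directory (a hypothetical
    stand-in for the offset log). On a restart, the saved values override
    whatever the current session conf says for those keys.
    """
    path = os.path.join(checkpoint_dir, "offset_metadata.json")
    if os.path.exists(path):
        # Restart: recover the values recorded at first start.
        with open(path) as f:
            saved = json.load(f)
    else:
        # First start: record the values currently in the session.
        saved = {k: session_conf[k] for k in PINNED_KEYS if k in session_conf}
        with open(path, "w") as f:
            json.dump(saved, f)

    conf = dict(session_conf)  # non-pinned keys still come from the session
    conf.update(saved)         # pinned keys come from the checkpoint
    return conf
```

With this sketch, a query first started with providerClass1 and later restarted from the same checkpoint directory in a session configured with providerClass2 still runs with providerClass1, while unrelated session settings follow the new session.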

How was this patch tested?

New unit tests.

@tdas tdas changed the title from "[SPARK-21192][SS] Preserve State Store provider class configuration across restarts" to "[SPARK-21192][SS] Preserve State Store provider class configuration across query restarts" Jun 23, 2017
@tdas tdas changed the title from "[SPARK-21192][SS] Preserve State Store provider class configuration across query restarts" to "[SPARK-21192][SS] Preserve State Store provider class configuration across StreamingQuery restarts" Jun 23, 2017
SparkQA commented Jun 23, 2017

Test build #78520 has finished for PR 18402 at commit 0255e5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zsxwing (Member) commented Jun 23, 2017

LGTM. Thanks! Merging to master.

@asfgit asfgit closed this in 2ebd083 Jun 23, 2017
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
…cross StreamingQuery restarts

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes apache#18402 from tdas/SPARK-21192.