@zsxwing
Member

@zsxwing zsxwing commented Feb 16, 2017

What changes were proposed in this pull request?

The streaming thread in StreamExecution uses two checks to decide whether it should exit:

  • It catches an InterruptedException.
  • StreamExecution.state is TERMINATED.

When starting and stopping a query quickly, the above two checks may both fail:

  • Hit HADOOP-14084 and swallow InterruptedException.
  • StreamExecution.stop is called before state becomes ACTIVE. Then runBatches changes the state from TERMINATED to ACTIVE.

If the above cases both happen, the query will hang forever.

This PR changes state to an AtomicReference and uses compareAndSet to make sure the state only changes from INITIALIZING to ACTIVE. It also removes the runUninterruptibly hack from HDFSMetadata, because HADOOP-14084 won't cause any problem once the race condition is fixed.
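
As an illustration only, the guarded transition described above might look like the following minimal sketch. The State objects, the Query class, and the tryActivate method are hypothetical stand-ins and are not the actual StreamExecution code; only AtomicReference.compareAndSet is a real API.

```scala
import java.util.concurrent.atomic.AtomicReference

object StateRaceSketch {
  // Hypothetical stand-ins for the query states named in the description.
  sealed trait State
  case object INITIALIZING extends State
  case object ACTIVE extends State
  case object TERMINATED extends State

  // Hypothetical, simplified query; not the real StreamExecution.
  final class Query {
    val state = new AtomicReference[State](INITIALIZING)

    // Called from the user thread; may run before the stream thread activates.
    def stop(): Unit = state.set(TERMINATED)

    // Called from the stream thread. Succeeds only if the state is still
    // INITIALIZING, so an earlier stop() can never be overwritten by ACTIVE.
    def tryActivate(): Boolean = state.compareAndSet(INITIALIZING, ACTIVE)
  }

  def main(args: Array[String]): Unit = {
    // stop() wins the race: activation is refused and the state stays TERMINATED.
    val stoppedEarly = new Query
    stoppedEarly.stop()
    assert(!stoppedEarly.tryActivate())
    assert(stoppedEarly.state.get() == TERMINATED)

    // Normal start: the state moves from INITIALIZING to ACTIVE.
    val normal = new Query
    assert(normal.tryActivate())
    assert(normal.state.get() == ACTIVE)
  }
}
```

With an unconditional assignment in place of compareAndSet, the first case would end in ACTIVE, which is the hang scenario described above.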

How was this patch tested?

Jenkins

@zsxwing zsxwing changed the title from "[SPARK-19617][SS]Don't interrupt 'mkdirs' to workaround HADOOP-14084" to "[SPARK-19617][SS][WIP]Don't interrupt 'mkdirs' to workaround HADOOP-14084" Feb 16, 2017
@SparkQA

SparkQA commented Feb 16, 2017

Test build #72972 has finished for PR 16947 at commit d52ac13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 16, 2017

Test build #72988 has started for PR 16947 at commit fb27a97.

@zsxwing
Member Author

zsxwing commented Feb 16, 2017

retest this please

@zsxwing zsxwing changed the title from "[SPARK-19617][SS][WIP]Don't interrupt 'mkdirs' to workaround HADOOP-14084" to "[SPARK-19617][SS]Fix the race condition when starting and stopping a query quickly" Feb 16, 2017
logDebug(s"Stream running from $committedOffsets to $availableOffsets")
} else {
constructNextBatch()
if (state.compareAndSet(INITIALIZING, ACTIVE)) {
Member Author

Most changes here are space changes. You can use https://github.com/apache/spark/pull/16947/files?w=1 to review it.

@zsxwing
Member Author

zsxwing commented Feb 16, 2017

It also removes the runUninterruptibly hack from HDFSMetadata

I will submit a separate backport PR for 2.1 that does not include this change, because the hack is still needed in 2.1 due to HADOOP-10622 (master only supports Hadoop 2.6+, where HADOOP-10622 is already fixed).

@SparkQA

SparkQA commented Feb 16, 2017

Test build #3575 has finished for PR 16947 at commit 7317b0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// the query fast.
writeBatch(batchId, metadata)
}
writeBatch(batchId, metadata)
Contributor

Didn't we disable interrupts because, with local files, Hadoop used shell commands for file manipulation, which could hang when interrupted? Are we removing this now because that has been fixed in Hadoop?

Member Author

Are we removing this now because that has been fixed in hadoop?

Yes. We dropped support for Hadoop 2.5 and earlier versions.

})
updateStatusMessage("Stopped")
} else {
// `stop()` is already called. Let `finally` finish the rest work.
Contributor

finish the cleanup

@tdas
Contributor

tdas commented Feb 17, 2017

minor grammar issue in the comment, otherwise LGTM.

@tdas
Contributor

tdas commented Feb 18, 2017

LGTM. Merge when tests finish to master and 2.1

@zsxwing
Member Author

zsxwing commented Feb 18, 2017

@tdas we need another PR for 2.1 since this PR assumes Hadoop 2.6+. I'm doing it now.

@SparkQA

SparkQA commented Feb 18, 2017

Test build #73078 has finished for PR 16947 at commit 13f76f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member Author

zsxwing commented Feb 18, 2017

Thanks! Merging to master.

@zsxwing
Member Author

zsxwing commented Feb 18, 2017

#16979 is the backport for branch-2.1.

asfgit pushed a commit that referenced this pull request Feb 22, 2017
… query quickly (branch-2.1)

## What changes were proposed in this pull request?

Backport #16947 to branch 2.1. Note: we still need to support old Hadoop versions in 2.1.*.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16979 from zsxwing/SPARK-19617-branch-2.1.