@tdas
Contributor

@tdas tdas commented Jul 21, 2016

What changes were proposed in this pull request?

The current fix for the deadlock disables interrupts in StreamExecution while getting offsets for all sources and when writing to any metadata log, to avoid potential deadlocks in HDFSMetadataLog (see JIRA for more details). However, disabling interrupts can have unintended consequences in other sources. So I am narrowing the fix by disabling interrupts only in HDFSMetadataLog, which limits the scope of something as risky as disabling interrupts.
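To make the idea of "disabling interrupts" around one narrow operation concrete, here is a minimal Java sketch of a thread that defers interrupts while a critical section runs and delivers them afterwards. It is an illustration under stated assumptions, not Spark's actual `UninterruptibleThread` code: the class name `DeferredInterruptThread` and the method `runUninterruptibly` are hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch (not Spark's implementation): a thread that can run a
 * block of code with interrupts deferred until the block has finished.
 */
final class DeferredInterruptThread extends Thread {
    private final Object lock = new Object();
    private boolean uninterruptible = false;  // guarded by lock
    private boolean pendingInterrupt = false; // guarded by lock

    DeferredInterruptThread(Runnable body) {
        super(body);
    }

    /** Runs the action on this thread; interrupts arriving meanwhile are deferred. */
    <T> T runUninterruptibly(Callable<T> action) throws Exception {
        if (Thread.currentThread() != this) {
            throw new IllegalStateException("must be called on this thread");
        }
        synchronized (lock) {
            if (uninterruptible) {
                // Nested call: the outer call already defers interrupts.
                return action.call();
            }
            // Clear an interrupt that arrived earlier and remember to re-deliver it.
            pendingInterrupt = Thread.interrupted();
            uninterruptible = true;
        }
        try {
            return action.call();
        } finally {
            synchronized (lock) {
                uninterruptible = false;
                if (pendingInterrupt) {
                    pendingInterrupt = false;
                    super.interrupt(); // deliver the deferred interrupt now
                }
            }
        }
    }

    @Override
    public void interrupt() {
        synchronized (lock) {
            if (uninterruptible) {
                pendingInterrupt = true; // record it; do not disturb the critical section
            } else {
                super.interrupt();
            }
        }
    }
}
```

With a helper like this, only the metadata-log write would be wrapped in `runUninterruptibly`, while source code keeps normal interrupt behaviour. The trade-off is that the wrapped code must actually run on such a thread, so callers on ordinary threads need separate handling.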

How was this patch tested?

Existing tests.

@tdas
Contributor Author

tdas commented Jul 21, 2016

@marmbrus @zsxwing Can you take a look?

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62645 has finished for PR 14292 at commit 7a3e3fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor Author

tdas commented Jul 21, 2016

test this

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62644 has finished for PR 14292 at commit d64e0c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Jul 21, 2016

@tdas this change breaks the tests as they don't run in UninterruptibleThread

@tdas
Contributor Author

tdas commented Jul 21, 2016

Fixing it.

@zsxwing
Member

zsxwing commented Jul 21, 2016

LGTM. Pending tests.

@SparkQA

SparkQA commented Jul 21, 2016

Test build #62687 has finished for PR 14292 at commit 0e67e26.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 22, 2016

Test build #3189 has finished for PR 14292 at commit 0e67e26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* potential dead-lock in Hadoop "Shell.runCommand" before 2.5.0 (HADOOP-10622). If the thread
* running "Shell.runCommand" is interrupted, then the thread can get deadlocked. In our
* case, `writeBatch` creates a file using HDFS API and calls "Shell.runCommand" to set the
* file permissions, and can get deadlocked is the stream execution thread is stopped by
Contributor

s/is/if

Contributor Author

done

@SparkQA

SparkQA commented Jul 25, 2016

Test build #62835 has finished for PR 14292 at commit 26138b2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 25, 2016

Test build #3190 has finished for PR 14292 at commit 26138b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor Author

tdas commented Jul 25, 2016

Tests have passed. Merging this to master and 2.0. Thanks for reviewing @zsxwing @jaceklaskowski

asfgit pushed a commit that referenced this pull request Jul 25, 2016
…locks in HDFSMetadataLog

## What changes were proposed in this pull request?
The current fix for the deadlock disables interrupts in StreamExecution while getting offsets for all sources and when writing to any metadata log, to avoid potential deadlocks in HDFSMetadataLog (see JIRA for more details). However, disabling interrupts can have unintended consequences in other sources. So I am narrowing the fix by disabling interrupts only in HDFSMetadataLog, which limits the scope of something as risky as disabling interrupts.

## How was this patch tested?
Existing tests.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #14292 from tdas/SPARK-14131.

(cherry picked from commit c979c8b)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in c979c8b Jul 25, 2016