@srowen
Member

@srowen srowen commented Jul 6, 2016

What changes were proposed in this pull request?

Commit 044971e introduced a lazy val to simplify the code in Logging. That is simple enough, but one side effect is that accessing log now means acquiring the instance's lock. This in turn exposed a form of deadlock in the Mesos code. The way that code is structured is arguably part of the problem, but in any event the safest course seems to be to revert the commit, and that's 90% of the change here; it's just not worth the risk of similar, more subtle issues.
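To illustrate the side effect described above, here is a minimal sketch, assuming Scala 2.x, where initializing a lazy val in a class synchronizes on the instance. The names `Component` and `LazyValLockProbe` are hypothetical, and a `String` stands in for a real logger. The probe does not reproduce the Mesos deadlock itself; it only checks whether the first read of `log` has to wait while another thread holds the instance's monitor.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for a class that mixes in a Logging-style trait.
class Component {
  lazy val log: String = "logger" // String stands in for a real Logger
}

object LazyValLockProbe {
  // Returns (read finished while the monitor was held, read finished after release).
  def probe(): (Boolean, Boolean) = {
    val c = new Component
    val lockHeld = new CountDownLatch(1)
    val release = new CountDownLatch(1)
    val logRead = new CountDownLatch(1)

    // Holds the instance monitor until told to release it.
    val holder = new Thread(new Runnable {
      def run(): Unit = c.synchronized {
        lockHeld.countDown()
        release.await()
      }
    })

    // Performs the first access to the lazy val, which triggers initialization.
    val reader = new Thread(new Runnable {
      def run(): Unit = {
        val logger = c.log
        if (logger != null) logRead.countDown()
      }
    })

    holder.start()
    lockHeld.await()
    reader.start()
    val whileLocked = logRead.await(500, TimeUnit.MILLISECONDS)
    release.countDown()
    val afterRelease = logRead.await(5, TimeUnit.SECONDS)
    (whileLocked, afterRelease)
  }

  def main(args: Array[String]): Unit = {
    val (whileLocked, afterRelease) = probe()
    println(s"read finished while monitor was held: $whileLocked")
    println(s"read finished after release: $afterRelease")
  }
}
```

If the holder thread were itself waiting on something the reader thread owns, the reader's blocked access to `log` would complete a lock cycle, which is the general shape of the deadlock discussed here.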

What I didn't revert here was the removal of the odd override of log in the Mesos code. In retrospect, it might have been put in place at some stage as a defense against this type of problem. After all, the Logging code still involved a lock at initialization before the change in question.

Even after the revert, the override doesn't seem to do anything, given how Logging works now, so I left it removed. However, I also removed the particular log message that ended up playing a part in this problem anyway, perhaps out of paranoia, to make sure this type of problem can't happen even with how the current locking works in logging initialization.
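For contrast, a rough sketch of the kind of accessor the revert goes back to: an explicit null check on a private field instead of a lazy val, so reading `log` does not take the instance's monitor. This is an assumption about the general shape, not a copy of the Spark source. `SketchLogging`, `Scheduler`, and `describeLogger` are hypothetical names, a `String` stands in for a real logger, and the one-time logging-system initialization (which the description says still involves a lock) is left out.

```scala
// Sketch only: String stands in for a real Logger so the example is self-contained.
trait SketchLogging {
  @transient private var log_ : String = null

  protected def logName: String = getClass.getName.stripSuffix("$")

  // Plain null check, no monitor on `this`. Two threads may race and both
  // build a value, which is harmless as long as construction is idempotent.
  protected def log: String = {
    if (log_ == null) {
      log_ = s"logger($logName)"
    }
    log_
  }
}

// Hypothetical class that mixes in the trait and exposes the logger for inspection.
class Scheduler extends SketchLogging {
  def describeLogger: String = log
}

object SketchLoggingDemo {
  def main(args: Array[String]): Unit = {
    println(new Scheduler().describeLogger)
  }
}
```

The trade-off is a benign race on first access in exchange for never holding the instance lock while logging.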

How was this patch tested?

Jenkins tests

@SparkQA

SparkQA commented Jul 6, 2016

Test build #61845 has finished for PR 14069 at commit 8701b6c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 6, 2016

The JIRA thread is getting long. Can you describe the problem, and why this fixes it, more concisely in the pull request description?

@srowen
Member Author

srowen commented Jul 6, 2016

Sure, commit 044971e introduced a lazy val to simplify the code in Logging. That is simple enough, but one side effect is that accessing log now means acquiring the instance's lock. This in turn exposed a form of deadlock in the Mesos code. The way that code is structured is arguably part of the problem, but in any event the safest course seems to be to revert the commit, and that's 90% of the change here; it's just not worth the risk of similar, more subtle issues.

What I didn't revert here was the removal of the odd override of log in the Mesos code. In retrospect, it might have been put in place at some stage as a defense against this type of problem. After all, the Logging code still involved a lock at initialization before the change in question.

Even after the revert, the override doesn't seem to do anything, given how Logging works now, so I left it removed. However, I also removed the particular log message that ended up playing a part in this problem anyway, perhaps out of paranoia, to make sure this type of problem can't happen even with how the current locking works in logging initialization.

@rxin
Contributor

rxin commented Jul 6, 2016

Thanks - can you put this in the PR description itself so it becomes part of the commit?

@rxin
Contributor

rxin commented Jul 6, 2016

Merging in master/2.0.

@asfgit asfgit closed this in a8f89df Jul 6, 2016
asfgit pushed a commit that referenced this pull request Jul 6, 2016
…tion in Logging

Author: Sean Owen <sowen@cloudera.com>

Closes #14069 from srowen/SPARK-16379.

(cherry picked from commit a8f89df)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@srowen srowen deleted the SPARK-16379 branch July 7, 2016 10:27