-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31422][CORE] Fix NPE when BlockManagerSource is used after BlockManagerMaster stops #28187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ckManagerMaster stops
| * amount of remaining memory. | ||
| */ | ||
| def getMemoryStatus: Map[BlockManagerId, (Long, Long)] = { | ||
| if (driverEndpoint == null) return Map.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although only getStorageStatus is used, I added the preventive code into getMemoryStatus together.
|
Hi, @maropu . |
|
Test build #121122 has finished for PR 28187 at commit
|
|
retest this please |
|
The fix itself is straight-forward and looks fine to me. Btw, (Just a question); couldn't we stop |
|
Thank you for reviewing and approval. Yes, it's possible, but the stop sequence has been fixed since |
|
|
||
| import org.apache.spark.{SparkConf, SparkFunSuite} | ||
|
|
||
| class BlockManagerMasterSuite extends SparkFunSuite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add tests to BlockManagerSuite should be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that, but BlockManagerSuite looks misleading to me because this is BlockManagerManger. If there is a more broader suite, it could be a candidate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think it should be ok because we always test BlockManagerMaster together with BlockManager. And there're already some tests which test BlockManagerMaster only, e.g.
-
query locations of blockIds -
SPARK-30594: Do not post SparkListenerBlockUpdated when updateBlockInfo returns false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I don't think so. In your example, query locations of blockIds is just do mocking instead of testing.
test("query locations of blockIds") {
val mockBlockManagerMaster = mock(classOf[BlockManagerMaster])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, BlockManagerSuite is already large with 1805 lines. In general, we had better have a proper UT case.
core/src/test/scala/org/apache/spark/storage/BlockManagerMasterSuite.scala
Outdated
Show resolved
Hide resolved
|
BTW, for reviewers, during looking at this, I investigated the other sources. The other sources look okay because they don't have risky remote invocation like
|
|
LGTM |
|
Thank you for review, @Ngone51 . |
|
Test build #121123 has finished for PR 28187 at commit
|
|
Test build #121126 has finished for PR 28187 at commit
|
|
Thank you. Merged to master/3.0/2.4. |
…ckManagerMaster stops ### What changes were proposed in this pull request? This PR (SPARK-31422) aims to return empty result in order to avoid `NullPointerException` at `getStorageStatus` and `getMemoryStatus` which happens after `BlockManagerMaster` stops. The empty result is consistent with the current status of `SparkContext` because `BlockManager` and `BlockManagerMaster` is already stopped. ### Why are the changes needed? In `SparkEnv.stop`, the following stop sequence is used and `metricsSystem.stop` invokes `sink.stop`. ``` blockManager.master.stop() metricsSystem.stop() --> sinks.foreach(_.stop) ``` However, some sink can invoke `BlockManagerSource` and ends up with `NullPointerException` because `BlockManagerMaster` is already stopped and `driverEndpoint` became `null`. ``` java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:170) at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63) at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63) at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:31) at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:30) ``` Since `SparkContext` registers and forgets `BlockManagerSource` without deregistering, we had better avoid `NullPointerException` inside `BlockManagerMaster` preventively. ```scala _env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager)) ``` ### Does this PR introduce any user-facing change? Yes. This will remove NPE for the users who uses `BlockManagerSource`. ### How was this patch tested? Pass the Jenkins with the newly added test cases. Closes #28187 from dongjoon-hyun/SPARK-31422. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit a6e6fbf) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ckManagerMaster stops ### What changes were proposed in this pull request? This PR (SPARK-31422) aims to return empty result in order to avoid `NullPointerException` at `getStorageStatus` and `getMemoryStatus` which happens after `BlockManagerMaster` stops. The empty result is consistent with the current status of `SparkContext` because `BlockManager` and `BlockManagerMaster` is already stopped. ### Why are the changes needed? In `SparkEnv.stop`, the following stop sequence is used and `metricsSystem.stop` invokes `sink.stop`. ``` blockManager.master.stop() metricsSystem.stop() --> sinks.foreach(_.stop) ``` However, some sink can invoke `BlockManagerSource` and ends up with `NullPointerException` because `BlockManagerMaster` is already stopped and `driverEndpoint` became `null`. ``` java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:170) at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63) at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63) at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:31) at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:30) ``` Since `SparkContext` registers and forgets `BlockManagerSource` without deregistering, we had better avoid `NullPointerException` inside `BlockManagerMaster` preventively. ```scala _env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager)) ``` ### Does this PR introduce any user-facing change? Yes. This will remove NPE for the users who uses `BlockManagerSource`. ### How was this patch tested? Pass the Jenkins with the newly added test cases. Closes #28187 from dongjoon-hyun/SPARK-31422. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit a6e6fbf) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…ckManagerMaster stops ### What changes were proposed in this pull request? This PR (SPARK-31422) aims to return empty result in order to avoid `NullPointerException` at `getStorageStatus` and `getMemoryStatus` which happens after `BlockManagerMaster` stops. The empty result is consistent with the current status of `SparkContext` because `BlockManager` and `BlockManagerMaster` is already stopped. ### Why are the changes needed? In `SparkEnv.stop`, the following stop sequence is used and `metricsSystem.stop` invokes `sink.stop`. ``` blockManager.master.stop() metricsSystem.stop() --> sinks.foreach(_.stop) ``` However, some sink can invoke `BlockManagerSource` and ends up with `NullPointerException` because `BlockManagerMaster` is already stopped and `driverEndpoint` became `null`. ``` java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:170) at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63) at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63) at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:31) at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:30) ``` Since `SparkContext` registers and forgets `BlockManagerSource` without deregistering, we had better avoid `NullPointerException` inside `BlockManagerMaster` preventively. ```scala _env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager)) ``` ### Does this PR introduce any user-facing change? Yes. This will remove NPE for the users who uses `BlockManagerSource`. ### How was this patch tested? Pass the Jenkins with the newly added test cases. Closes apache#28187 from dongjoon-hyun/SPARK-31422. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR (SPARK-31422) aims to return empty result in order to avoid
NullPointerExceptionatgetStorageStatusandgetMemoryStatuswhich happens afterBlockManagerMasterstops. The empty result is consistent with the current status ofSparkContextbecauseBlockManagerandBlockManagerMasteris already stopped.Why are the changes needed?
In
SparkEnv.stop, the following stop sequence is used andmetricsSystem.stopinvokessink.stop.However, some sink can invoke
BlockManagerSourceand ends up withNullPointerExceptionbecauseBlockManagerMasteris already stopped anddriverEndpointbecamenull.Since
SparkContextregisters and forgetsBlockManagerSourcewithout deregistering, we had better avoidNullPointerExceptioninsideBlockManagerMasterpreventively.Does this PR introduce any user-facing change?
Yes. This will remove NPE for the users who uses
BlockManagerSource.How was this patch tested?
Pass the Jenkins with the newly added test cases.