
Conversation

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 11, 2020

What changes were proposed in this pull request?

This PR (SPARK-31422) aims to return an empty result in order to avoid a NullPointerException at getStorageStatus and getMemoryStatus, which happens after BlockManagerMaster stops. The empty result is consistent with the current status of SparkContext, because BlockManager and BlockManagerMaster are already stopped.

Why are the changes needed?

In SparkEnv.stop, the following stop sequence is used, and metricsSystem.stop invokes sink.stop on each registered sink.

blockManager.master.stop()
metricsSystem.stop() --> sinks.foreach(_.stop)

However, some sinks can invoke BlockManagerSource and end up with a NullPointerException, because BlockManagerMaster is already stopped and driverEndpoint has become null.

java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMaster.getStorageStatus(BlockManagerMaster.scala:170)
at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63)
at org.apache.spark.storage.BlockManagerSource$$anonfun$10.apply(BlockManagerSource.scala:63)
at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:31)
at org.apache.spark.storage.BlockManagerSource$$anon$1.getValue(BlockManagerSource.scala:30)

Since SparkContext registers BlockManagerSource and then forgets about it without deregistering, we had better avoid the NullPointerException inside BlockManagerMaster preventively.

_env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager))

Does this PR introduce any user-facing change?

Yes. This will remove the NPE for users who use BlockManagerSource.

How was this patch tested?

Pass the Jenkins with the newly added test cases.

   * amount of remaining memory.
   */
  def getMemoryStatus: Map[BlockManagerId, (Long, Long)] = {
    if (driverEndpoint == null) return Map.empty
Member Author

dongjoon-hyun Apr 11, 2020

Although only getStorageStatus is used, I added the preventive code to getMemoryStatus as well.
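For reference, the corresponding guard in getStorageStatus would look roughly like the sketch below. This is a minimal sketch based on the stack trace above; the return type and the askSync call are assumptions about the existing method body, not a verbatim copy of the patch.

```scala
def getStorageStatus: Array[StorageStatus] = {
  // After BlockManagerMaster.stop(), driverEndpoint is null, so asking the driver
  // would throw a NullPointerException; return an empty result instead.
  if (driverEndpoint == null) return Array.empty
  driverEndpoint.askSync[Array[StorageStatus]](GetStorageStatus)
}
```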

@dongjoon-hyun
Member Author

Hi, @maropu.
Could you review this PR, please?

@SparkQA

SparkQA commented Apr 11, 2020

Test build #121122 has finished for PR 28187 at commit 438b78b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Apr 11, 2020

retest this please

@maropu
Member

maropu commented Apr 11, 2020

The fix itself is straightforward and looks fine to me. Btw (just a question): couldn't we stop MetricsSystem before stopping BlockManagerMaster, given that it depends on BlockManagerMaster?

@dongjoon-hyun
Member Author

dongjoon-hyun commented Apr 11, 2020

Thank you for the review and approval. Yes, it's possible, but the stop sequence has been fixed since Apache Spark 0.8.0-incubating. There might be a risk that some user sink classes depend on that behavior. So, in this PR, I wanted to focus on this narrow fix as the safer solution.
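For context, the reordering suggested above would amount to swapping the two calls shown in the PR description. A hypothetical sketch of that alternative (not what this PR does):

```scala
// Hypothetical alternative inside SparkEnv.stop (not adopted by this PR): stop the
// metrics system, and therefore every sink, while BlockManagerMaster is still alive,
// so no gauge can hit a null driverEndpoint.
metricsSystem.stop()        // runs sinks.foreach(_.stop) first
blockManager.master.stop()  // only then is driverEndpoint cleared
```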


import org.apache.spark.{SparkConf, SparkFunSuite}

class BlockManagerMasterSuite extends SparkFunSuite {
Member

Adding tests to BlockManagerSuite should be enough?

Member Author

dongjoon-hyun Apr 11, 2020

I considered that, but BlockManagerSuite looks misleading to me because this is BlockManagerMaster. If there were a broader suite, it could be a candidate.

Member

I actually think it should be OK because we always test BlockManagerMaster together with BlockManager. And there are already some tests that test BlockManagerMaster only, e.g.

  • query locations of blockIds

  • SPARK-30594: Do not post SparkListenerBlockUpdated when updateBlockInfo returns false

Member Author

Sorry, but I don't think so. In your example, query locations of blockIds just mocks BlockManagerMaster instead of testing it.

test("query locations of blockIds") {
    val mockBlockManagerMaster = mock(classOf[BlockManagerMaster])

Member

Ok, fine.

Member Author

Also, BlockManagerSuite is already large, at 1805 lines. In general, we had better have a properly scoped unit test suite.
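For illustration, a dedicated suite makes it easy to exercise BlockManagerMaster end to end after stop. The sketch below only indicates that kind of test; the test name, the SparkContext-based setup, and the assertions are assumptions, not the exact code added by this PR.

```scala
package org.apache.spark.storage

import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}

class BlockManagerMasterSuite extends SparkFunSuite {
  test("SPARK-31422: getMemoryStatus and getStorageStatus should be empty after stop") {
    // Start a local SparkContext, grab its BlockManagerMaster, then stop everything.
    val sc = new SparkContext("local", "BlockManagerMasterSuite", new SparkConf())
    val master = sc.env.blockManager.master
    sc.stop()  // BlockManagerMaster.stop() clears driverEndpoint
    // With the guard in place, both calls return empty results instead of throwing an NPE.
    assert(master.getMemoryStatus.isEmpty)
    assert(master.getStorageStatus.isEmpty)
  }
}
```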

@dongjoon-hyun
Member Author

dongjoon-hyun commented Apr 11, 2020

BTW, for reviewers: while looking at this, I investigated the other sources. They look okay because they don't have a risky remote invocation like BlockManagerSource does (an illustrative sketch of that pattern follows the list below).

  • DAGSchedulerSource
  • JVMCPUSource
  • ExecutorMetricsSource
  • ExecutorAllocationManagerSource
  • AppStatusSource
  • ExternalShuffleServiceSource
  • MasterSource
  • WorkerSource
  • ExecutorSource
  • PluginMetricsSource
  • CodegenMetrics
  • HiveCatalogMetrics
  • LongAccumulatorSource
  • DoubleAccumulatorSource
  • LiveListenerBusMetrics
  • ApplicationSource
  • MetricsReporter
  • StreamingSource
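
To make the risky remote invocation concrete, here is an illustrative sketch of the gauge pattern in BlockManagerSource; the metric name and the MB arithmetic are assumptions, not the exact Spark code. Each registered gauge calls back into BlockManagerMaster over RPC whenever a sink samples it, which is exactly what fails once driverEndpoint is null.

```scala
package org.apache.spark.storage

import com.codahale.metrics.{Gauge, MetricRegistry}

// Illustrative sketch of the BlockManagerSource gauge pattern: the metric value is
// computed via a remote call to BlockManagerMaster at sampling time.
class BlockManagerSourceSketch(blockManager: BlockManager) {
  val metricRegistry = new MetricRegistry()

  metricRegistry.register(MetricRegistry.name("memory", "maxMem_MB"), new Gauge[Long] {
    override def getValue: Long = {
      // Remote invocation: before this PR's guard, this threw a NullPointerException
      // once BlockManagerMaster was stopped and driverEndpoint was null.
      val storageStatusList = blockManager.master.getStorageStatus
      storageStatusList.map(_.maxMem).sum / 1024 / 1024
    }
  })
}
```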

@Ngone51
Member

Ngone51 commented Apr 11, 2020

LGTM

@dongjoon-hyun
Member Author

Thank you for the review, @Ngone51.

@SparkQA

SparkQA commented Apr 11, 2020

Test build #121123 has finished for PR 28187 at commit 438b78b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2020

Test build #121126 has finished for PR 28187 at commit 82e46a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Thank you. Merged to master/3.0/2.4.

dongjoon-hyun added a commit that referenced this pull request Apr 11, 2020
…ckManagerMaster stops

Closes #28187 from dongjoon-hyun/SPARK-31422.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a6e6fbf)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Apr 11, 2020
…ckManagerMaster stops

dongjoon-hyun deleted the SPARK-31422 branch April 11, 2020 15:29
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…ckManagerMaster stops
