Conversation

@wangyum
Member

@wangyum wangyum commented Jul 16, 2018

What changes were proposed in this pull request?

SPARK-24182 changed logApplicationReport from false to true. This PR reverts it to false; otherwise spark-shell shows noisy log output:

...
18/07/16 04:46:25 INFO Client: Application report for application_1530676576026_54551 (state: RUNNING)
18/07/16 04:46:26 INFO Client: Application report for application_1530676576026_54551 (state: RUNNING)
...

Closes #21827

How was this patch tested?

Manual tests.

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93111 has finished for PR 21784 at commit d6563ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Jul 17, 2018

cc @vanzin, @jerryshao

    try {
      val YarnAppReport(_, state, diags) =
-       client.monitorApplication(appId.get, logApplicationReport = true)
+       client.monitorApplication(appId.get, logApplicationReport = false)
Contributor

Yes, it's too verbose currently in client mode. I remember we only had such output in cluster mode from the YARN client. My only concern is that turning this to false will also lose the detailed reports. I think it would be better if we still logged the detailed report when the state changes.

@HyukjinKwon
Member

Hm, yea. I don't find this super noisy though, to be honest.

@wangyum
Member Author

wangyum commented Jul 18, 2018

It's noisy when you type something:

(screenshot: spark-24128)

@HyukjinKwon
Member

HyukjinKwon commented Jul 18, 2018

OK, but you can call sc.setLogLevel in the shell. For instance, if I run spark.range(10).show() at INFO level, I get something like this:

scala> 18/07/18 07:58:47 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
spark.r18/07/18 07:58:48 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
ange(10)18/07/18 07:58:49 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
.show()18/07/18 07:58:50 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)

18/07/18 07:58:51 INFO internal.SharedState: loading hive config file: file:/home/spark/spark/conf/hive-site.xml
18/07/18 07:58:51 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/.../spark/spark-warehouse').
18/07/18 07:58:51 INFO internal.SharedState: Warehouse path is 'file:/home/spark/spark/spark-warehouse'.
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@7a68818c{/SQL,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@5f745970{/SQL/json,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@2afd8972{/SQL/execution,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@5784f6b9{/SQL/execution/json,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@6ccf06f1{/static/sql,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:52 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/07/18 07:58:52 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:53 INFO codegen.CodeGenerator: Code generated in 254.142542 ms
18/07/18 07:58:53 INFO codegen.CodeGenerator: Code generated in 65.397101 ms
18/07/18 07:58:53 INFO spark.SparkContext: Starting job: show at <console>:24
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Got job 0 (show at <console>:24) with 1 output partitions
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (show at <console>:24)
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Missing parents: List()
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at show at <console>:24), which has no missing parents
18/07/18 07:58:53 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:53 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 8.1 KB, free 408.9 MB)
18/07/18 07:58:53 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.0 KB, free 408.9 MB)
18/07/18 07:58:53 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ctr-e138-1518143905142-411342-01-000002.hwx.site:41073 (size: 4.0 KB, free: 408.9 MB)
18/07/18 07:58:53 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
18/07/18 07:58:54 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at <console>:24) (first 15 tasks are for partitions Vector(0))
18/07/18 07:58:54 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
18/07/18 07:58:54 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ctr-e138-1518143905142-411342-01-000008.hwx.site, executor 2, partition 0, PROCESS_LOCAL, 7864 bytes)
[Stage 0:>                                                          (0 + 1) / 1]18/07/18 07:58:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ctr-e138-1518143905142-411342-01-000008.hwx.site:37671 (size: 4.0 KB, free: 366.3 MB)
18/07/18 07:58:54 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1115 ms on ctr-e138-1518143905142-411342-01-000008.hwx.site (executor 2) (1/1)
18/07/18 07:58:55 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/07/18 07:58:55 INFO scheduler.DAGScheduler: ResultStage 0 (show at <console>:24) finished in 1.609 s
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Job 0 finished: show at <console>:24, took 1.673348 s
18/07/18 07:58:55 INFO spark.SparkContext: Starting job: show at <console>:24
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Got job 1 (show at <console>:24) with 1 output partitions
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (show at <console>:24)
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Missing parents: List()
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at show at <console>:24), which has no missing parents
18/07/18 07:58:55 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.1 KB, free 408.9 MB)
18/07/18 07:58:55 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.0 KB, free 408.9 MB)
18/07/18 07:58:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ctr-e138-1518143905142-411342-01-000002.hwx.site:41073 (size: 4.0 KB, free: 408.9 MB)
18/07/18 07:58:55 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1039
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at show at <console>:24) (first 15 tasks are for partitions Vector(1))
18/07/18 07:58:55 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
18/07/18 07:58:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ctr-e138-1518143905142-411342-01-000007.hwx.site, executor 1, partition 1, PROCESS_LOCAL, 7864 bytes)
18/07/18 07:58:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ctr-e138-1518143905142-411342-01-000007.hwx.site:33073 (size: 4.0 KB, free: 366.3 MB)
18/07/18 07:58:55 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
[Stage 1:>                                                          (0 + 1) / 1]18/07/18 07:58:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 948 ms on ctr-e138-1518143905142-411342-01-000007.hwx.site (executor 1) (1/1)
18/07/18 07:58:56 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/07/18 07:58:56 INFO scheduler.DAGScheduler: ResultStage 1 (show at <console>:24) finished in 0.956 s
18/07/18 07:58:56 INFO scheduler.DAGScheduler: Job 1 finished: show at <console>:24, took 0.959303 s
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+

Considering it's INFO level, I was thinking it's not super noisy. I'm okay with disabling it though; I get your point. Let me defer this to @vanzin and @jerryshao.
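The workaround mentioned above can be sketched as follows (a minimal sketch, assuming an active spark-shell session on YARN, where sc and spark are the SparkContext and SparkSession the shell creates for you):

```scala
// Raise the driver's log level so INFO-level application reports are suppressed.
// Valid levels include ALL, DEBUG, INFO, WARN, ERROR, OFF.
sc.setLogLevel("WARN")

// Typed input is no longer interleaved with INFO log lines:
spark.range(10).show()
```

Note this only affects logging in the driver JVM; executors keep their own log4j configuration.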

@HyukjinKwon
Member

@vanzin WDYT about this?

@HyukjinKwon
Member

@wangyum, mind adding Closes #21784 here?

Member

@HyukjinKwon HyukjinKwon left a comment


LGTM

@HyukjinKwon
Member

Also, mind adding [SPARK-24873] to the PR title, since the JIRA happens to be open anyway?

@wangyum wangyum changed the title [SPARK-24182][YARN][FOLLOW-UP] Turn off noisy log output [SPARK-24873][YARN] Turn off spark-shell noisy log output Jul 21, 2018
@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in d7ae424 Jul 21, 2018
@guoxiaolongzte

guoxiaolongzte commented Jul 23, 2018

What? I think we need to add a switch.

@HyukjinKwon
Member

? I think we don't need a switch.

@HyukjinKwon
Member

Maybe we could consider avoiding these logs in the shell specifically. Adding a switch to enable/disable the logs sounds like overkill.

@guoxiaolongzte

But for some spark-submit applications, I want this "Application report for ..." information. What should I do?

@HyukjinKwon
Member

HyukjinKwon commented Jul 23, 2018

It has never been printed before, right? I think we can consider turning it on specifically for spark-submit applications, although I'm not fully sure it's worth doing yet.

@HyukjinKwon HyukjinKwon mentioned this pull request Jul 23, 2018
@guoxiaolongzte

We need to hear @vanzin's opinion, because he wrote the relevant code.

@HyukjinKwon
Member

Adding a configuration to control some logs sounds like overkill. I wouldn't go that way.

@srowen
Member

srowen commented Jul 23, 2018

Logging is already configurable; a switch is overkill. This seems fine.
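One concrete way to use that configurability (a sketch, assuming the log4j 1.x setup Spark shipped with at the time and a conf/log4j.properties derived from the bundled template) is to raise the level for the YARN Client logger alone, silencing the application-report lines while keeping INFO logging elsewhere:

```properties
# conf/log4j.properties
# Hide the repeated "Application report for ..." INFO lines from yarn.Client,
# which is org.apache.spark.deploy.yarn.Client in the Spark source tree.
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN
```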

@vanzin
Contributor

vanzin commented Aug 13, 2018

Belated +1; I didn't mean to make the output noisier, probably just flipped the value for debugging and forgot about it.
