Conversation

@wangyum
Member

@wangyum wangyum commented Jul 16, 2018

What changes were proposed in this pull request?

SPARK-24182 changed logApplicationReport from false to true. This PR reverts it to false; otherwise spark-shell shows noisy log output:

...
18/07/16 04:46:25 INFO Client: Application report for application_1530676576026_54551 (state: RUNNING)
18/07/16 04:46:26 INFO Client: Application report for application_1530676576026_54551 (state: RUNNING)
...

Closes #21827

How was this patch tested?

Manual tests.

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93111 has finished for PR 21784 at commit d6563ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Jul 17, 2018

cc @vanzin, @jerryshao

    try {
      val YarnAppReport(_, state, diags) =
-       client.monitorApplication(appId.get, logApplicationReport = true)
+       client.monitorApplication(appId.get, logApplicationReport = false)
Contributor

Yes, it's too verbose currently in client mode. I remember we only had such output in cluster mode from the YARN client. My only concern is that turning this to false will also lose the detailed reports. I think it would be better if we still logged the detailed report when the state changes.

@HyukjinKwon
Member

Hm, yea. I don't find this super noisy though, to be honest.

@wangyum
Member Author

wangyum commented Jul 18, 2018

It's noisy when you type something:

(screenshot: spark-24128)

@HyukjinKwon
Member

HyukjinKwon commented Jul 18, 2018

OK, but you can call sc.setLogLevel in the shell. For instance, if I run spark.range(10).show() at INFO level, I get something like this:

scala> 18/07/18 07:58:47 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
spark.r18/07/18 07:58:48 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
ange(10)18/07/18 07:58:49 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
.show()18/07/18 07:58:50 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)

18/07/18 07:58:51 INFO internal.SharedState: loading hive config file: file:/home/spark/spark/conf/hive-site.xml
18/07/18 07:58:51 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/.../spark/spark-warehouse').
18/07/18 07:58:51 INFO internal.SharedState: Warehouse path is 'file:/home/spark/spark/spark-warehouse'.
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@7a68818c{/SQL,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/json.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@5f745970{/SQL/json,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@2afd8972{/SQL/execution,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /SQL/execution/json.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@5784f6b9{/SQL/execution/json,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /static/sql.
18/07/18 07:58:51 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@6ccf06f1{/static/sql,null,AVAILABLE,@Spark}
18/07/18 07:58:51 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:52 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/07/18 07:58:52 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:53 INFO codegen.CodeGenerator: Code generated in 254.142542 ms
18/07/18 07:58:53 INFO codegen.CodeGenerator: Code generated in 65.397101 ms
18/07/18 07:58:53 INFO spark.SparkContext: Starting job: show at <console>:24
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Got job 0 (show at <console>:24) with 1 output partitions
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (show at <console>:24)
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Missing parents: List()
18/07/18 07:58:53 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[4] at show at <console>:24), which has no missing parents
18/07/18 07:58:53 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:53 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 8.1 KB, free 408.9 MB)
18/07/18 07:58:53 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 4.0 KB, free 408.9 MB)
18/07/18 07:58:53 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ctr-e138-1518143905142-411342-01-000002.hwx.site:41073 (size: 4.0 KB, free: 408.9 MB)
18/07/18 07:58:53 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
18/07/18 07:58:54 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at show at <console>:24) (first 15 tasks are for partitions Vector(0))
18/07/18 07:58:54 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
18/07/18 07:58:54 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ctr-e138-1518143905142-411342-01-000008.hwx.site, executor 2, partition 0, PROCESS_LOCAL, 7864 bytes)
[Stage 0:>                                                          (0 + 1) / 1]18/07/18 07:58:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ctr-e138-1518143905142-411342-01-000008.hwx.site:37671 (size: 4.0 KB, free: 366.3 MB)
18/07/18 07:58:54 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
18/07/18 07:58:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1115 ms on ctr-e138-1518143905142-411342-01-000008.hwx.site (executor 2) (1/1)
18/07/18 07:58:55 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/07/18 07:58:55 INFO scheduler.DAGScheduler: ResultStage 0 (show at <console>:24) finished in 1.609 s
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Job 0 finished: show at <console>:24, took 1.673348 s
18/07/18 07:58:55 INFO spark.SparkContext: Starting job: show at <console>:24
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Got job 1 (show at <console>:24) with 1 output partitions
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (show at <console>:24)
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Missing parents: List()
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at show at <console>:24), which has no missing parents
18/07/18 07:58:55 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.1 KB, free 408.9 MB)
18/07/18 07:58:55 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.0 KB, free 408.9 MB)
18/07/18 07:58:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ctr-e138-1518143905142-411342-01-000002.hwx.site:41073 (size: 4.0 KB, free: 408.9 MB)
18/07/18 07:58:55 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1039
18/07/18 07:58:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at show at <console>:24) (first 15 tasks are for partitions Vector(1))
18/07/18 07:58:55 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
18/07/18 07:58:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ctr-e138-1518143905142-411342-01-000007.hwx.site, executor 1, partition 1, PROCESS_LOCAL, 7864 bytes)
18/07/18 07:58:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on ctr-e138-1518143905142-411342-01-000007.hwx.site:33073 (size: 4.0 KB, free: 366.3 MB)
18/07/18 07:58:55 INFO yarn.Client: Application report for application_1531383843352_0013 (state: RUNNING)
[Stage 1:>                                                          (0 + 1) / 1]18/07/18 07:58:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 948 ms on ctr-e138-1518143905142-411342-01-000007.hwx.site (executor 1) (1/1)
18/07/18 07:58:56 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/07/18 07:58:56 INFO scheduler.DAGScheduler: ResultStage 1 (show at <console>:24) finished in 0.956 s
18/07/18 07:58:56 INFO scheduler.DAGScheduler: Job 1 finished: show at <console>:24, took 0.959303 s
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+

Considering it's INFO level, I was thinking it's not super noisy. I'm okay with disabling it though; I get your point. Let me defer this to @vanzin and @jerryshao.
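The workaround mentioned above can be sketched as follows (a minimal sketch, assuming an active spark-shell session on YARN, where sc and spark are the SparkContext and SparkSession the shell creates for you):

```scala
// Raise the driver's log level so INFO-level application reports are suppressed.
// Valid levels include ALL, DEBUG, INFO, WARN, ERROR, OFF.
sc.setLogLevel("WARN")

// Typed input is no longer interleaved with INFO log lines:
spark.range(10).show()
```

Note this only affects logging in the driver JVM; executors keep their own log4j configuration.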

@HyukjinKwon
Member

@vanzin WDYT about this?

@HyukjinKwon
Member

@wangyum, mind adding Closes #21784 here?

Member

@HyukjinKwon HyukjinKwon left a comment


LGTM

@HyukjinKwon
Member

Also, mind adding [SPARK-24873] to the PR title, since the JIRA happens to be open anyway?

@wangyum wangyum changed the title [SPARK-24182][YARN][FOLLOW-UP] Turn off noisy log output [SPARK-24873][YARN] Turn off spark-shell noisy log output Jul 21, 2018
@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in d7ae424 Jul 21, 2018
@guoxiaolongzte

guoxiaolongzte commented Jul 23, 2018

What? I think we need to add a switch.

@HyukjinKwon
Member

? I think we don't need a switch.

@HyukjinKwon
Member

Maybe we could consider avoiding these logs in the shell specifically. Adding a switch to enable/disable the logs sounds like overkill.

@guoxiaolongzte

But for some spark-submit applications, I want this "Application report for ..." information. What should I do?

@HyukjinKwon
Member

HyukjinKwon commented Jul 23, 2018

It has never been printed before, right? I think we can consider turning it on specifically for spark-submit applications, although I'm not fully sure it's worth doing yet.

@HyukjinKwon HyukjinKwon mentioned this pull request Jul 23, 2018
@guoxiaolongzte

We need to hear @vanzin's opinion, because he wrote the relevant code.

@HyukjinKwon
Member

Adding a configuration to control some logs sounds like overkill. I wouldn't go that way.

@srowen
Member

srowen commented Jul 23, 2018

Logging is already configurable; a switch is overkill. This seems fine.
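One concrete way to use that configurability (a sketch, assuming the log4j 1.x setup Spark shipped with at the time and a conf/log4j.properties derived from the bundled template) is to raise the level for the YARN Client logger alone, silencing the application-report lines while keeping INFO logging elsewhere:

```properties
# conf/log4j.properties
# Hide the repeated "Application report for ..." INFO lines from yarn.Client,
# which is org.apache.spark.deploy.yarn.Client in the Spark source tree.
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN
```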

@vanzin
Contributor

vanzin commented Aug 13, 2018

Belated +1; I didn't mean to make the output noisier, probably just flipped the value for debugging and forgot about it.
