@AngersZhuuuu
Contributor

@AngersZhuuuu AngersZhuuuu commented Jul 21, 2021

What changes were proposed in this pull request?

When using Prometheus to fetch metrics at a defined interval, we always pull data through the REST API.
If a pull happens after the SparkUI port has been bound on the driver but before the application has fully started, the Spark driver throws many NoSuchElementException errors like the one below.

21/07/19 04:53:37 INFO Client: Preparing resources for our AM container
21/07/19 04:53:37 INFO Client: Uploading resource hdfs://tl3/packages/jars/spark-2.4-archive.tar.gz -> hdfs://R2/user/xiaoke.zhou/.sparkStaging/application_1624456325569_7143920/spark-2.4-archive.tar.gz
21/07/19 04:53:37 WARN JettyUtils: GET /jobs/ failed: java.util.NoSuchElementException: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.
java.util.NoSuchElementException: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.
	at org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43)
	at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
	at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
	at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
	at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.spark_project.jetty.server.Server.handle(Server.java:539)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
	at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
	at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:748)

Having checked the original PR: in client mode we need to start the server and bind the port before the TaskScheduler is started, because the web UI URL must be passed when registering the ApplicationMaster. But if we also attach and start the handlers at that point, the REST API is already exposed to users while the application has not started yet, so every request returns the error above.

In this PR, when starting the SparkUI, Spark first starts the Jetty server to bind the address.
After the Spark application is fully started, it calls [attachAllHandlers] to attach and start all existing handlers on the Jetty server.
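The ordering described above can be modeled in a few lines of plain Scala. This is a toy sketch, not Spark's implementation: `ToyUiServer`, `Response`, and `addHandler` are hypothetical names, `attachAllHandlers` only mirrors the step named in the description, Jetty and port binding are not modeled at all, and answering with HTTP 503 while starting up is an assumption (the PR text only specifies the message).

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical response type for this toy model (not a Spark or Jetty class).
final case class Response(status: Int, body: String)

// Toy model: construction stands in for "server started, address bound";
// registered page handlers are only served after attachAllHandlers() runs.
final class ToyUiServer {
  // Flipped once the (simulated) application has fully started.
  @volatile private var ready = false
  // Page renderers keyed by path, e.g. "/jobs"; safe to register from any thread.
  private val handlers = TrieMap.empty[String, () => String]

  def addHandler(path: String)(render: () => String): Unit =
    handlers.update(path, render)

  // Mirrors the [attachAllHandlers] step from the PR description, in this toy model only.
  def attachAllHandlers(): Unit = {
    ready = true
  }

  def handle(path: String): Response =
    if (!ready) {
      // Assumption: 503 while starting; the PR only specifies the message text.
      Response(503, "Spark is starting up. Please wait a while until it's ready.")
    } else {
      handlers.get(path) match {
        case Some(render) => Response(200, render())
        case None         => Response(404, s"No handler for $path")
      }
    }
}

object ToyUiServerDemo {
  def main(args: Array[String]): Unit = {
    val server = new ToyUiServer
    server.addHandler("/jobs")(() => "<html>jobs</html>")
    println(server.handle("/jobs").status) // 503: handlers not attached yet
    server.attachAllHandlers()
    println(server.handle("/jobs").status) // 200: served after startup
  }
}
```

The point of the gate is that a poller hitting the port early gets a well-defined "starting up" reply instead of a handler that fails deep inside the status store.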

Why are the changes needed?

Improve the SparkUI startup logic.

Does this PR introduce any user-facing change?

Before the Spark application is fully started, all URL requests return a page showing

Spark is starting up. Please wait a while until it's ready.

How was this patch tested?

Existing tests.

Between binding the address and the Spark application finishing startup, all requests show the following page:
[screenshot]

@AngersZhuuuu
Contributor Author

ping @srowen, since I found that you handled similar PRs back in 2014

@SparkQA

SparkQA commented Jul 21, 2021

Test build #141391 has started for PR 33457 at commit 7832d40.

@AngersZhuuuu
Contributor Author

Related change
#1966
#2858

@SparkQA

SparkQA commented Jul 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45909/

@SparkQA

SparkQA commented Jul 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45909/

Member

@srowen srowen left a comment

I don't know a lot about this, but it seems reasonable

@SparkQA

SparkQA commented Jul 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45923/

@SparkQA

SparkQA commented Jul 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45923/

@SparkQA

SparkQA commented Jul 21, 2021

Test build #141406 has finished for PR 33457 at commit 4fb422c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

ping @maropu @HyukjinKwon @cloud-fan

@HyukjinKwon HyukjinKwon changed the title [SPARK-36237][SQL] We should attach and start handler after application started [SPARK-36237][UI][SQL] Attach and start handler after application started in UI Jul 22, 2021
@HyukjinKwon
Member

cc @gengliangwang and @sarutak FYI

@AngersZhuuuu
Contributor Author

Any suggestions?

@sarutak
Member

sarutak commented Jul 26, 2021

@AngersZhuuuu
Before this change, status code 500 is returned and a helpful error message is shown if we access /jobs before the UI is prepared.
[screenshot: SPARK-36237-500]

But after this change, status code 404 is returned and no helpful error message is shown.
[screenshot: SPARK-36237-404]

It might be confusing for users.

@AngersZhuuuu
Contributor Author

It might be confusing for users.

This 500 status and the error stack in the log confuse users too; they keep asking me whether something is wrong.
Users usually won't open the Spark web UI before the application has fully started.

Anyway, exposing an API that is not yet usable is not reasonable, right?

@AngersZhuuuu
Contributor Author

Also, you can check this related issue: #1966

@cloud-fan
Contributor

Shall we make RESTful requests hang and the web page keep loading if the Spark application is not fully started?
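A rough sketch of the "hang until ready" idea, assuming each request waits on a one-shot latch with a bounded timeout so that a poller such as Prometheus is not blocked forever. `StartupGate`, `HangingHandler`, and `GateReply` are hypothetical names, and the timeout and the 503 fallback are assumptions for illustration, not part of this PR.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical reply type for this sketch (not a Spark or Jetty class).
final case class GateReply(status: Int, body: String)

// One-shot gate that is opened once the (simulated) application has fully started.
final class StartupGate {
  private val latch = new CountDownLatch(1)

  def markStarted(): Unit = latch.countDown()

  // Blocks the calling thread until startup finishes or the timeout elapses;
  // returns true only if startup finished in time.
  def awaitStarted(timeoutMs: Long): Boolean =
    latch.await(timeoutMs, TimeUnit.MILLISECONDS)
}

object HangingHandler {
  // Hypothetical request path: hang until ready, then render the real page;
  // fall back to a "starting up" reply if the bounded wait runs out.
  def handle(gate: StartupGate, timeoutMs: Long)(render: () => String): GateReply =
    if (gate.awaitStarted(timeoutMs)) GateReply(200, render())
    else GateReply(503, "Spark is starting up. Please wait a while until it's ready.")
}
```

One trade-off of blocking like this: each waiting request holds a server worker thread for up to `timeoutMs`, so frequent scraping during a slow startup could tie up the thread pool, which is one reason to keep the wait bounded.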

@gengliangwang
Member

When we use prometheus to fetch metrics, always pull data before application started.

we need to start server and bind port before taskScheduler started for client mode since we need to pass web url to register application master

@AngersZhuuuu could you update the PR description to make it clearer?

@gengliangwang
Member

This 500 status and the error stack in the log confuse users too; they keep asking me whether something is wrong.

At least before the change, it showed the hint "if you are starting spark..."

@gengliangwang
Member

Shall we make RESTful requests hang and the web page keep loading if the Spark application is not fully started?

Or we can just redirect to a page saying Spark is starting

@AngersZhuuuu
Contributor Author

Shall we make RESTful requests hang and the web page keep loading if the Spark application is not fully started?

Is showing it as below OK?
[screenshot]

@AngersZhuuuu
Contributor Author

This 500 status and the error stack in the log confuse users too; they keep asking me whether something is wrong.

At least before the change, it showed the hint "if you are starting spark..."

How about the current version?

@SparkQA

SparkQA commented Jul 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46208/

@AngersZhuuuu
Contributor Author

retest this please

@cloud-fan
Contributor

#33457 (comment)

The web UI looks good. How about REST requests?

@SparkQA

SparkQA commented Jul 28, 2021

Test build #141754 has finished for PR 33457 at commit b16531f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46267/

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46268/

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46268/

@SparkQA

SparkQA commented Jul 28, 2021

Test build #141755 has finished for PR 33457 at commit b16531f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member

@AngersZhuuuu Please update the PR description, especially the first and third sections.

@AngersZhuuuu
Contributor Author

@AngersZhuuuu Please update the PR description, especially the first and third sections.

How about the current version?

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46290/

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46291/

@SparkQA

SparkQA commented Jul 28, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46290/

@SparkQA

SparkQA commented Jul 28, 2021

Test build #141778 has finished for PR 33457 at commit c514855.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.setMaster("local")
.setAppName("test")
.set(UI.UI_ENABLED, false)
val sc = new SparkContext(conf)
Member

Is the test failure in GA related to the changes here?

Contributor Author

Is the test failure in GA related to the changes here?

Seems so.

@gengliangwang
Member

@AngersZhuuuu seriously, the description is badly written.

When we use prometheus to fetch metrics, always pull data before application started.

Then throw a lot of exception not of NoSuchElementException

Also, please mention that there will be a hint saying that "Spark is starting up" in section "Does this PR introduce any user-facing change?"

@AngersZhuuuu
Contributor Author

@AngersZhuuuu seriously, the description is badly written.

When we use prometheus to fetch metrics, always pull data before application started.

Then throw a lot of exception not of NoSuchElementException

Also, please mention that there will be a hint saying that "Spark is starting up" in section "Does this PR introduce any user-facing change?"

How about the current version?

@SparkQA

SparkQA commented Jul 29, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46363/

@SparkQA

SparkQA commented Jul 29, 2021

Test build #141852 has finished for PR 33457 at commit 0c9c3ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Contributor Author

@gengliangwang GA passed now

@AngersZhuuuu
Contributor Author

AngersZhuuuu commented Aug 2, 2021

ping @gengliangwang @sarutak

@gengliangwang
Member

Thanks, merging to master

@gongzh021

@AngersZhuuuu Before this change, status code 500 is returned and a helpful error message is shown if we access /jobs before the UI is prepared. [screenshot: SPARK-36237-500]

But after this change, status code 404 is returned and no helpful error message is shown. [screenshot: SPARK-36237-404]

It might be confusing for users.

I have the same bug.
Excuse me, how did you solve it?

@AngersZhuuuu
Contributor Author

@gongzh021 Maybe you can check this commit dba26cd

8 participants