
Conversation

@varunsaxena
Contributor

[SPARK-4688] Have a single shared network timeout in Spark

@AmplabJenkins

Can one of the admins verify this patch?

Contributor


how about

// Fall back to the shared spark.network.timeout (in seconds), then convert to ms.
int t = conf.getInt("spark.shuffle.io.connectionTimeout", conf.getInt("spark.network.timeout", 100));
return t * 1000;

@varunsaxena
Contributor Author

Wanted to know what the default spark.network.timeout value should be. I have kept it at 100 sec. Should it be different?

@varunsaxena
Contributor Author

Made the changes as per review. Also updated configuration.md

Contributor


Why was the default value changed from 60 to 100?

Contributor Author


@Lewuathe, that is what I wanted to know: what should I keep as the default value? Should we keep a single fixed timeout value for the config spark.network.timeout, or vary the default based on the defaults of the earlier configurations this config intends to replace? As I am pretty new to Spark, I am not sure which default value would be suitable. Maybe @rxin can confirm.

Contributor


I think 100 is ok, given the akka timeout was 100.

@varunsaxena
Contributor Author

@rxin, any conclusion on what the default timeout values should be in these different cases?

Contributor


you need to add a space before 100 here

@varunsaxena
Contributor Author

@rxin, I will just summarize the configuration defaults I have used. I put a value of 100 in the initial pull request with the intention of having a further discussion on appropriate defaults.

There are two possible approaches. We can continue using the same defaults as earlier, i.e. spark.network.timeout would have different default values in different places, or we can decide on a single fixed default value. I think the latter should be done, but an appropriate value has to be decided.

  1. spark.core.connection.ack.wait.timeout - Default value of 60s was used earlier.
  2. spark.shuffle.io.connectionTimeout - Default value of 120s was used earlier.
  3. spark.akka.timeout - Default value of 100s was used earlier.
  4. spark.storage.blockManagerSlaveTimeoutMs - Here the default was 3 times the value of spark.executor.heartbeatInterval or 45 sec., whichever is higher.

Based on these cases, I think we can fix a default timeout value of 120 sec. for spark.network.timeout.
The only issue I can see is in case 4, but 120 sec. should be a good enough upper cap, I think.
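A minimal Scala sketch of the fallback scheme described above, assuming SparkConf-style getInt lookups; the NetworkTimeoutSketch object, the timeoutSeconds helper, and the 120-second default are illustrative assumptions, not part of this patch:

import org.apache.spark.SparkConf

object NetworkTimeoutSketch {
  // Prefer the specific key if it is set, then the shared spark.network.timeout,
  // then a 120-second default (an assumption for this sketch).
  def timeoutSeconds(conf: SparkConf, key: String): Int = {
    val shared = conf.getInt("spark.network.timeout", 120)
    conf.getInt(key, shared)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(false).set("spark.network.timeout", "150")
    // spark.shuffle.io.connectionTimeout is unset here, so this falls back to 150.
    println(timeoutSeconds(conf, "spark.shuffle.io.connectionTimeout"))
  }
}

With this pattern, each of the four legacy keys listed above keeps working when explicitly set, while unset keys inherit a single shared default.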

[SPARK-4688] Have a single shared network timeout in Spark
@varunsaxena
Contributor Author

@rxin, any conclusion on this?

@pwendell
Contributor

ping @rxin

@rxin
Contributor

rxin commented Dec 31, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24963 has started for PR 3562 at commit 6e97f72.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 31, 2014

Test build #24963 has finished for PR 3562 at commit 6e97f72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24963/

@rxin
Contributor

rxin commented Jan 5, 2015

Thanks. I'm merging this in master.

@asfgit closed this in d3f07fd Jan 5, 2015
@varunsaxena
Contributor Author

Thanks for the review and commit.
