[SPARK-7771] [SPARK-7779] Dynamic allocation: lower default timeouts further #6301
|
This seems like a good idea to me. I could even support going lower (e.g. hundreds of ms) on the add time, given that we now cap at the number of executors needed to satisfy tasks and cancel pending requests. |
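The capping behaviour mentioned in the comment above can be sketched roughly as follows. This is a minimal Python illustration, not Spark's actual code: the function names, parameters, and the idea of returning a "cancelled" count are assumptions made for the example.

```python
import math


def executors_needed(pending_tasks, running_tasks, tasks_per_executor):
    """Executors required to run every pending and running task at once."""
    return math.ceil((pending_tasks + running_tasks) / tasks_per_executor)


def next_target(current_target, delta, pending_tasks, running_tasks,
                tasks_per_executor, max_executors):
    """Return (new_target, cancelled_requests) for one allocation round.

    Hypothetical helper: the target never exceeds what the outstanding
    tasks need, nor the configured maximum. If demand has dropped below
    the current target, the surplus is treated as cancelled requests.
    """
    ceiling = min(
        executors_needed(pending_tasks, running_tasks, tasks_per_executor),
        max_executors,
    )
    new_target = min(current_target + delta, ceiling)
    cancelled = max(0, current_target - new_target)
    return new_target, cancelled


# Demand fell: 12 tasks at 4 per executor need only 3 executors,
# so one of the 4 previously targeted executors is no longer wanted.
print(next_target(4, 4, 10, 2, 4, 100))
```

With a cap like this, a short add timeout cannot overshoot by much, because each round is bounded by actual task demand.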
|
Test build #33186 has finished for PR 6301 at commit
|
|
I just tried this out on a real cluster and it works well. The output from Before this patch, we were getting a bunch of |
|
Test build #33204 has finished for PR 6301 at commit
|
|
Test build #33205 has finished for PR 6301 at commit
|
|
@sryza It's good that we support it being that low, but I'd rather be a little conservative with default values. EDIT: actually, we don't currently support milliseconds because we cast the time to seconds, so this is the lowest it can get without some refactoring. |
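To illustrate why whole-second granularity puts a floor on the timeout, here is a small Python sketch. It assumes the conversion is a plain integer truncation; the thread does not show the actual conversion code, and the function name is hypothetical.

```python
def timeout_in_seconds(timeout_ms):
    """Assumed conversion: drop the sub-second part of the timeout."""
    return timeout_ms // 1000


# Under this assumption a sub-second timeout collapses to zero,
# so one second is the smallest non-zero value that survives.
for ms in (300, 999, 1000, 1500):
    print(ms, "ms ->", timeout_in_seconds(ms), "s")
```

Supporting values in the hundreds of milliseconds would therefore mean keeping the timeout in a finer unit throughout, which is the refactoring the comment alludes to.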
What's the rationale behind bumping this up to info? It seems like there will be common situations (any time there are sustained pending tasks but we've hit our max) where this will end up printing a log message every second, which is a lot of noise.
oops I'm not sure that was an intentional change.
though at some point it might be good to log it in a limited way. It will introduce another variable but at least the user will know when s/he hits the cap.
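One way to log the cap "in a limited way" is to remember when the message was last emitted, which is the extra variable mentioned above. The sketch below is an assumption-laden Python illustration: the class name, the interval parameter, and the message text are all hypothetical and do not come from the patch.

```python
import time


class CapLogger:
    """Emit the 'hit the executor cap' message at most once per interval."""

    def __init__(self, min_interval_s, clock=time.monotonic, emit=print):
        self._min_interval_s = min_interval_s
        self._clock = clock          # injectable so the behaviour is testable
        self._emit = emit
        self._last_logged = None     # the extra piece of state

    def cap_reached(self, max_executors):
        """Log if enough time has passed; return True when a line was emitted."""
        now = self._clock()
        if (self._last_logged is not None
                and now - self._last_logged < self._min_interval_s):
            return False
        self._last_logged = now
        self._emit(f"Not adding executors: already at the cap of {max_executors}")
        return True


logger = CapLogger(min_interval_s=60.0)
logger.cap_reached(10)   # emits the message
logger.cap_reached(10)   # suppressed: called again within the interval
```

A check that runs every second would then produce one line per interval rather than one per check.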
|
Test build #33280 has finished for PR 6301 at commit
|
|
LGTM - makes sense to lower these thresholds. |
…further

The default add time of 5s is still too slow for small jobs. Also, the current default remove time of 10 minutes seem rather high. This patch lowers both and rephrases a few log messages.

Author: Andrew Or <andrew@databricks.com>

Closes #6301 from andrewor14/da-minor and squashes the following commits:

6d614a6 [Andrew Or] Lower log level
2811492 [Andrew Or] Log information when requests are canceled
5fcd3eb [Andrew Or] Fix tests
3320710 [Andrew Or] Lower timeouts + rephrase a few log messages

(cherry picked from commit 3d8760d)
Signed-off-by: Andrew Or <andrew@databricks.com>
The default add time of 5s is still too slow for small jobs. Also, the current default remove time of 10 minutes seems rather high. This patch lowers both and rephrases a few log messages.
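For anyone who wants to override these timeouts rather than rely on the defaults, a rough Python sketch of building `spark-submit` arguments follows. It assumes that the "add time" corresponds to `spark.dynamicAllocation.schedulerBacklogTimeout` and the "remove time" to `spark.dynamicAllocation.executorIdleTimeout`. The helper name and the example values are illustrative only, and the accepted value format (plain seconds or a suffixed string) depends on the Spark version.

```python
from typing import List


def dynamic_allocation_args(add_timeout: str, remove_timeout: str) -> List[str]:
    """Build --conf arguments for spark-submit (hypothetical helper)."""
    settings = {
        "spark.dynamicAllocation.enabled": "true",
        # Assumed mapping: backlog timeout governs when executors are added.
        "spark.dynamicAllocation.schedulerBacklogTimeout": add_timeout,
        # Assumed mapping: idle timeout governs when executors are removed.
        "spark.dynamicAllocation.executorIdleTimeout": remove_timeout,
    }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Illustrative values, not the defaults chosen by this patch.
print(" ".join(dynamic_allocation_args("1s", "60s")))
```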