-
Notifications
You must be signed in to change notification settings - Fork 29k
SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requeste... #1984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
QA tests have started for PR 1984 at commit
|
|
QA tests have finished for PR 1984 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the culprit is that args.amQueue is null. Should we do a null check on that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is it safe to call appContext.setQueue(null) in L81? (and setApplicationName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes you think that? amQueue gets set with:
var amQueue = sparkConf.get("QUEUE", "default")
So I don't think it can end up null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. Many of the other arguments in ClientArguments were initialized to null so I thought this was the same. Then it's fine.
(On an unrelated note initializing the queue to this value makes little sense to me. We don't set the QUEUE attribute anywhere in the code, and creating a fresh SparkConf won't actually pick this config up because it is not prefixed with "spark". This seems to be always initialized to "default")
|
Hi @sryza, thanks for fixing this. I am actually somewhat confused by this because this code has apparently been around for a while, and we have not seen this issue until recently. Do you know what has been added recently that might have caused this to surface? Also it would be good to fix the equivalent in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this question is more "what do other apps do?" than anything else. But it sounds like the requested queue not existing would be reason for a more explicit failure?
In the very least I'd promote this to logWarning(), but I'm actually thinking just failing the app, since the requested queue not existing might lead to unwanted behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have also run into this before and the message isn't very user friendly as to why the job failed. The job isn't going to run so I think logError and throwing an exception and exiting would be better. It would actually be nice to move this up to validateArgs but I don't want to have to look it up twice either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not uncommon for YARN to place an app into a queue different than the one requested. The Fair Scheduler has policies for this and this Capacity Scheduler is about to add them. So I don't think failing the job is the right move. On further thought, it actually might make sense to avoid reporting this queue info at all until we can get it from the queue that the app actually ends up in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the fair scheduler placement policies. Placement policies can be configured to ignore the queue passed in (or to only ignore it if "default" is specified).
…sted queue doesn't exist
|
Posted a patch that removes the queue resources log message entirely |
|
QA tests have started for PR 1984 at commit
|
|
QA tests have finished for PR 1984 at commit
|
|
The changes lgtm. |
…ste... ...d queue doesn't exist Author: Sandy Ryza <sandy@cloudera.com> Closes #1984 from sryza/sandy-spark-3082 and squashes the following commits: fe08c37 [Sandy Ryza] Remove log message entirely 85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist (cherry picked from commit 92af231) Signed-off-by: Andrew Or <andrewor14@gmail.com>
|
Thanks, I've merged this into master and 1.1. |
…ste... ...d queue doesn't exist Author: Sandy Ryza <sandy@cloudera.com> Closes apache#1984 from sryza/sandy-spark-3082 and squashes the following commits: fe08c37 [Sandy Ryza] Remove log message entirely 85253ad [Sandy Ryza] SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requested queue doesn't exist
...d queue doesn't exist