
Conversation

@growse
Contributor

@growse growse commented Feb 10, 2015

[Was previously PR4507]

As per SPARK-5655, recently committed code chmod-700s all application files created on the local filesystem by a Spark executor. This is both unnecessary and broken on YARN, where files created in the NodeManager's working directory are already owned by the user running the job and the 'yarn' group. Group read permission is also needed so that the auxiliary shuffle service, which runs as the 'yarn' user, can read the files.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Feb 10, 2015

ok to test

@SparkQA

SparkQA commented Feb 10, 2015

Test build #27213 has started for PR 4509 at commit f57ce6b.

  • This patch merges cleanly.

@srowen
Member

srowen commented Feb 10, 2015

@vanzin what do you think? I think you made the original change.
It seems like this method simply shouldn't conflate making a directory with restricting its permissions, if restricting them isn't always desirable. That said, if almost every caller wants permissions restricted, it's messy to make all of them make two calls. At least, if this proceeds, perhaps pass a flag rather than the whole SparkConf?

@vanzin
Contributor

vanzin commented Feb 10, 2015

Let me take another look. I remember investigating this: the YARN-generated directory was already 700 (or something similarly restrictive if not that), so it really didn't matter which permissions you set underneath it.

@SparkQA

SparkQA commented Feb 10, 2015

Test build #27213 has finished for PR 4509 at commit f57ce6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27213/

Inline review comment on the diff:

need space before and after =

@andrewor14
Contributor

@growse thanks for the changes. Do we need to do something similar for createTempDir as well, or is this sufficient for shuffle files? I believe so because the shuffle files should only go through the DiskBlockManager. Have you verified that this patch fixes the issue on a real cluster?

@vanzin
Contributor

vanzin commented Feb 10, 2015

Hey @growse,

I tested on a Kerberos cluster and did see what you mention in the bug. Thanks for that explanation.

I think I was a little too paranoid with the original patch, mainly to cover the case of standalone. Since then, there has been a change in standalone mode that should make this fix a lot simpler.

Basically, standalone mode now creates a per-app directory similarly to Yarn. So the code can get away with just changing the permissions of that "root" directory instead of every temp directory it creates.

Talking in code, this means that Utils.createDirectory doesn't need to do chmod 700 anymore. Instead, you can do it in Utils.getOrCreateLocalRootDirs. That method already singles out Yarn and doesn't change the original permissions. We just need to add the chmod for the non-Yarn case.

That should cover both the driver in client mode (which calls that method to create a local temp directory, and thus is not under the control of the NM when running on Yarn) and also the containers in standalone and mesos.

(In fact, there's a little redundancy between the work done by Worker.scala and Utils.scala now for standalone mode, but that shouldn't cause any problems aside from looking a little weird when you look at the raw files. We can clean that up separately if desired.)
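Sketched in code (Python here for illustration; Spark's actual code is Scala in Utils.scala), the suggestion looks roughly like the following. The function names mirror the discussion, and detecting YARN via the CONTAINER_ID environment variable is an assumption for the sketch:

```python
import os
import stat

# Assumption for this sketch: YARN containers are detected via the
# CONTAINER_ID env var, which the NodeManager sets for each container.
def is_running_in_yarn_container(env):
    return "CONTAINER_ID" in env

def get_or_create_local_root_dir(env, path):
    """Create a root local dir; chmod 700 it only outside YARN, so the
    NodeManager's setgid/group permissions are left intact on YARN."""
    os.makedirs(path, exist_ok=True)
    if not is_running_in_yarn_container(env):
        # Owner-only access (chmod 700) for standalone/Mesos and the driver.
        os.chmod(path, stat.S_IRWXU)
    return path
```

The key point is that the chmod sits in the root-dir setup and is skipped in the YARN branch, rather than being applied unconditionally in every directory-creating helper.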

@growse
Contributor Author

growse commented Feb 10, 2015

@andrewor14 - No problem. I don't think there's such a problem with createTempDir, as I don't believe files created under it need to be read by any process other than the container/executor. I've tested this patch on 1.2 against a CDH 5.1.3 cluster and verified that it's fixed in that case.

@vanzin - Sounds good. The main point is that doing any form of permission setting in a YARN context breaks the functionality provided by the parent dir's setgid flag, as it sets the gid of the file to that of the container, rather than that of the nodemanager as it should. If there's a better solution to isolating the YARN case and not setting permissions, we should probably do that instead.

@growse
Contributor Author

growse commented Feb 12, 2015

Moving the chmod 700 functionality has simplified this somewhat; I've tested it as working on YARN.

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27353 has started for PR 4509 at commit 7ca993c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27353 has finished for PR 4509 at commit 7ca993c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27353/

@srowen
Member

srowen commented Feb 12, 2015

That feels right-er to me. Any thoughts @vanzin?

@vanzin
Contributor

vanzin commented Feb 12, 2015

Yep, looks good. Thanks for fixing this!

@srowen
Member

srowen commented Feb 12, 2015

Two people have reviewed it, tests pass, and it looks like a clean fix. Since it could affect many users running YARN, I think it should be back-ported into 1.3. (The logic this changes wasn't present earlier, it appears.)

asfgit pushed a commit that referenced this pull request Feb 12, 2015

Author: Andrew Rowson <github@growse.com>

Closes #4509 from growse/master and squashes the following commits:

7ca993c [Andrew Rowson] Moved chmod700 functionality into Utils.getOrCreateLocalRootDirs
f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if running in a YARN container

(cherry picked from commit 466b1f6)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in 466b1f6 Feb 12, 2015
@srowen
Member

srowen commented Feb 12, 2015

I take it back. I think this may need to back port to 1.2 as well. Looking into that now.

@vanzin
Contributor

vanzin commented Feb 12, 2015

Yes, this should go into 1.2. Should be a clean merge, though.

@growse
Contributor Author

growse commented Feb 12, 2015

Merges cleanly into 1.2 here. Would personally be immensely useful to back-port into there.

asfgit pushed a commit that referenced this pull request Feb 12, 2015
@andrewor14
Contributor

@vanzin just a quick question: as of this patch, createTempDir no longer does a chmod 700. Is that OK? createTempDir is used mostly by the HTTP file server, so don't we need to restrict the permissions on the files served there too?

@vanzin
Contributor

vanzin commented Feb 12, 2015

That's ok because the temp dir would be created underneath one of the "root local dirs" created by the getOrCreateLocalRootDirs method here: https://github.com/apache/spark/pull/4509/files#diff-d239aee594001f8391676e1047a0381eR698

So the top-most directory would have the right permissions (700 in the usual case, or whatever permissions Yarn sets when talking about Yarn containers). That's enough to prevent access to the underlying tree.
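A minimal illustration of that point (Python for illustration, not Spark code): once the root is mode 700, a child directory can keep default permissions, because users other than the owner cannot traverse into the root at all.

```python
import os
import stat
import tempfile

root = tempfile.mkdtemp()        # created mode 0o700 by default on POSIX
os.chmod(root, stat.S_IRWXU)     # make the chmod 700 explicit
# The child keeps default (umask-derived) permissions; no chmod needed,
# since other users lack execute permission on `root` and therefore
# cannot reach anything beneath it.
child = os.path.join(root, "tmp-child")
os.mkdir(child)
print(oct(stat.S_IMODE(os.stat(root).st_mode)))  # 0o700
```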

@andrewor14
Contributor

I see, that's because all of those places call createTempDir(Utils.getLocalDirs(...)), where getLocalDirs calls the getOrCreateLocalDirs. What about this one though:

@vanzin
Contributor

vanzin commented Feb 12, 2015

That one could probably use the same treatment. Seems less important since, from the comment, it just stores class files created by the shell session, not really user data, but in any case, it's better to be safe.

@growse
Contributor Author

growse commented Feb 12, 2015

Is it a bad principle that code needing to create local, secure storage should explicitly call chmod 700? So in the case of SparkIMain.scala, if the directory does need to have its permissions changed, should that be done in SparkIMain itself rather than in createTempDir?

To me, it would seem surprising to call createTempDir and find that the method not only creates directories but also sets permissions. Maybe createPrivateTempDir would be a better method name in that case.

@vanzin
Contributor

vanzin commented Feb 12, 2015

So in the case of SparkIMain.scala, if the directory does need to have its permissions changed, that should be done in SparkIMain itself rather than in createTempDir?

SparkIMain should create its directory under the directory returned by getLocalDir(). getLocalDir() has specific semantics: it's an app-specific directory, placed in a location defined by the cluster manager (or the user in case of the driver in cluster mode), and is properly protected.

(createTempDir has an argument to define the parent directory where to create the temp dir.)
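A hedged sketch of that calling pattern (Python for illustration; the signature is modeled on the discussion, not the exact Utils API): createTempDir takes the already-protected root as its parent, so callers like SparkIMain inherit the root's protection rather than carrying their own permission logic.

```python
import os
import uuid

def create_temp_dir(root, name_prefix="spark"):
    """Create a uniquely named temp dir under `root`. No chmod here:
    the caller passes a root (e.g. from getLocalDir()) that is already
    protected by the cluster manager or the chmod 700 applied at setup."""
    path = os.path.join(root, "%s-%s" % (name_prefix, uuid.uuid4()))
    os.makedirs(path)
    return path
```

Under this design a caller would pass the app-specific local dir as the parent, and any chmod happens once, on the root, not per temp directory.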

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Feb 24, 2015