permissions to write data to gcs from cluster workers #209
In the latter case, you should first authenticate with the browser (without distributed) and send the file
So is the best way to pass these tokens to the workers through the worker-template.yaml file? Or is there a better way? If this is the best way, would it go in an extraConfig section? And what would that look like?
Or maybe there is a way to set it up so that workers share the notebook home directory? Maybe that would cause all kinds of problems, but maybe not.
At this point, I feel like giving workers access to the same home directory as the notebook would solve multiple problems, including this one. What are the technical and security barriers to doing this? What are the downsides?
Has anyone tried the fix in fsspec/gcsfs#91 (comment)? Given the secure internal network, this might be the easy fix to gcsfs auth problems.
I'm running into this issue as well. Have we come up with a good method of passing authentication tokens to workers?
@martindurant I saw fsspec/gcsfs#91 Do you think you could clarify how this would work? I'm not sure where
Echoing @rabernat, my view is that getting read, and maybe even write access to the notebook /home/jovyan directory for the worker would be worth a try. I believe, but I'm not certain, that workers launch as root, so that is something that would likely need to be addressed. I have not had a chance to try this yet, but there is presumably a way to have the worker attach a PVC to the notebook volume.
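A hypothetical sketch of what the PVC idea above could look like in a dask-kubernetes worker-template.yaml. All names here (the image, the claim name, the mount path other than /home/jovyan) are placeholder assumptions, not Pangeo's actual configuration, and this only works if the home-directory volume's storage class supports multi-pod mounting (ReadWriteMany, or ReadOnlyMany for read-only access):

```yaml
# Hypothetical worker-template.yaml fragment: mount the notebook's
# home-directory PVC into each worker pod so workers can see cached
# credentials under /home/jovyan. Placeholder names throughout.
kind: Pod
spec:
  containers:
    - name: dask-worker
      image: pangeo/worker:latest        # placeholder image
      volumeMounts:
        - name: notebook-home
          mountPath: /home/jovyan
          readOnly: true                 # read access is enough for cached tokens
  volumes:
    - name: notebook-home
      persistentVolumeClaim:
        claimName: claim-username        # placeholder: the notebook's PVC name
```

As noted below in the thread, the default volume type used for home directories only allows mounting in a single location, so this sketch would require switching to a storage class that permits shared mounts.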
> What are the technical and security barriers to doing this? What are the downsides?

cc @jacobtomlinson
I think @martindurant is referring to my example, in which gcs = gcsfs.GCSFileSystem(token=token). More explicitly, I think what he means is something like this:

```python
# create a GCSFileSystem just for the purpose of authentication
gcs_orig = gcsfs.GCSFileSystem(project='pangeo-181919', token='browser')
# create another one with those credentials
gcs = gcsfs.GCSFileSystem(project='pangeo-181919', token=gcs_orig.session.credentials)
# now use this to open the mapping to the zarr store
gcsmap = gcsfs.mapping.GCSMap('bucket/path/to/store', gcs=gcs)
```
Update: I just tried @martindurant's suggestion to create a GCSFileSystem object using token='cache' or token='browser', and then rewrote the cell to instead pass the .session.credentials of this object back as the token to create another version of this object, and it worked when passed around to workers.
Right, what @rabernat said. And I can confirm that it works!
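A stand-in illustration of why the two-step trick above works, using no gcsfs or network at all: the interactive browser flow cannot be shipped to workers, but the extracted credentials are a plain object that pickles cleanly, so dask can serialize them and rebuild a working GCSFileSystem on each worker. The FakeCredentials class here is a hypothetical stand-in for gcs_orig.session.credentials, not a real gcsfs type:

```python
# Demonstrate the serialization property that makes the workaround viable.
import pickle

class FakeCredentials:
    """Stand-in for gcs_orig.session.credentials: plain, picklable data."""
    def __init__(self, token):
        self.token = token

def ship_to_worker(obj):
    """Round-trip through pickle, as dask does when sending state to a worker."""
    return pickle.loads(pickle.dumps(obj))

creds = FakeCredentials("ya29.example-token")
received = ship_to_worker(creds)
assert received.token == creds.token  # the worker reconstructs identical credentials
```

In the real workflow the reconstructed credentials would be handed back to gcsfs.GCSFileSystem(token=...) inside each task, which is exactly what the confirmed fix above does.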
Hurray! I could add a
Maybe it would be less hacky to have a way to create a .session.credentials object separately from creating a GCSFileSystem object, to have and to hold and to pass around as needed?
Question about how the reauthentication currently works: Does it happen once per worker or once per task?
The intent was to provide
This has progressed a bit since @mrocklin name-checked me, but here are a couple of comments. Access to home directories from the workers would be ideal. The current volume type that is used for home directories only allows mounting in a single location, but changing to a different type could fix this. I am currently pursuing this on our AWS cluster. I was also under the impression that the workers run as … All that being said, it looks like you've got a valid workaround!
Yep, this worked for me as well. It's a little convoluted, so I hope we can find a more long-term solution.
@jacobtomlinson, I believe the docker image fires up as the last user called in the Dockerfile. At least that's how it works when a docker image is "run" from the command line. So because the Pangeo worker Dockerfile finishes up as root, that's how the image starts. Is that not how things work when the image is launched by KubeCluster?
Ah interesting! I just had a look and I see the worker is using a different image to the notebook. On our cluster we are using the notebook image for the workers too, so the user is …
We ended up installing google fuse in our docker images, creating service account creds for bucket access, passing those creds to the worker-template.yml, and then having the workers write to gcs buckets via google fuse.
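From the worker's point of view, the gcsfuse approach reduces bucket writes to ordinary file I/O. A minimal sketch of that worker-side pattern, assuming the bucket is already mounted (the /gcs mount point is an assumption, not a Pangeo convention):

```python
# Sketch: once gcsfuse has mounted the bucket inside the worker container,
# writing to the bucket is plain file I/O; no gcsfs tokens are passed around.
from pathlib import Path

def write_to_bucket(data: bytes, name: str, mount: str = "/gcs") -> str:
    """Write bytes under the fuse mount; gcsfuse relays them to the bucket."""
    target = Path(mount) / name
    target.parent.mkdir(parents=True, exist_ok=True)  # object "directories"
    target.write_bytes(data)
    return str(target)
```

The trade-off, raised just below, is throughput: fuse adds a translation layer on every read and write, so it is worth benchmarking against direct gcsfs access.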
If you have time to characterize the read performance for HDF/NetCDF files with google fuse, that would be most welcome.
OK, so around 4-5 MB/s.

> [screenshot of read throughput](https://user-images.githubusercontent.com/6101444/38762873-d6e338d0-3f2c-11e8-8158-69159ce381a7.png)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.
I would like to create an xarray dataset from pangeo.pydata.org and write it to gcs from the distributed cluster.
For example:
If I don't specify a token at all, or use token='cloud', I get this error immediately.

If I use token='browser', I authenticate and things work from the notebook (the objects are created). But the writes from the workers don't work. These are the errors that come up.

If I use token='cache' (after having authenticated), again it works from the notebook but fails from the cluster.

Very similar to fsspec/gcsfs#90.
cc @martindurant and @jgerardsimcock