Change move bucket #87

Merged
jbusecke merged 8 commits into main on Jan 23, 2024
Conversation

jbusecke (Collaborator)
No description provided.

@jbusecke (Collaborator, Author) commented on Jan 19, 2024

OK, this is unfortunately not working right now, and I am a bit confused as to why. To narrow down the issue, I tried three different ways to 'copy' a test store from leap-scratch to the public GCS bucket on the LEAP-Pangeo hub:

1 - Try beam.io copy with the LEAP default auth

from apache_beam.io.gcp import gcsio

gcs = gcsio.GcsIO()
# source and target are gs://... paths for the test store (defined elsewhere)
gcs.copytree(source, target)

This fails as expected (the default auth should not have access to the public bucket):

Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/cmip6?projection=noAcl&prettyPrint=false: leap-prod@leap-pangeo.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).

2 - Try beam.io copy with the new service account auth

from apache_beam.io.gcp import gcsio
from google.cloud import storage

client = storage.Client.from_service_account_json("/home/jovyan/KEYS/pangeo-cmip6-public-service-account.json")
gcs = gcsio.GcsIO(storage_client=client)
gcs.copytree(source, target)

This ALSO fails (and is, I guess, a reproducer for what happens in the Beam pipeline):

Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/cmip6?projection=noAcl&prettyPrint=false: public-cmip-google-cloud@leap-pangeo.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket. Permission 'storage.buckets.get' denied on resource (or it may not exist).

Note the different service account email, indicating that the auth is picking up something from the key I provided.
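(As a sanity check, the identity baked into a key file can be read directly; a minimal sketch, assuming a standard service-account JSON key:)

from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file(
    "/home/jovyan/KEYS/pangeo-cmip6-public-service-account.json"
)
# Should print public-cmip-google-cloud@leap-pangeo.iam.gserviceaccount.com,
# matching the email in the 403 above.
print(creds.service_account_email)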

But I think this is actually an exact reproducer of the failure I am seeing in the Dataflow jobs.

But here comes the weird part:

3 - Try with fsspec (gcsfs) and the new service account auth

import json

import gcsfs

with open("/home/jovyan/KEYS/pangeo-cmip6-public-service-account.json") as f:
    token = json.load(f)

fs_auth = gcsfs.GCSFileSystem(token=token)
fs_auth.cp(source, target, recursive=True)

THIS WORKS!


So from this I conclude:

  • This is not actually a problem with the permissions on the public bucket.
  • There is something weird about the google-cloud-storage Python API (see the sketch below)?
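
For what it's worth, both 403s above are for GET .../b/cmip6, i.e. a read of the bucket metadata, which requires storage.buckets.get; copying the objects themselves only needs object-level permissions, which would explain why the gcsfs route succeeds. A minimal sketch of the distinction, assuming google-cloud-storage (the prefix here is hypothetical):

from google.cloud import storage

client = storage.Client.from_service_account_json(
    "/home/jovyan/KEYS/pangeo-cmip6-public-service-account.json"
)

# Object-level access (storage.objects.list) -- works with roles like
# roles/storage.objectViewer, no bucket metadata needed:
for blob in client.list_blobs("cmip6", prefix="some/prefix/", max_results=5):
    print(blob.name)

# Bucket metadata access (storage.buckets.get) -- this is the call that
# raises the 403 above when the account lacks that permission:
bucket = client.get_bucket("cmip6")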

@jbusecke
For now I am trying to use gcsfs instead of beam.io.
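
A minimal sketch of what that swap could look like inside a Beam step (the DoFn and its wiring are hypothetical; getting the key file onto Dataflow workers is a separate concern):

import json

import apache_beam as beam
import gcsfs


class CopyStoreWithGcsfs(beam.DoFn):
    """Hypothetical DoFn: copy a store with gcsfs instead of beam's gcsio."""

    def __init__(self, token_path):
        self.token_path = token_path

    def setup(self):
        # Build the filesystem once per worker; assumes the key file is
        # readable on the worker (not true on Dataflow by default).
        with open(self.token_path) as f:
            token = json.load(f)
        self.fs = gcsfs.GCSFileSystem(token=token)

    def process(self, element):
        source, target = element
        self.fs.cp(source, target, recursive=True)
        yield target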

@jbusecke
This worked. I will modify this a bit further to limit the number of iids and test out the 'full move' to the public bucket.

jbusecke merged commit 6148a85 into main on Jan 23, 2024. 1 check passed.