Google security error for token='browser' #261

Closed
rabernat opened this issue May 26, 2020 · 32 comments

@rabernat
Contributor

gcs = gcsfs.GCSFileSystem(token='browser')

When I follow the link, I immediately see

[image: screenshot of the error page]

I'm on gcsfs 0.6.1.

@martindurant
Member

My efforts to persuade Google of the usefulness of the "device code" workflow have failed so far. I or someone else should find time to see how other libraries (including Google's) are doing this kind of thing now. If we can figure it out, that should work for google-drive too.

@jkingslake

I am having the same issue.

@martindurant
Member

(Edit: I was about to write this, but then I tried something else; skip to my next comment.)

One possibility might be to point people to the console so they can make their own token, and then use that token as given here: https://googleapis.dev/python/google-auth/latest/user-guide.html#user-credentials

@martindurant
Member

The following seems to work, oddly:

import gcsfs
from google_auth_oauthlib.flow import InstalledAppFlow
not_secret = {
    "client_id": "586241054156-9kst7ltfj66svc342pcn43vp6ta3idin"
    ".apps.googleusercontent.com",
    "client_secret": "xto0LIFYX35mmHF9T1R2QBqT",
}
client_config = {
    "installed": {
        "client_id": not_secret["client_id"],
        "client_secret": not_secret["client_secret"],
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
    }
}
flow = InstalledAppFlow.from_client_config(client_config, ["https://www.googleapis.com/auth/devstorage.full_control"])
cr = flow.run_console()
gcs = gcsfs.GCSFileSystem(token=cr, project='my-project')

Does this work for you??

@jkingslake

I still get the same problem. Am I supposed to change anything other than 'my-project'?

@martindurant
Member

Hm, just tried again from a machine that most certainly didn't have my google credentials anywhere on it, and it worked fine. Yes, the project should be the only thing to change - you should have seen a permissions grant and maybe login screen in your browser.

@martindurant
Member

(the client config above is, by the way, the same as gcsfs.core.client_config)

@rabernat
Contributor Author

This is causing problems for several users.

Is there anyone we can talk to at Google?

@martindurant
Member

@rabernat , did you also try and fail with the code snippet I posted?

@jhamman

jhamman commented Jun 5, 2020

I tried the workaround above and got the same error posted by @rabernat.

@max-sixty
Contributor

max-sixty commented Jul 23, 2020

I just hit this too.

No obligation ofc, but if @tswast sees this, could he point us in the right direction? I've spoken with him a couple of times a while back about the best way to do GC authentication.

@martindurant
Member

Would love some help on this...

@max-sixty
Contributor

@tswast

tswast commented Jul 23, 2020

This appears to be an issue with the unverified client ID & secret associated with GCSFS. Google requires all apps (especially those that use Google Cloud scopes) to go through a verification process.

I went through the process for pandas-gbq and pydata-google-auth. I started the process for Ibis, but got stuck at the domain verification step.

A potential workaround while you navigate the verification process: add configuration options to override the client ID & secrets. Users can then create their own OAuth Client ID from the Cloud Console: https://support.google.com/cloud/answer/6158849?hl=en
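
A minimal sketch of what such an override could look like, assuming the same "installed app" config layout as the snippet posted earlier in this thread. The function name `build_client_config` and the environment variable names `GCSFS_CLIENT_ID` / `GCSFS_CLIENT_SECRET` are hypothetical and are not part of gcsfs:

```python
import os

# Endpoints copied from the client config posted earlier in this thread.
AUTH_URI = "https://accounts.google.com/o/oauth2/auth"
TOKEN_URI = "https://accounts.google.com/o/oauth2/token"


def build_client_config(client_id=None, client_secret=None):
    """Build an "installed app" client config from user-supplied values.

    Explicit arguments win; otherwise fall back to the (hypothetical)
    GCSFS_CLIENT_ID / GCSFS_CLIENT_SECRET environment variables.
    """
    client_id = client_id or os.environ.get("GCSFS_CLIENT_ID")
    client_secret = client_secret or os.environ.get("GCSFS_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise ValueError("an OAuth client ID and secret are required")
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": AUTH_URI,
            "token_uri": TOKEN_URI,
        }
    }
```

The resulting dict could then be passed to `InstalledAppFlow.from_client_config(...)` in place of the built-in config, so that the consent screen is tied to the user's own OAuth client rather than the unverified shared one.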

@martindurant
Member

pydata-google-auth seems to produce a credentials object that we should be able to use - can we just borrow that?

@martindurant
Member

(and thanks, @max-sixty @tswast , for pointing pydata-google-auth out, I was not aware of it)

@max-sixty
Contributor

> pydata-google-auth seems to produce a credentials object that we should be able to use - can we just borrow that?

I think so; I'm trying this now. Do you know offhand how to transform the credentials object it returns into the token that gcsfs expects?

@max-sixty
Contributor

I think the "right" way to do it may be for gcsfs to use its own application ID, but that should be an easy fix if we can get this working.

@martindurant
Member

Sorry, I don't remember - it's been a while. The instance does expose the session token and refresh token as attributes (see also .to_json()).

@max-sixty
Contributor

Great, thanks @tswast! I just saw your comment above. That seems like a good workaround for the moment, I can try and work that in.

@max-sixty
Contributor

This works, though it involves using the pydata-auth key, and using a scope that hasn't been verified, so a scary box in the browser:

BUCKET = {...}
DIR = {...}

!pip install gcsfs ujson pydata_google_auth google-auth-oauthlib
import pydata_google_auth
import gcsfs

SCOPES = [
# taken from gcsfs, and not verified with pydata auth lib
  "https://www.googleapis.com/auth/devstorage.full_control"
]

token_path = "~/auth/creds"
pydata_google_auth.save_user_credentials(
    SCOPES,
    path=token_path
)

db_path = f"gs://{BUCKET}/{DIR}/test.json"


fs = gcsfs.GCSFileSystem(token=token_path)

with fs.open(db_path, mode='w') as f:
  f.write('aoeu')

with fs.open(db_path, mode='r') as f:
  print(f.read())

@max-sixty
Contributor

FWIW tokens generated from this also seem to work: https://developers.google.com/oauthplayground/

@max-sixty
Contributor

max-sixty commented Jul 23, 2020

Updated to avoid writing anything to disk. It's inconvenient that the pydata_google_auth library doesn't output a dict:

def get_creds_hack():
  import pydata_google_auth
  import gcsfs
  import json

  SCOPES = ["https://www.googleapis.com/auth/devstorage.full_control"]
  
  creds = pydata_google_auth.get_user_credentials(SCOPES)

  creds_dict = json.loads(creds.to_json())
  return gcsfs.GCSFileSystem(token=creds_dict)

@max-sixty
Contributor

max-sixty commented Jul 29, 2020

Unless I'm missing something, this is actually even simpler, given that Colab supports application-default creds:

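def get_gcsfs():
    import gcsfs
    import google.auth  # import the submodule explicitly; `import google` alone may not load it
    from google.colab import auth as colab_auth  # only available on Colab

    # Prompt the Colab user to log in, which sets application-default credentials
    colab_auth.authenticate_user()
    creds, project = google.auth.default()

    return gcsfs.GCSFileSystem(token=creds)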

...though I'm not sure this will work for Google Drive, if that's a use case some people are working toward.

@martindurant
Member

That would require the user to have gcloud installed, which I think is asking a bit much. Also, it doesn't seem right to reset the user's default google identity, since they might need that for other (e.g., CLI) uses.

@max-sixty
Contributor

Yes good point @martindurant, this does require gcloud. That works on Colab, at least.

I think without gcloud the current options are:

  • Do the hack above using pydata-auth (which requires clicking through a few "warning" pages)
  • GCSFS applies for its own OAuth ID with the correct scopes, so there aren't any "warning" pages

@martindurant
Member

> GCSFS applies for its own OAuth ID with the correct scopes

I tried this process and failed, because gcsfs does not have a real URL redirect that can be verified. If you can help with the process, then we wouldn't need to change the code at all.

For the record, I did not see any warning when using pydata-auth, and it seems to be working for gdrivefs. That use case is actually the more important one, because users with GCS storage do have google GCP identities, but this is not necessarily true for gdrive users.

@max-sixty
Contributor

> GCSFS applies for its own OAuth ID with the correct scopes

> I tried this process and failed, because gcsfs does not have a real URL redirect that can be verified. If you can help with the process, then we wouldn't need to change the code at all.

@tswast could I ask how you got one for pydata-auth? Is there any way for gcsfs to go through the same window?

> For the record, I did not see any warning when using pydata-auth, and it seems to be working for gdrivefs. That use case is actually the more important one, because users with GCS storage do have google GCP identities, but this is not necessarily true for gdrive users.

Great, potentially because pydata-auth has google drive scope already.

@tswast

tswast commented Jul 29, 2020

> could I ask how you got one for pydata-auth? Is there any way for gcsfs to go through the same window?

At the time the verification process was a little bit different. I was able to verify https://pydata-google-auth.readthedocs.io/ as the domain. They've since changed the rules to require a top-level domain where you control the DNS attributes, so I wouldn't be able to use the same process without buying a domain like pydata-google-auth.dev or something

@rabernat
Contributor Author

rabernat commented Oct 13, 2021

This issue has gone stale, but it's still a problem. The workaround in #261 (comment) worked for me, provided I changed SCOPES to

SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]

I propose that we change gcsfs to dispatch to pydata_google_auth when the user specifies token='browser'.
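
A rough sketch of how that dispatch might look, assuming the `pydata_google_auth.get_user_credentials(scopes)` call used earlier in this thread. The helper name `resolve_token` and its `get_credentials` hook are hypothetical and only illustrate the idea; this is not how gcsfs is implemented:

```python
# Scope that worked with the pydata_google_auth workaround in this thread.
DEFAULT_SCOPES = ("https://www.googleapis.com/auth/cloud-platform",)


def resolve_token(token, scopes=DEFAULT_SCOPES, get_credentials=None):
    """Return something usable as the gcsfs `token` argument.

    Any value other than "browser" is passed through untouched. For
    "browser", credentials come from `get_credentials(scopes)`, which
    defaults to pydata_google_auth.get_user_credentials.
    """
    if token != "browser":
        return token
    if get_credentials is None:
        # Imported lazily so the dependency is only needed for browser auth.
        import pydata_google_auth

        get_credentials = pydata_google_auth.get_user_credentials
    return get_credentials(list(scopes))
```

With a helper like this, `gcsfs.GCSFileSystem(token=resolve_token("browser"))` would go through the pydata-google-auth flow, while every other token value would keep its current behaviour. The `get_credentials` hook exists mainly so the dispatch can be exercised without opening a browser.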

@martindurant
Member

> I propose that we change gcsfs to dispatch to pydata_google_auth when the user specifies token='browser'.

I don't see why not, if our own system simply doesn't work. It would look the same as gdrive's? On that point, iterative's pydrive should be a fully-featured replacement at some point. It also has an auth hook, but that requires users to first download a token file from the google API console.

@tswast

tswast commented Mar 14, 2022

Note: I see that pydrive is using oauth2client, which has been deprecated for about 5 years now. Probably not best to migrate to their way of doing auth.
