Adding BigQuery authentication to credentials.yml #1621
Comments
BTW this is my temp fix:
Hello @jacobweiss2305. If you want to provide credentials for this dataset using credentials.yml then you need to effectively copy and paste what's in your creds.json file into there, e.g.
This dictionary gets passed directly into the credentials object that the dataset constructs. If you want to continue using your .ssh/creds.json file, you'll need to make a custom dataset that does that, but it should be very easy to do so:
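As a rough sketch of the file-based idea, the helper below accepts either the dictionary written inline in credentials.yml or a path string such as `~/.ssh/creds.json`, and returns the key data as a plain dictionary. The name `resolve_service_account_info` is hypothetical and not part of Kedro; a custom dataset could call something like it in its constructor before building Google credentials.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Union


def resolve_service_account_info(entry: Union[str, Mapping[str, Any]]) -> dict:
    """Return service-account key data as a plain dict.

    `entry` is either the mapping written inline in credentials.yml or a
    path to a JSON key file. Hypothetical helper, not a Kedro API.
    """
    if isinstance(entry, Mapping):
        # Already a dictionary: copy it so the caller's config is not mutated.
        return dict(entry)
    # Otherwise treat the entry as a path and expand "~" to the home directory.
    key_path = Path(entry).expanduser()
    with key_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

With a helper like this, the same dataset code can be fed from either style of credentials.yml entry.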
@AntonyMilneQB, thank you so much, this is very helpful. Does the credentials.yml need a specific format? The creds.json file looks like this:
I translated that into the credentials.yml file:
But I get this error:
I also tried this in the credentials.yml:
But received this error:
Which doesn't make sense because I can get it to work through environment variables. Do you see any obvious error I am committing or recommend any further tests?
Hmm, this is a bit confusing. I feel like the 1st unnested way of doing things should be correct, but clearly the 2nd is actually the correct format 🤔 But the fact that also doesn't work is weird. What the kedro dataset is doing here is actually pretty simple: kedro/kedro/extras/datasets/pandas/gbq_dataset.py Lines 262 to 270 in 15d47c7
where
Hopefully playing around with that will shed some light on what's going on. e.g. if the json_credentials also don't work but the
I also noticed that kedro.extras.datasets.pandas.GBQQueryDataSet is not using the service_account. I was able to get this to work:
The other test I ran is reauth in catalog.yml:
But it forces you to go to the web and copy/paste an authentication code (this works to load in data). I think that the fix is to add service_account.Credentials.from_service_account_file to kedro.extras.datasets.pandas.GBQQueryDataSet. Would you agree?
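A minimal sketch of that proposal outside Kedro, assuming `pandas-gbq` and `google-auth` are installed: it builds service-account credentials from a key file and hands them to `pandas_gbq.read_gbq`. The function name and its parameters are hypothetical, and this is neither the snippet referred to above nor Kedro's current implementation.

```python
def read_bigquery_with_key_file(sql: str, key_path: str, project_id: str):
    """Run `sql` on BigQuery using a service-account key file.

    Hypothetical helper for experimenting outside Kedro.
    """
    # Imported lazily so this module can be loaded without the Google libraries.
    import pandas_gbq
    from google.oauth2 import service_account

    # Build credentials from the JSON key file rather than from a dictionary.
    credentials = service_account.Credentials.from_service_account_file(key_path)
    # pandas-gbq accepts a ready-made credentials object, so no browser prompt is needed.
    return pandas_gbq.read_gbq(sql, project_id=project_id, credentials=credentials)
```

If this works for a given key file, the remaining question is only how the dataset should obtain the same credentials object from credentials.yml.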
I'm not sure. I don't understand much (anything) about Google credentials but as far as I can tell there's two different questions here:
For the 2nd of these, it seems that both of these can be instantiated in 3 different ways:
"from a file" feels like not such a kedro way to do it, but I think one of the first two options should work correctly with credentials.yml.
I think it's worth playing around with the script I showed in #1621 (comment) to understand what the correct way of loading a dictionary of credentials would be, e.g. whether the "from a dictionary" option works. Overall, I think once we've figured out the right way to get this working outside kedro, modifying the kedro dataset to match should be straightforward. But it's potentially a breaking change too.
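One way to try the "from a dictionary" route outside Kedro is sketched below, assuming `google-auth` is installed. It dispatches on the `type` field that Google writes into its JSON credential files and calls the matching `from_..._info` constructor. The helper name `credentials_from_mapping` is hypothetical, and the dispatch on `type` is an assumption about how one might choose between the two credential classes, not something Kedro does.

```python
from typing import Any, Mapping


def credentials_from_mapping(info: Mapping[str, Any]):
    """Build a Google credentials object from a dict such as a credentials.yml entry.

    Hypothetical helper: picks the credentials class from the "type" field.
    """
    kind = info.get("type")
    if kind == "service_account":
        # Imported lazily so the function can be defined without google-auth installed.
        from google.oauth2 import service_account

        return service_account.Credentials.from_service_account_info(dict(info))
    if kind == "authorized_user":
        from google.oauth2.credentials import Credentials

        return Credentials.from_authorized_user_info(dict(info))
    # Anything else is rejected explicitly instead of failing deep inside google-auth.
    raise ValueError(f"Unsupported credentials type: {kind!r}")
```

Feeding it the dictionary loaded from credentials.yml shows quickly whether the problem lies in the YAML translation or in how the dataset consumes the dictionary.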
I'm closing this issue since there hasn't been any recent activity. Feel free to re-open this if you're still facing problems!
For the record, I implemented my own BigQuery dataset and this is how it looks:

    class PolarsBigQueryDataset(AbstractDataset):
        def __init__(self, sql: str, credentials: dict[str, t.Any] | None = None):
            self._sql = sql
            self._client = bigquery.Client.from_service_account_info(credentials)

so that the configs can look like this:

    # catalog.yml
    pypi_kedro:
      type: kedro_pypi_monitor.datasets.PolarsBigQueryDataset
      sql: ...
      credentials: gbq_credentials

    # credentials.yml
    gbq_credentials:
      type: service_account
      project_id: kedro-pypi-stats
      private_key_id: ...
      private_key: "-----BEGIN PRIVATE KEY-----\n...
      client_email: ...
      ...

Less than optimal but a good workaround. File-based credentials in Kedro aren't really developed at the moment.
Description
How do I authenticate using credentials.yml for pandas.GBQQueryDataSet?
I am trying to add BigQuery authentication to the credentials.yml file, but I can't load data from BigQuery. The documentation says I need to add an object or dictionary, but I'm unsure what that looks like in credentials.yml.
Does anyone have an example of how to do this?
Steps to Reproduce
The kedro documentation says:
Here is what I tried in credentials.yml:
gbq-creds: "~/.ssh/creds.json"
And in the corresponding catalog.yml:
But here is the error:
Your Environment