Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't create bigquery dataset if it already exists #1569

Merged

Conversation

tsotnet
Copy link
Collaborator

@tsotnet tsotnet commented May 20, 2021

Signed-off-by: Tsotne Tabidze tsotne@tecton.ai

What this PR does / why we need it: In bigquery.py, first call client.get_dataset and only call client.create_dataset if it doesn't exist. This way users can manually create the dataset and only grant bigquery.datasets.get permission (not bigquery.datasets.create) to Feast.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Require just `bigquery.datasets.get` permission (not `bigquery.datasets.create`) if the BigQuery dataset defined by `offline_store->dataset` is already created by the user.

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
@jklegar
Copy link
Collaborator

jklegar commented May 20, 2021

/lgtm

@@ -102,8 +102,15 @@ def get_historical_features(

assert isinstance(config.offline_store, BigQueryOfflineStoreConfig)

# We should create a new dataset if the dataset name was not overridden in the config
should_create_dataset = "dataset" not in config.offline_store.__fields_set__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't 100% agree with this approach. What if the dataset already exists but its not set in the config? I think we should just manually check whether the dataset exists before we run the create command.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll require an additional bigquery.datasets.get permission on the dataset in question

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically we'll be saying that you need either bigquery.datasets.create and bigquery.datasets.get permissions if the dataset does not exist, or just bigquery.datasets.get permission if it already exists.

Copy link
Member

@woop woop May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So get permissions is too much? It seems reasonable to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, its probably possible for us to get around the bigquery.datasets.get permission by trying an operation inside the dataset and seeing if it succeeds (and which exception is thrown).

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
@feast-ci-bot feast-ci-bot removed the lgtm label May 20, 2021
Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
@codecov-commenter
Copy link

codecov-commenter commented May 20, 2021

Codecov Report

Merging #1569 (43f4c08) into master (88e693a) will increase coverage by 6.15%.
The diff coverage is 60.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1569      +/-   ##
==========================================
+ Coverage   77.40%   83.55%   +6.15%     
==========================================
  Files          64       65       +1     
  Lines        5624     5754     +130     
==========================================
+ Hits         4353     4808     +455     
+ Misses       1271      946     -325     
Flag Coverage Δ
integrationtests 83.47% <60.00%> (?)
unittests 77.36% <20.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/infra/offline_stores/bigquery.py 91.03% <60.00%> (+58.41%) ⬆️
sdk/python/feast/infra/gcp.py 82.53% <0.00%> (ø)
sdk/python/feast/feature_store.py 91.89% <0.00%> (+0.45%) ⬆️
sdk/python/feast/repo_config.py 95.23% <0.00%> (+0.95%) ⬆️
sdk/python/feast/infra/provider.py 83.62% <0.00%> (+2.58%) ⬆️
sdk/python/feast/type_map.py 41.72% <0.00%> (+2.87%) ⬆️
sdk/python/tests/conftest.py 100.00% <0.00%> (+5.88%) ⬆️
sdk/python/feast/feature_view.py 87.75% <0.00%> (+6.12%) ⬆️
sdk/python/feast/data_source.py 78.84% <0.00%> (+10.89%) ⬆️
sdk/python/feast/registry.py 79.11% <0.00%> (+19.55%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88e693a...43f4c08. Read the comment docs.

client.get_dataset(dataset)
except NotFound:
# Only create the dataset if it does not exist
client.create_dataset(dataset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we removed exists_ok=True? Wouldn't we introduce a race condition here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove it because the except block would only be executed if the dataset doesn't exist. I didn't think about race conditions. I'm adding this back.

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tsotnet, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Copy link
Member

woop commented May 20, 2021

/lgtm

@tsotnet tsotnet changed the title Don't create bigquery dataset if dataset field is defined in config Don't create bigquery dataset if it already exists May 20, 2021
@feast-ci-bot feast-ci-bot merged commit 14e4474 into feast-dev:master May 20, 2021
jklegar pushed a commit that referenced this pull request May 27, 2021
* Don't create bigquery dataset if dataset field is defined in config

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Try getting the dataset before creating it

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Remove extra client.get_dataset

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>

* Add exists_ok=True back

Signed-off-by: Tsotne Tabidze <tsotne@tecton.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants