Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't create bigquery dataset if it already exists #1569

Merged
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions sdk/python/feast/infra/offline_stores/bigquery.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
from feast.repo_config import BigQueryOfflineStoreConfig, RepoConfig

try:
from google.api_core.exceptions import NotFound
from google.auth.exceptions import DefaultCredentialsError
from google.cloud import bigquery

Expand Down Expand Up @@ -103,7 +104,7 @@ def get_historical_features(
assert isinstance(config.offline_store, BigQueryOfflineStoreConfig)

table_id = _upload_entity_df_into_bigquery(
config.project, config.offline_store.dataset, entity_df, client,
config.project, config.offline_store.dataset, entity_df, client
)
entity_df_sql_table = f"`{table_id}`"
else:
Expand Down Expand Up @@ -206,9 +207,12 @@ def _upload_entity_df_into_bigquery(project, dataset_name, entity_df, client) ->
# First create the BigQuery dataset if it doesn't exist
dataset = bigquery.Dataset(f"{client.project}.{dataset_name}")
dataset.location = "US"
client.create_dataset(
dataset, exists_ok=True
) # TODO: Consider moving this to apply or BigQueryOfflineStore

try:
client.get_dataset(dataset)
except NotFound:
# Only create the dataset if it does not exist
client.create_dataset(dataset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we removed exists_ok=True? Wouldn't we introduce a race condition here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to remove it because the except block would only be executed if the dataset doesn't exist. I didn't think about race conditions. I'm adding this back.


# Drop the index so that we dont have unnecessary columns
entity_df.reset_index(drop=True, inplace=True)
Expand Down