BigQuery profiling ingests great-expectations temp views #5210

Closed
maaaikoool opened this issue Jun 21, 2022 · 11 comments · Fixed by #5826
Labels
bug (Bug report), ingestion (PR or Issue related to the ingestion of metadata)

Comments

@maaaikoool
Contributor

Describe the bug
Running the bigquery ingestion with profiling enabled results in the ingestion of thousands of temp views used internally by great-expectations, e.g. ge-temp-{uuid}.

@aditya-radhakrishnan
Contributor

Hi @maaaikoool can you share what your recipe looks like, and which version of the CLI you're on?

cc @treff7es

@maaaikoool
Contributor Author

Sure!
We are using linkedin/datahub-ingestion:v0.8.36 docker image.
The config is:

source:
  type: bigquery
  config:
    include_table_lineage: true
    profiling:
      enabled: true
      profile_table_level_only: true
    project_id: foo
    schema_pattern:
      allow:
      - bar
      - baz
    use_date_sharded_audit_log_tables: true
    use_v2_audit_metadata: true
sink:
  config:
    server: http://gms:8080
  type: datahub-rest

@maaaikoool
Contributor Author

maaaikoool commented Jun 24, 2022

Seems like the problem is that the views are not deleted after the ingestion finishes:

  • I cleaned up everything (both in bigquery and in datahub).
  • I modified the ingestion recipe to set the bigquery_temp_table_schema option to a dedicated temp dataset.
  • After the ingestion ended successfully, I again found lots of newly created ge-temp-xxx views in the temp dataset.

@treff7es
Contributor

Hmm, I need to check what is happening as it should clean up the temp tables

@treff7es
Contributor

I just tested on my side and for me, it dropped all the created views but of course, this doesn't mean there isn't a bug.
Which version of datahub cli are you on?
Can you see any message like this in your logs:
[2022-06-24 17:22:34,932] INFO {sqlalchemy.engine.base.Engine:110} - drop view if exists calm-pagoda-323403.ge_temp_views.ge-temp-ce44e526-4895-47bd-a465-357e95fd6bea
Or can you see a message like this:

Unable to delete bigquery temporary table:

Is it possible your run got killed?
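
One way to check a saved ingestion log for the two messages above is a small scan like the following. This is only a sketch: the function name `summarize_cleanup` is made up for illustration, and it matches on the substrings quoted in this thread, so it will miss messages whose wording differs in other versions. A logged drop statement only shows that the cleanup was attempted, not that it succeeded.

```python
# Substrings copied from the messages quoted in this thread.
DROP_MARKER = "drop view if exists"
FAILURE_MARKER = "unable to delete bigquery temporary table"


def summarize_cleanup(log_lines):
    """Count attempted view drops and cleanup warnings in ingestion log lines.

    `log_lines` is any iterable of strings, e.g. an open log file.
    Matching is case-insensitive and purely substring-based.
    """
    counts = {"attempted": 0, "failed": 0}
    for line in log_lines:
        lowered = line.lower()
        if FAILURE_MARKER in lowered:
            counts["failed"] += 1
        elif DROP_MARKER in lowered:
            counts["attempted"] += 1
    return counts
```

For example, `summarize_cleanup(open("ingestion.log"))` (the file name is hypothetical) returns a dict with the two counts; a non-zero `failed` count suggests the temp views were left behind.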

@maaaikoool
Contributor Author

Many thanks for the follow-up @treff7es !

Which version of datahub cli are you on?

0.8.36+docker

Is it possible your run got killed?

It finishes successfully.

Or can you see message like this: Unable to delete bigquery temporary table:

Bingo! We do have the "Unable to delete bigquery temporary table" warnings. I failed to spot those before, my bad!

Before this I had an issue because I was missing the bigquery.tables.update permission too. However, that resulted in a failed run, which made the problem easier to spot. Maybe we could fail the execution here as well, or log at error level instead of warning, WDYT?

After double-checking the permissions against the docs, I realized that some new permissions were added here and we were missing those. However, adding them did not fix the issue 🤔

Anyway now that I know the root cause I can follow up on that 👍 I will close the issue and reopen or comment if I have any relevant updates. Thanks again!

@maaaikoool
Contributor Author

Are we missing the bigquery.tables.delete permission from the permissions listed here?

If I mount a service account with the permissions listed there and try to replicate the query we do here it fails:

bq --project_id foo query --use_legacy_sql=false 'drop view if exists `foo`.temp.test_david'
BigQuery error in query operation: Error processing job 'foo:bqjob_r722c146ddac2c343_00000181ab43fa6b_1': Access Denied: Table foo:temp.test_david: Permission
bigquery.tables.delete denied on table foo:temp.test_david (or it may not exist).
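
For cleaning up views that were already left behind, the drop statements can be generated from a list of view names, as in the sketch below. The assumptions here are mine, not from the DataHub code: the helper name `drop_statements` is hypothetical, and the name pattern (`ge-temp-` followed by a UUID) is inferred from the log line quoted earlier in this thread. Running the generated statements needs the same bigquery.tables.delete permission that the error above reports as missing.

```python
import re

# Assumed name shape: "ge-temp-" plus a lowercase UUID (8-4-4-4-12 hex digits),
# inferred from the view name in the log line quoted in this thread.
GE_TEMP_RE = re.compile(r"^ge-temp-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def drop_statements(project, dataset, view_names):
    """Build one 'drop view if exists' statement per leftover ge-temp view.

    Names that do not match the assumed pattern are skipped, so regular
    tables and views in the same dataset are never targeted.
    """
    return [
        # Backticks are needed because the view names contain hyphens.
        f"drop view if exists `{project}.{dataset}.{name}`"
        for name in view_names
        if GE_TEMP_RE.match(name)
    ]
```

Each returned string can be passed to `bq query --use_legacy_sql=false` in the same way as the command above.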

maaaikoool reopened this Jul 21, 2022
@maaaikoool
Contributor Author

Hey @treff7es did you have a chance to review if we should have bigquery.tables.delete? Many thanks

anshbansal added the ingestion label Jul 26, 2022
@treff7es
Contributor

I think you need to grant delete access on the schemas where the bigquery source creates temporary tables.
If you dedicate a schema to these temporary tables (the bigquery_temp_table_schema property in profiling), then it is enough to grant the delete permission on that schema only.

@siddiquebagwan
Contributor

Hi @maaaikoool
Is this still an issue? If not, I will close it after a few days of inactivity.

@maaaikoool
Contributor Author

Hi!

Sorry but I've been AFK for a while 😅 I've added the permission to the list in #5826
