
BigQuery adapter should record in a label the dbt invocation_id #2808

Closed
mescanne opened this issue Oct 2, 2020 · 3 comments · Fixed by #2809
Labels
bigquery enhancement New feature or request

Comments

@mescanne
Contributor

mescanne commented Oct 2, 2020

Describe the feature

Performance analysis of dbt invocations on BigQuery could be made much easier by a small change in adapters/bigquery/connections.py: record the dbt invocation_id as a label on each BigQuery job.

Describe alternatives you've considered

Capturing statistics from dbt itself is one alternative, but there is no straightforward path for that.

Additional context

In adapters/bigquery/connections.py, raw_execute initializes a job_params dict. That dict should gain a labels dictionary, for example with "dbt_invocation_id" set to dbt.tracking.active_user.invocation_id.

The API being used is here:
https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJobConfig.html#google.cloud.bigquery.job.QueryJobConfig
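A minimal sketch of the proposed change, assuming (as described above) that raw_execute builds job_params as keyword arguments for google.cloud.bigquery.QueryJobConfig. The helpers build_job_params and sanitize_label_value are hypothetical names, not part of dbt. The sanitizing step follows BigQuery's label rules (lowercase letters, digits, underscores and dashes, at most 63 characters), restricted here to ASCII.

```python
import re
import uuid

# Characters allowed in a BigQuery label value (ASCII subset of the rules).
_INVALID_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")


def sanitize_label_value(value: str) -> str:
    """Lowercase, replace disallowed characters with '_', truncate to 63 chars."""
    return _INVALID_LABEL_CHARS.sub("_", value.lower())[:63]


def build_job_params(use_legacy_sql: bool, invocation_id: str) -> dict:
    """Hypothetical helper: QueryJobConfig kwargs that include the invocation label."""
    return {
        "use_legacy_sql": use_legacy_sql,
        "labels": {"dbt_invocation_id": sanitize_label_value(invocation_id)},
    }


if __name__ == "__main__":
    # A random UUID stands in for dbt.tracking.active_user.invocation_id.
    params = build_job_params(False, str(uuid.uuid4()))
    print(params)
    # With google-cloud-bigquery installed, the params would be used as:
    #   client.query(sql, job_config=bigquery.QueryJobConfig(**params))
```

Since dbt's invocation_id is a UUID string, it already satisfies the label rules; the sanitizing step only guards against other values being passed in.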

The labels field in the INFORMATION_SCHEMA jobs views can then be used to select all jobs belonging to a given invocation_id, which gives detailed information on query performance:
https://cloud.google.com/bigquery/docs/information-schema-jobs
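A sketch of such a lookup, assuming jobs carry the dbt_invocation_id label proposed above. The `region-us` qualifier is an assumption (use the region your jobs run in), and summarize_invocation is a hypothetical offline equivalent of the WHERE clause, for rows shaped like JOBS_BY_PROJECT output with labels as a list of key/value records.

```python
# The region qualifier is an assumption; adjust it to your dataset location.
JOBS_BY_INVOCATION_SQL = """
SELECT
  job_id,
  creation_time,
  total_bytes_processed,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS duration_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE EXISTS (
  SELECT 1 FROM UNNEST(labels) AS label
  WHERE label.key = 'dbt_invocation_id' AND label.value = @invocation_id
)
ORDER BY creation_time
"""


def summarize_invocation(rows, invocation_id):
    """Filter job rows by the invocation label and total their cost metrics."""
    matching = [
        row
        for row in rows
        if any(
            lbl["key"] == "dbt_invocation_id" and lbl["value"] == invocation_id
            for lbl in row.get("labels", [])
        )
    ]
    return {
        "jobs": len(matching),
        # NULL metrics (e.g. for cached queries) are treated as zero.
        "total_bytes_processed": sum(r.get("total_bytes_processed") or 0 for r in matching),
        "total_slot_ms": sum(r.get("total_slot_ms") or 0 for r in matching),
    }
```

With google-cloud-bigquery, the query would be run with a named parameter, e.g. `client.query(JOBS_BY_INVOCATION_SQL, job_config=bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("invocation_id", "STRING", invocation_id)]))`.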

The invocation_id could also be used to write to a logging table if logical information about the run should be recorded.

Who will this benefit?

This will benefit any organization using BigQuery with dbt that wants to do systematic performance profiling.

Are you interested in contributing this feature?

Yes, but I am in the process of getting the CLA reviewed on my side.

@mescanne mescanne added enhancement New feature or request triage labels Oct 2, 2020
@jtcohen6 jtcohen6 removed the triage label Oct 2, 2020
@jtcohen6
Contributor

jtcohen6 commented Oct 2, 2020

Hey @mescanne, I can do you one better: this is totally possible today with just a little configuration, using the BigQuery labels config and the invocation_id variable available in the Jinja context:

models:
  my_project:
    +labels:
      invocation_id: "{{ invocation_id }}"

I'm going to close the issue, though I'd encourage you to comment here if that approach isn't quite what you're after.

@jtcohen6 jtcohen6 closed this as completed Oct 2, 2020
@mescanne
Contributor Author

mescanne commented Oct 2, 2020

Ah, no, that labels the tables and views. My proposal is to label the jobs themselves, so that each invocation can be tied to the SQL jobs it ran. It would make sense to always track SQL jobs by invocation, which is why I proposed doing it unconditionally.

@jtcohen6
Contributor

jtcohen6 commented Oct 2, 2020

Oh, I see! Sorry for my misunderstanding earlier.

In that case, I believe this issue is a duplicate of #2483, which is to add support for user-configurable labels and tags on jobs (similar to how it works for tables and views). That said, I think it would be reasonable to include, by default, the information in the query-comment dict (which includes invocation_id) as job labels.
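Reusing the query-comment dict as job labels needs a conversion step, because label keys and values are more restricted than arbitrary comment strings. This is a sketch under the assumption that the comment is a flat dict of scalar values; comment_to_labels is a hypothetical name, the example keys are illustrative, and the policy of dropping None values and prefixing keys that don't start with a letter is one possible choice, not dbt's behavior.

```python
import re

# ASCII subset of BigQuery's allowed label characters.
_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")


def _clean(text) -> str:
    """Lowercase, replace disallowed characters with '_', truncate to 63 chars."""
    return _LABEL_CHARS.sub("_", str(text).lower())[:63]


def comment_to_labels(comment: dict) -> dict:
    """Convert a flat query-comment dict into BigQuery-compatible job labels."""
    labels = {}
    for key, value in comment.items():
        if value is None:
            continue  # skip unset fields rather than emitting "none"
        k = _clean(key)
        if not k or not k[0].isalpha():
            k = ("k" + k)[:63]  # label keys must start with a lowercase letter
        labels[k] = _clean(value)
    return labels


if __name__ == "__main__":
    print(comment_to_labels({
        "app": "dbt",
        "dbt_version": "0.18.1",
        "invocation_id": "ABC-123",
        "node_id": "model.my_project.orders",
        "target_name": None,
    }))
```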

@jtcohen6 jtcohen6 added the duplicate This issue or pull request already exists label Oct 2, 2020
@jtcohen6 jtcohen6 reopened this Oct 6, 2020
@jtcohen6 jtcohen6 added bigquery and removed duplicate This issue or pull request already exists labels Oct 6, 2020

2 participants