
[CT-1878] [Feature] <Optionally provide job_id for every model that gets executed> #475

Closed
matteoannotell opened this issue Jan 23, 2023 · 9 comments

@matteoannotell

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When several hundred models get executed in a single dbt build, it sometimes happens that the models are created successfully but some of the tests fail.
In this case, it is currently very time-consuming to look up the job_id and execution details, since they are only output in --debug mode, which unnecessarily clutters the logs. This gets particularly challenging when the dbt build is triggered by some sort of automated process, such as a scheduled dbt execution with Airflow running in its own Kubernetes environment.

Describe alternatives you've considered

Alternatives considered so far are:

  • when running locally, use --debug mode
  • when running scheduled tasks in airflow, export run_results.json
  • manually patch core/adapter so that the link to job_id is always shown

However, none of those options seems maintainable or offers a smooth developer experience where all the information is condensed in the console log. (A sketch of the run_results.json approach follows below.)
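
A minimal sketch of that approach, assuming dbt-bigquery writes the job ID into each node's adapter_response (the exact field names may vary across dbt versions):

import json

# Scan run_results.json for failed nodes and print their BigQuery job IDs.
# Assumption: dbt-bigquery populates adapter_response.job_id for each node.
with open("target/run_results.json") as f:
    run_results = json.load(f)

for result in run_results["results"]:
    if result["status"] in ("error", "fail"):
        job_id = result.get("adapter_response", {}).get("job_id", "<unknown>")
        print(f"{result['unique_id']}: {result['status']} (BQ job: {job_id})")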

Who will this benefit?

The main benefit is for anyone running/building multiple models at once who needs to trace a particular execution back to its counterpart in BigQuery. The benefit becomes evident as soon as the number of models increases.

Are you interested in contributing this feature?

I would love to help, but I might need a couple of pointers regarding best practices, etc.

Anything else?

No response

@github-actions github-actions bot changed the title [Feature] <Optionally provide job_id for every model that gets executed> [CT-1878] [Feature] <Optionally provide job_id for every model that gets executed> Jan 23, 2023
@github-christophe-oudar
Contributor

Thanks for creating the issue!
Trying to sum up your need: you'd like a way to quickly identify the dbt model <-> BQ job id mapping, in both dev and prod environments?
Is showing the BQ link and/or job id, without the rest of the "debug bloat", what you'd suggest?

@matteoannotell
Author

Yes Christophe, something like that. A full --debug run is too verbose and difficult to navigate. What I would suggest here is an option to always show the BQ job_id (or a full link to it) before each console line where the job results are summarised.

@github-christophe-oudar
Contributor

It makes sense to improve logging and I think it could be useful; I'm curious whether @jtcohen6 has a higher-level plan for that.
Structuring log levels and deciding what/when to show in the logs could be an approach to consider in core, to let people like you customize it to fit their needs.
Maybe it's just a simple tweak, but since the current solution doesn't fit your use case, I think we may see more usages that require a customizable solution. I don't know how other adapters display that kind of information, so the dbt team could bring some interesting insights on this topic.

@jtcohen6
Contributor

jtcohen6 commented Jan 24, 2023

Quick thoughts:

I do think those are the right foundational mechanisms to have in place, as far as making this information available / programmatically parseable.

Open to hearing your thoughts about preferred UX, though! Should we include the BQ job link for test failures, same as we do for query errors?

if hasattr(error, "query_job"):
    logger.error(
        cls._bq_job_link(
            error.query_job.location, error.query_job.project, error.query_job.job_id
        )
    )
raise DbtDatabaseError(error_msg)
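
For context, a sketch of what a helper like _bq_job_link plausibly returns; the URL shape below is an assumption based on the public BigQuery console URL scheme, not a copy of the adapter's code:

# Sketch only: the query-string format is an assumption, not the adapter's
# actual implementation.
def bq_job_link(location: str, project_id: str, job_id: str) -> str:
    return (
        "https://console.cloud.google.com/bigquery"
        f"?project={project_id}&j=bq:{location}:{job_id}&page=queryresults"
    )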

@github-christophe-oudar
Contributor

I agree on the fundamentals!
I wonder if we should provide some kind of macro to let people tweak the logs, instead of hardcoded Python code such as:

logger.error(
    cls._bq_job_link(
        error.query_job.location, error.query_job.project, error.query_job.job_id
    )
)

I hadn't noticed that we didn't have job links.
I'm using https://www.elementary-data.com/ for alerting on dbt tests, and I think it would make sense to have links there too, so the adapter_response on tests should make it easy to surface a job link.
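
To illustrate, a consumer such as an alerting tool could assemble the console link itself from a test result's adapter_response; a minimal sketch, assuming the field names below (they are not guaranteed and may vary by adapter version):

from typing import Optional

# Hypothetical helper: build a console link from a run_results.json entry.
# The adapter_response keys (job_id, project_id, location) are assumptions.
def job_link_from_result(result: dict) -> Optional[str]:
    resp = result.get("adapter_response", {})
    job_id = resp.get("job_id")
    project = resp.get("project_id")
    location = resp.get("location")
    if not (job_id and project and location):
        return None
    return (
        "https://console.cloud.google.com/bigquery"
        f"?project={project}&j=bq:{location}:{job_id}&page=queryresults"
    )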

@jtcohen6
Contributor

I wonder if we should provide some kind of macros to let people tweak logs instead of hardcoded Python code such as

Hmm, I don't know that we should make this a documented & supported pattern... but technically, this is already possible. If I add some code like this into a custom version of the test materialization:

...
  {% call statement('main', fetch_result=True) -%}

    {{ get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) }}

  {%- endcall %}

  {# my custom code: log the BQ job ID when the test warns or errors #}
  {% set result = load_result('main') %}
  {% set should_warn_or_error = result.table.columns[1][0] or result.table.columns[2][0] %}
  {% if should_warn_or_error %}
      {{ log("Job ID for test that found failures: " ~ result.response.job_id, info = true) }}
  {% endif %}
  {# end custom code #}

  {{ return({'relations': relations}) }}
...

Voila:

$ dbt build
...
11:14:44  1 of 3 START sql view model dbt_jcohen.my_model ................................ [RUN]
11:14:45  1 of 3 OK created sql view model dbt_jcohen.my_model ........................... [CREATE VIEW (0 processed) in 1.23s]
11:14:45  2 of 3 START test not_null_my_model_id ......................................... [RUN]
11:14:45  3 of 3 START test unique_my_model_id ........................................... [RUN]
11:14:47  2 of 3 PASS not_null_my_model_id ............................................... [PASS in 1.53s]
11:14:47  Job ID for test that found failures: 4a02e6aa-dec5-4269-86dc-7ceb67295db7
11:14:47  3 of 3 FAIL 1 unique_my_model_id ............................................... [FAIL 1 in 1.58s]
...

@github-christophe-oudar
Contributor

github-christophe-oudar commented Jan 25, 2023

Absolutely!
I meant without rewriting a dedicated materialization. That's why I thought you'd be well placed to say whether there's a product vision here; in that case, a dedicated discussion would be a better fit for setting a direction toward a lasting solution.

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 25, 2023
@github-actions
Contributor

github-actions bot commented Aug 2, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 2, 2023