
[CT-1878] [Feature] <Optionally provide job_id for every model that gets executed> #475

Closed
matteoannotell opened this issue Jan 23, 2023 · 9 comments

@matteoannotell

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When several hundred models get executed in a single dbt build, it sometimes happens that the models are created successfully but some of the tests fail.
In this case, it is currently very time-consuming to look up the job_id and execution details, since they are only output in --debug mode, which unnecessarily clutters the logs. This gets particularly challenging when the dbt build is triggered by some sort of automated process, such as a scheduled dbt execution with Airflow running in its own Kubernetes environment.

Describe alternatives you've considered

Alternatives considered so far are:

  • when running locally, use --debug mode
  • when running scheduled tasks in airflow, export run_results.json
  • manually patch core/adapter so that the link to job_id is always shown

However, none of those options seems maintainable or offers a smooth developer experience where all the information is condensed in the console log. (A sketch of the run_results.json approach follows below.)
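
A minimal sketch of that approach, assuming dbt-bigquery writes the job ID into each node's adapter_response (the exact field names may vary across dbt versions):

import json

# Scan run_results.json for failed nodes and print their BigQuery job IDs.
# Assumption: dbt-bigquery populates adapter_response.job_id for each node.
with open("target/run_results.json") as f:
    run_results = json.load(f)

for result in run_results["results"]:
    if result["status"] in ("error", "fail"):
        job_id = result.get("adapter_response", {}).get("job_id", "<unknown>")
        print(f"{result['unique_id']}: {result['status']} (BQ job: {job_id})")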

Who will this benefit?

The main benefit is for anyone running/building multiple models at once who needs to trace a particular execution back to its counterpart in BigQuery. The benefit becomes evident as soon as the number of models increases.

Are you interested in contributing this feature?

I would love to help, but I might need a couple of pointers regarding best practices, etc.

Anything else?

No response

@github-actions github-actions bot changed the title [Feature] <Optionally provide job_id for every model that gets executed> [CT-1878] [Feature] <Optionally provide job_id for every model that gets executed> Jan 23, 2023
@github-christophe-oudar
Contributor

Thanks for creating the issue!
Trying to sum up your need: you'd like a way to quickly identify the dbt model <-> BQ job id mapping, in both dev and prod environments?
Is showing the BQ link and/or job id, without the rest of the "debug bloat", what you'd suggest?

@matteoannotell
Author

Yes Christophe, something like that. A full --debug run is too verbose and difficult to navigate. What I would suggest here is an option to always show the BQ job_id (or a full link to it) before each console line where the job results are summarised.

@github-christophe-oudar
Contributor

It makes sense to improve logging and I think it could be useful; I'm curious whether @jtcohen6 has a higher-level plan for that.
Structuring log levels and deciding what/when to show in the logs could be an approach to consider in core, to let people like you customize it to fit their needs.
Maybe it's just a simple tweak, but since the current solution doesn't fit your use case, I think we may see more usages that require a customizable solution. I don't know how other adapters display that kind of information, so the dbt team could bring some interesting insights on this topic.

@jtcohen6
Contributor

jtcohen6 commented Jan 24, 2023

Quick thoughts:

I do think those are the right foundational mechanisms to have in place, as far as making this information available / programmatically parseable.

Open to hearing your thoughts about preferred UX, though! Should we include the BQ job link for test failures, same as we do for query errors?

if hasattr(error, "query_job"):
    logger.error(
        cls._bq_job_link(
            error.query_job.location, error.query_job.project, error.query_job.job_id
        )
    )
raise DbtDatabaseError(error_msg)
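
For context, a sketch of what a helper like _bq_job_link plausibly returns; the URL shape below is an assumption based on the public BigQuery console URL scheme, not a copy of the adapter's code:

# Sketch only: the query-string format is an assumption, not the adapter's
# actual implementation.
def bq_job_link(location: str, project_id: str, job_id: str) -> str:
    return (
        "https://console.cloud.google.com/bigquery"
        f"?project={project_id}&j=bq:{location}:{job_id}&page=queryresults"
    )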

@github-christophe-oudar
Contributor

I agree on the fundamentals!
I wonder if we should provide some kind of macro to let people tweak the logs, instead of hardcoded Python code such as:

logger.error(
    cls._bq_job_link(
        error.query_job.location, error.query_job.project, error.query_job.job_id
    )
)

I hadn't noticed that we didn't have job links.
I'm using https://www.elementary-data.com/ for alerting on dbt tests, and I think it would make sense to have links there too, so the adapter_response on tests should make it easy to surface a job link.
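
To illustrate, a consumer such as an alerting tool could assemble the console link itself from a test result's adapter_response; a minimal sketch, assuming the field names below (they are not guaranteed and may vary by adapter version):

from typing import Optional

# Hypothetical helper: build a console link from a run_results.json entry.
# The adapter_response keys (job_id, project_id, location) are assumptions.
def job_link_from_result(result: dict) -> Optional[str]:
    resp = result.get("adapter_response", {})
    job_id = resp.get("job_id")
    project = resp.get("project_id")
    location = resp.get("location")
    if not (job_id and project and location):
        return None
    return (
        "https://console.cloud.google.com/bigquery"
        f"?project={project}&j=bq:{location}:{job_id}&page=queryresults"
    )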

@jtcohen6
Contributor

I wonder if we should provide some kind of macros to let people tweak logs instead of hardcoded Python code such as

Hmm, I don't know that we should make this a documented & supported pattern... but technically, this is already possible. If I add some code like this into a custom version of the test materialization:

...
  {% call statement('main', fetch_result=True) -%}

    {{ get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) }}

  {%- endcall %}

  {# my custom code: log the BQ job ID when the test warns or errors #}
  {% set result = load_result('main') %}
  {% set should_warn_or_error = result.table.columns[1][0] or result.table.columns[2][0] %}
  {% if should_warn_or_error %}
      {{ log("Job ID for test that found failures: " ~ result.response.job_id, info = true) }}
  {% endif %}
  {# end custom code #}

  {{ return({'relations': relations}) }}
...

Voila:

$ dbt build
...
11:14:44  1 of 3 START sql view model dbt_jcohen.my_model ................................ [RUN]
11:14:45  1 of 3 OK created sql view model dbt_jcohen.my_model ........................... [CREATE VIEW (0 processed) in 1.23s]
11:14:45  2 of 3 START test not_null_my_model_id ......................................... [RUN]
11:14:45  3 of 3 START test unique_my_model_id ........................................... [RUN]
11:14:47  2 of 3 PASS not_null_my_model_id ............................................... [PASS in 1.53s]
11:14:47  Job ID for test that found failures: 4a02e6aa-dec5-4269-86dc-7ceb67295db7
11:14:47  3 of 3 FAIL 1 unique_my_model_id ............................................... [FAIL 1 in 1.58s]
...

@github-christophe-oudar
Contributor

github-christophe-oudar commented Jan 25, 2023

Absolutely!
I meant without rewriting a dedicated materialization. That's why I thought you'd be well placed to say whether there's a product vision here; in that case, a dedicated discussion would be a better fit for setting a direction toward a lasting solution.

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 25, 2023
@github-actions
Contributor

github-actions bot commented Aug 2, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 2, 2023