Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output run hooks (#696) #1440

Merged
merged 7 commits into from
May 10, 2019
Merged

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented May 3, 2019

Fixes #696

Output information about the on-run-start and on-run-end hooks before and after running dbt.

One wrinkle, the output isn't exactly as described in the issue - it's a bit tricky to move that Concurrency line before hooks run. Instead you get this order.

Running with dbt=0.13.0
Found 163 macros, 0 analyses, 0 archives, 0 sources, 0 seed files, 0 tests, 2 operations, 1 models

07:58:48 |
07:58:48 | Running 1 on-run-start hook
07:58:48 | 1 of 1 START select * from information_schema.schemata
07:58:48 |
07:58:48 | Concurrency: 2 threads (target='default')
07:58:48 |
07:58:48 | 1 of 1 START view model dbt_postgres.x............................... [RUN]
07:58:48 | 1 of 1 OK created view model dbt_postgres.x.......................... [CREATE VIEW in 0.08s]
07:58:48 |
07:58:48 | Running 1 on-run-end hook
07:58:48 | 1 of 1 START select * from information_schema.schemata
07:58:48 |
07:58:48 |
07:58:48 | Finished running 2 hooks, 1 view models in 0.45s.

Completed successfully

Done. PASS=1 ERROR=0 SKIP=0 TOTAL=1

We definitely can move that Concurrency line up, and the output would probably look a bit nicer, but it involves moving that into the task level instead of the node runner level and rearranging some things.

@drewbanin drewbanin self-requested a review May 6, 2019 20:40
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about the hook output, let me know if you'd like to discuss!

I'm actually ok with the Concurrency .... line where it is -- the on-run-start|end hooks are run serially, so it actually doesn't make a ton of sense to show the Concurrency line before that.

Do we even need this line anymore? I think we added it when concurrency was new to dbt, and we wanted to make sure everyone knew about it :)

Let me think a little harder about what, if anything, we should be showing here... Maybe it's just Running 7 models or something like that instead?

compiled = compile_node(adapter, self.config, hook,
self.manifest, extra_context)
statement = compiled.wrapped_sql
dbt.ui.printer.print_timestamped_line(
'{} of {} START {}'.format(idx, len(ordered_hooks), statement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we format this like all of the other timestamped lines that dbt prints?

07:58:48 | 1 of 2 START hook: grant usage on schema......... [RUN]
07:58:48 | 1 of 2 OK hook: grant usage on schema......... [GRANT in 0.4s]
07:58:48 | 2 of 2 START hook: grant select on ......... [RUN]
07:58:48 | 2 of 2 OK hook: grant select on......... [GRANT in 2.8s]

I think we'll also want to reshape/truncate the hook SQL. It's not uncommon to do operational things (like create UDFs, or insert into an audit table in these hooks, and they currently take over the stdout!

I imagine we can do something like (pseudocode):

hook_text = hook.replace("\n", " ")[0:60] + "..."

This doesn't need to be the exact same SQL that dbt is actually running, it just needs to be a stand-in for a name! Maybe some day it will make sense to name hooks, but today is not that day :)

Let's also include a line for when the hooks completes, and a status if it's easy to grab. I think a lot of folks don't actually have insight into how long their hooks currently take, and I be it will be helpful to expose this info to them!

Here's an example hook for testing:

on-run-start:
 - "{{ load_udfs() }}"
-- macros/udfs.sql (imagine this did something useful....)

CREATE OR REPLACE FUNCTION no_op()
RETURNS void AS $$
$$ LANGUAGE SQL;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also be pretty aggressive with the truncation here -- the point is just to show that hooks are running, and to indicate how long they have taken. If folks want to dig further into that, they can find the full queries in the logs!

Truncate long run-hooks, replacing newlines with spaces
Include run-hook status/timing in output
Clean up run-hook execution a bit
@beckjake
Copy link
Contributor Author

beckjake commented May 7, 2019

Do we even need this line anymore?

I find it useful sometimes, though I don't know if it needs to go to stdout.

@beckjake beckjake requested a review from drewbanin May 7, 2019 15:56
@drewbanin
Copy link
Contributor

drewbanin commented May 9, 2019

I'm unsure why this happens, but it looks like the statuses in stdout aren't colored correctly (this is on-run-start):
Screen Shot 2019-05-09 at 10 52 34 AM

I'd expect the SUCCESS 1 strings to be colored green here. Basically if an exception occurs, this should be red, otherwise it should be green.

@drewbanin
Copy link
Contributor

In actually seeing this, I think the output is just too verbose to be really useful here. I knew it would be a lot of text, but I'm not thrilled with the way this actually looks. It's the -- Start a transaction. This is required to per... in particular line that's unappealing: If every hook was just grant usage on schema dbt_dbanin to role reporting then I'd be happy, but that's certainly not what 100% of hooks look like these days.

Let's plan to "name" hooks at some point in the future. That might look like:

on-run-end:
  - {name: "reporting grant", sql: "grant usage on schema {{ target.schema }} to role reporter}

For now though, I think it would be a good idea to just output the package, hook type, and hook index. So:

10:57:53 | Running 2 on-run-end hooks
10:57:53 | 1 of 2 START fishtown_internal_analytics.on_run_start.1.......................... [RUN]
10:57:53 | 1 of 2 OK fishtown_internal_analytics.on_run_start.1............................. [GRANT in 0.00s]
10:57:53 | 2 of 2 START fishtown_internal_analytics.on_run_start.2.......................... [RUN]
10:57:53 | 2 of 2 OK fishtown_internal_analytics.on_run_start.2............................. [GRANT in 0.00s]

The actual formatting of fishtown_internal_analytics.on_run_start.2 isn't so important -- if it's easier to do fishtown_internal_analytics.on_run_start-2 or something, that works just as well!

@beckjake
Copy link
Contributor Author

beckjake commented May 9, 2019

I'd expect the SUCCESS 1 strings to be colored green here. Basically if an exception occurs, this should be red, otherwise it should be green.

I'm mostly amenable to this, but worth pointing out that it will never be red - failing a hook results in the run failing entirely and halting.

$ cat dbt_project.yml
name: 'minimal'
version: '1.0'
profile: 'local'

on-run-start:
 - 'select id from does_not_exist'
$ dbt run
Running with dbt=0.13.0
Found 1 models, 0 tests, 0 archives, 0 analyses, 101 macros, 1 operations, 0 seed files, 0 sources

10:45:49 |
10:45:49 | Running 1 on-run-start hook
10:45:49 | 1 of 1 START hook: select id from does_not_exist..................... [RUN]
Database error while running on-run-start
Encountered an error:
Database Error
  relation "does_not_exist" does not exist
  LINE 1: select id from does_not_exist
                         ^

@drewbanin
Copy link
Contributor

I'm mostly amenable to this, but worth pointing out that it will never be red - failing a hook results in the run failing entirely and halting.

Hah, I was going to mention that, but wasn't sure if it was still the case. We should probably do something different there, but that can happen in a separate PR! If nothing else, this PR is a step in the right direction... users will be able to see which hook failed in the stdout now :)

core/dbt/task/run.py Show resolved Hide resolved
core/dbt/task/run.py Outdated Show resolved Hide resolved
@beckjake beckjake requested a review from drewbanin May 9, 2019 19:20
core/dbt/task/run.py Outdated Show resolved Hide resolved
beckjake and others added 2 commits May 9, 2019 14:30
Co-Authored-By: Drew Banin <drew@fishtownanalytics.com>
Order hooks in a deterministic way:
 - in the root project, hooks in index order
 - foreach dependency project in alphabetical order, hooks in index order
Since these hooks always have a valid "index", no need to devise a default
Add some tests
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a little different than the original spec, but I'm super happy with where we ended up! Thanks for your help with iterating on this -- ship it! :shipit:

@beckjake beckjake merged commit 210cf43 into dev/wilt-chamberlain May 10, 2019
@beckjake beckjake deleted the feature/output-run-hooks branch May 10, 2019 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

show on-run-start/end + hooks in dbt output
2 participants