Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transactional json logging #1806

Merged
merged 4 commits into from
Oct 10, 2019

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Oct 4, 2019

Fixes #1799

This is based on #1801, so once that is merged I'll rebase this against dev/louisa-may-alcott.

@beckjake beckjake requested a review from drewbanin October 4, 2019 19:57
@cla-bot cla-bot bot added the cla:yes label Oct 4, 2019
@beckjake beckjake force-pushed the feature/add-docs-generate-rpc branch from 02feb69 to 2031e23 Compare October 7, 2019 14:16
@beckjake beckjake changed the base branch from feature/add-docs-generate-rpc to dev/louisa-may-alcott October 7, 2019 15:56
@beckjake beckjake force-pushed the feature/transactional-json-logging branch from 4b227a6 to 52ac98e Compare October 7, 2019 15:58
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broadly looks very good to me. I think there are a couple more log lines we'll want to add here:

  • at the beginning of the run, publish a log line that reports the number of resources that are going to be run in the invocation (eg. `"node_count": 17). This will help track completion percentage as the rest of the logs flow
  • in the Began running model log line, do we have a node index? That will help order the nodes in a list (i imagine doing this by timestamp could be tricky, but doable if necessary)

And a couple of questions:

  • how does the exc_info field get populated?
  • can we better structure the field values around success/error states? It looks like model_state can either be passed, or a literal error message. If possible, it would be good to present a field which has a value in {success, error, skipped} that can be used in conjunction with a free-text field which describes the error (if one occurs).

Let me know if any of these things are more involved to implement than they appear at first blush. We have a lot of options here!

@beckjake
Copy link
Contributor Author

beckjake commented Oct 7, 2019

how does the exc_info field get populated?

If you log a message with exc_info=True in an except block (and maybe in a finally block if there's an active exception), logging and logbook both insert that.

can we better structure the field values around success/error states? It looks like model_state can either be passed, or a literal error message. If possible, it would be good to present a field which has a value in {success, error, skipped} that can be used in conjunction with a free-text field which describes the error (if one occurs).

Yeah, we can do this, I think. Part of the problem is that dbt's error, status, etc fields in results are a bit ad-hoc. I'm probably being generous with that. We should probably fix the underlying data structures instead of trying to fix it on the logger side, which is a bit involved.

Add node count/total node to log lines
Make model status structured
@beckjake
Copy link
Contributor Author

beckjake commented Oct 8, 2019

I'm going to resolve that error/status issue by just calling str on things that might not be strings, instead of cleaning up the underlying behavior - we can squash any issues with weird outputs as we find them.

@beckjake beckjake requested a review from drewbanin October 8, 2019 12:50
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more cosmetic updates to make here, but this LGTM when those are addressed.

We will almost certainly want to add more detailed information here over time, but we can revisit some additional fields/tagging in the scope of a future issue.

Comment on lines 74 to 83
if result.error:
return {'model_status': 'error', 'model_error': str(result.error)}
elif result.skip:
return {'model_status': 'skipped'}
elif result.fail:
return {'model_status': 'failed'}
elif result.warn:
return {'model_status': 'warn'}
else:
return {'model_status': 'passed'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change all of these from model_status to node_status for consistency? Many of these results will be generated from nodes which are not models.

@@ -436,6 +448,12 @@ def on_skip(self):
'Freshness: nodes cannot be skipped!'
)

def get_result_status(self, result) -> Dict[str, str]:
if result.error:
return {'model_status': 'error', 'model_error': str(result.error)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also make this node_error instead of model_error

('schema', 'node_schema'),
('database', 'node_database'),
('name', 'node_name'),
('original_file_path', 'node_path')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also include the resource type here?

s/model/node/
include resource_type
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@beckjake beckjake merged commit 9819c3a into dev/louisa-may-alcott Oct 10, 2019
@beckjake beckjake deleted the feature/transactional-json-logging branch October 10, 2019 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add transactional json logging for dbt invocations
2 participants