Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#4354] Different output for console and file logs #4379

Merged
merged 3 commits into from
Dec 2, 2021
Merged

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Dec 1, 2021

resolves #4354

Description

Create a different format for file logs than for console logs.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change

@cla-bot cla-bot bot added the cla:yes label Dec 1, 2021
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took for a quick spin, I like the way it's looking!

Comment on lines +237 to +238
fire_event(MainReportVersion(v=str(dbt.version.installed)))
fire_event(MainReportArgs(args=parsed))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky thing about this reordering: We're going to raise deprecation warnings (due to dbt_project.yml config renaming) before we fire MainReportVersion. Those are raised during task = parsed.cls.from_args(args=parsed) above:

$ dbt run
21:56:50 | [ warn  ] | * Deprecation Warning:
The `source-paths` config has been deprecated in favor of `model-paths`. Please
update your `dbt_project.yml` configuration to reflect this change.
21:56:50 | [ warn  ] | * Deprecation Warning:
The `data-paths` config has been deprecated in favor of `seed-paths`. Please
update your `dbt_project.yml` configuration to reflect this change.
21:56:50 | [ info  ] | Running with dbt=1.0.0-rc3
21:56:50 | [ info  ] | Found 2 models, 0 tests, 0 snapshots, 0 analyses, 168 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics

Maybe that's not a big deal, since hopefully users resolve the deprecations by renaming quickly. It feels like a small price to pay, if what it gets us is a fully configured event logger before we actually fire any events...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I personally kind of prefer to have the 'Running with' message right above 'Found...' message, but it's certainly a matter of opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point!

@@ -162,7 +163,7 @@ def event_to_serializable_dict(
# translates an Event to a completely formatted text-based log line
# you have to specify which message you want. (i.e. - e.message, e.cli_msg(), e.file_msg())
# type hinting everything as strings so we don't get any unintentional string conversions via str()
def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
def create_stdout_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's cut level from stdout logger, so that it's just timestamp + message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. When I did that the pipe between timestamp and message didn't seem necessary, so I made it two spaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces looks clean! I like it:

$ dbt run
12:00:00  Running with dbt=1.0.0-rc3
12:00:00  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 168 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
12:00:00
12:00:00  Concurrency: 5 threads (target='dev')
12:00:00
12:00:00  1 of 2 START table model dbt_jcohen.my_table.................................... [RUN]
12:00:00  1 of 2 OK created table model dbt_jcohen.my_table............................... [SELECT 1 in 0.06s]
12:00:00  2 of 2 START view model dbt_jcohen.my_view...................................... [RUN]
12:00:00  2 of 2 OK created view model dbt_jcohen.my_view................................. [CREATE VIEW in 0.03s]
12:00:00
12:00:00  Finished running 1 table model, 1 view model in 0.26s.
12:00:00
12:00:00  Completed successfully
12:00:00
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Previous versions of dbt used a pipe for messages that had timestamps. I think this is a good change.

@@ -171,6 +172,22 @@ def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
return log_line


def create_file_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about:

  • padding all thread names to be equal width (up to 10 characters, the length of MainThread)
  • including level in the file logger — I imagine it might be helpful to scan/search for error + warn messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I empathize with the desire to make them even. It would be slightly better to read. But there is also a thread name ThreadPoolExecutor-0_0. Should I just truncate that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I truncated ThreadPoolExecutor-0_0 to ThreadPool, and changed to the square brackets. Making the prefix consistent does help with the reading. Still not entirely happy with the padded spaces in the thread name. We could translate MainThread to 'Thread-0' and ThreadPool to 'Thread-P'. Is it possible for the thread number to go higher than 9?

Anyway, take a look and see what you think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like this more than the current version!

Switch to square brackets is good. Spacing is a teensy bit awkward, but I do think it makes it easier to read.

Is it possible for the thread number to go higher than 9?

In theory, users can set it as high as they want, but I don't think we should worry about it being 3+ digits, let alone >4

# Create a separator if this is the beginning of an invocation
if type(e) == MainReportVersion:
separator = 30 * '='
log_line = f'\n\n{separator} {e.get_ts()} | {get_invocation_id()} {separator}\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!! with the invocation_id too!! I really like this

@nathaniel-may may have thoughts about implementation. Given that MainReportVersion really truly is a one-of-a-kind event, and ought to be the very first one fired, this approach makes some sense to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah if it's truly one of a kind and only happens in this text-file combo, this approach looks like the right way to do it.

@gshank gshank requested a review from jtcohen6 December 2, 2021 00:05
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of these changes

@@ -171,6 +172,22 @@ def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
return log_line


def create_file_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like this more than the current version!

Switch to square brackets is good. Spacing is a teensy bit awkward, but I do think it makes it easier to read.

Is it possible for the thread number to go higher than 9?

In theory, users can set it as high as they want, but I don't think we should worry about it being 3+ digits, let alone >4

Comment on lines +237 to +238
fire_event(MainReportVersion(v=str(dbt.version.installed)))
fire_event(MainReportArgs(args=parsed))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point!

@@ -162,7 +163,7 @@ def event_to_serializable_dict(
# translates an Event to a completely formatted text-based log line
# you have to specify which message you want. (i.e. - e.message, e.cli_msg(), e.file_msg())
# type hinting everything as strings so we don't get any unintentional string conversions via str()
def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
def create_stdout_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces looks clean! I like it:

$ dbt run
12:00:00  Running with dbt=1.0.0-rc3
12:00:00  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 168 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
12:00:00
12:00:00  Concurrency: 5 threads (target='dev')
12:00:00
12:00:00  1 of 2 START table model dbt_jcohen.my_table.................................... [RUN]
12:00:00  1 of 2 OK created table model dbt_jcohen.my_table............................... [SELECT 1 in 0.06s]
12:00:00  2 of 2 START view model dbt_jcohen.my_view...................................... [RUN]
12:00:00  2 of 2 OK created view model dbt_jcohen.my_view................................. [CREATE VIEW in 0.03s]
12:00:00
12:00:00  Finished running 1 table model, 1 view model in 0.26s.
12:00:00
12:00:00  Completed successfully
12:00:00
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Previous versions of dbt used a pipe for messages that had timestamps. I think this is a good change.

@gshank gshank merged commit c220b1e into main Dec 2, 2021
@gshank gshank deleted the log_file_tweaks branch December 2, 2021 13:23
@jtcohen6 jtcohen6 mentioned this pull request Dec 2, 2021
2 tasks
leahwicz pushed a commit that referenced this pull request Dec 2, 2021
* [#4354] Different output for console and file logs

* Tweak some log formats

* Change loging of thread names
leahwicz added a commit that referenced this pull request Dec 2, 2021
* [#4354] Different output for console and file logs

* Tweak some log formats

* Change loging of thread names

Co-authored-by: Gerda Shank <gerda@fishtownanalytics.com>
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* [#4354] Different output for console and file logs

* Tweak some log formats

* Change loging of thread names

automatic commit by git-black, original commits:
  c220b1e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Text vs JSON and File vs CLI logs
3 participants