Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#4354] Different output for console and file logs #4379

Merged
merged 3 commits into from
Dec 2, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 32 additions & 11 deletions core/dbt/events/functions.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
from datetime import datetime
import dbt.events.functions as this # don't worry I hate it too.
from dbt.events.base_types import Cli, Event, File, ShowException, NodeInfo, Cache
from dbt.events.types import EventBufferFull, T_Event
from dbt.events.types import EventBufferFull, T_Event, MainReportVersion
import dbt.flags as flags
# TODO this will need to move eventually
from dbt.logger import SECRET_ENV_PREFIX, make_log_dir_if_missing, GLOBAL_LOGGER
Expand All @@ -17,6 +17,7 @@
from logging.handlers import RotatingFileHandler
import os
import uuid
import threading
from typing import Any, Callable, Dict, List, Optional, Union
import dataclasses
from collections import deque
Expand Down Expand Up @@ -162,12 +163,31 @@ def event_to_serializable_dict(
# translates an Event to a completely formatted text-based log line
# you have to specify which message you want. (i.e. - e.message, e.cli_msg(), e.file_msg())
# type hinting everything as strings so we don't get any unintentional string conversions via str()
def create_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
def create_stdout_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's cut level from stdout logger, so that it's just timestamp + message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. When I did that the pipe between timestamp and message didn't seem necessary, so I made it two spaces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces looks clean! I like it:

$ dbt run
12:00:00  Running with dbt=1.0.0-rc3
12:00:00  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 168 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
12:00:00
12:00:00  Concurrency: 5 threads (target='dev')
12:00:00
12:00:00  1 of 2 START table model dbt_jcohen.my_table.................................... [RUN]
12:00:00  1 of 2 OK created table model dbt_jcohen.my_table............................... [SELECT 1 in 0.06s]
12:00:00  2 of 2 START view model dbt_jcohen.my_view...................................... [RUN]
12:00:00  2 of 2 OK created view model dbt_jcohen.my_view................................. [CREATE VIEW in 0.03s]
12:00:00
12:00:00  Finished running 1 table model, 1 view model in 0.26s.
12:00:00
12:00:00  Completed successfully
12:00:00
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Previous versions of dbt used a pipe for messages that had timestamps. I think this is a good change.

color_tag: str = '' if this.format_color else Style.RESET_ALL
ts: str = e.get_ts().strftime("%H:%M:%S")
scrubbed_msg: str = scrub_secrets(msg_fn(e), env_secrets())
log_line: str = f"{color_tag}{ts} {scrubbed_msg}"
return log_line


def create_file_text_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about:

  • padding all thread names to be equal width (up to 10 characters, the length of MainThread)
  • including level in the file logger — I imagine it might be helpful to scan/search for error + warn messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I empathize with the desire to make them even. It would be slightly better to read. But there is also a thread name ThreadPoolExecutor-0_0. Should I just truncate that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I truncated ThreadPoolExecutor-0_0 to ThreadPool, and changed to the square brackets. Making the prefix consistent does help with the reading. Still not entirely happy with the padded spaces in the thread name. We could translate MainThread to 'Thread-0' and ThreadPool to 'Thread-P'. Is it possible for the thread number to go higher than 9?

Anyway, take a look and see what you think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like this more than the current version!

Switch to square brackets is good. Spacing is a teensy bit awkward, but I do think it makes it easier to read.

Is it possible for the thread number to go higher than 9?

In theory, users can set it as high as they want, but I don't think we should worry about it being 3+ digits, let alone >4

log_line: str = ''
# Create a separator if this is the beginning of an invocation
if type(e) == MainReportVersion:
separator = 30 * '='
log_line = f'\n\n{separator} {e.get_ts()} | {get_invocation_id()} {separator}\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!! with the invocation_id too!! I really like this

@nathaniel-may may have thoughts about implementation. Given that MainReportVersion really truly is a one-of-a-kind event, and ought to be the very first one fired, this approach makes some sense to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah if it's truly one of a kind and only happens in this text-file combo, this approach looks like the right way to do it.

color_tag: str = '' if this.format_color else Style.RESET_ALL
ts: str = e.get_ts().strftime("%H:%M:%S.%f")
scrubbed_msg: str = scrub_secrets(msg_fn(e), env_secrets())
level: str = e.level_tag() if len(e.level_tag()) == 5 else f"{e.level_tag()} "
log_line: str = f"{color_tag}{ts} | [ {level} ] | {scrubbed_msg}"
thread = ''
if threading.current_thread().getName():
thread_name = threading.current_thread().getName()
thread_name = thread_name[:10]
thread_name = thread_name.ljust(10, ' ')
thread = f' [{thread_name}]:'
log_line = log_line + f"{color_tag}{ts} [{level}]{thread} {scrubbed_msg}"
return log_line


Expand All @@ -179,13 +199,14 @@ def create_json_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
return scrub_secrets(raw_log_line, env_secrets())


# calls create_text_log_line() or create_json_log_line() according to logger config
def create_log_line(e: T_Event, msg_fn: Callable[[T_Event], str]) -> str:
return (
create_json_log_line(e, msg_fn)
if this.format_json else
create_text_log_line(e, msg_fn)
)
# calls create_stdout_text_log_line() or create_json_log_line() according to logger config
def create_log_line(e: T_Event, msg_fn: Callable[[T_Event], str], file_output=False) -> str:
if this.format_json:
return create_json_log_line(e, msg_fn) # json output, both console and file
elif file_output is True:
return create_file_text_log_line(e, msg_fn) # default file output
else:
return create_stdout_text_log_line(e, msg_fn) # console output


# allows for resuse of this obnoxious if else tree.
Expand Down Expand Up @@ -285,7 +306,7 @@ def fire_event(e: Event) -> None:

# always logs debug level regardless of user input
if isinstance(e, File):
log_line = create_log_line(e, msg_fn=lambda x: x.file_msg())
log_line = create_log_line(e, msg_fn=lambda x: x.file_msg(), file_output=True)
# doesn't send exceptions to exception logger
send_to_logger(FILE_LOG, level_tag=e.level_tag(), log_line=log_line)

Expand Down
16 changes: 7 additions & 9 deletions core/dbt/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -221,24 +221,22 @@ def track_run(task):
def run_from_args(parsed):
log_cache_events(getattr(parsed, 'log_cache_events', False))

# we can now use the logger for stdout
# set log_format in the logger
# if 'list' task: set stdout to WARN instead of INFO
level_override = parsed.cls.pre_init_hook(parsed)

fire_event(MainReportVersion(v=str(dbt.version.installed)))

# this will convert DbtConfigErrors into RuntimeExceptions
# task could be any one of the task objects
task = parsed.cls.from_args(args=parsed)
fire_event(MainReportArgs(args=parsed))

# Set up logging
log_path = None
if task.config is not None:
log_path = getattr(task.config, 'log_path', None)
# we can finally set the file logger up
log_manager.set_path(log_path)
# if 'list' task: set stdout to WARN instead of INFO
level_override = parsed.cls.pre_init_hook(parsed)
setup_event_logger(log_path or 'logs', level_override)

fire_event(MainReportVersion(v=str(dbt.version.installed)))
fire_event(MainReportArgs(args=parsed))
Comment on lines +237 to +238
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky thing about this reordering: We're going to raise deprecation warnings (due to dbt_project.yml config renaming) before we fire MainReportVersion. Those are raised during task = parsed.cls.from_args(args=parsed) above:

$ dbt run
21:56:50 | [ warn  ] | * Deprecation Warning:
The `source-paths` config has been deprecated in favor of `model-paths`. Please
update your `dbt_project.yml` configuration to reflect this change.
21:56:50 | [ warn  ] | * Deprecation Warning:
The `data-paths` config has been deprecated in favor of `seed-paths`. Please
update your `dbt_project.yml` configuration to reflect this change.
21:56:50 | [ info  ] | Running with dbt=1.0.0-rc3
21:56:50 | [ info  ] | Found 2 models, 0 tests, 0 snapshots, 0 analyses, 168 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics

Maybe that's not a big deal, since hopefully users resolve the deprecations by renaming quickly. It feels like a small price to pay, if what it gets us is a fully configured event logger before we actually fire any events...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I personally kind of prefer to have the 'Running with' message right above 'Found...' message, but it's certainly a matter of opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point!


if dbt.tracking.active_user is not None: # mypy appeasement, always true
fire_event(MainTrackingUserState(dbt.tracking.active_user.state()))

Expand Down