-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial structured logging work with fire_event
#4137
Changes from all commits
8655220
96d13ab
f8adfea
b130e5b
fe66c7f
054c080
cc362be
659e1b2
259a893
f1996f3
3949f65
57ef51d
006622a
8a4c513
080d756
c353342
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Events Module | ||
|
||
The Events module is the implmentation for structured logging. These events represent both a programatic interface to dbt processes as well as human-readable messaging in one centralized place. The centralization allows for leveraging mypy to enforce interface invariants across all dbt events, and the distinct type layer allows for decoupling events and libraries such as loggers. | ||
|
||
# Using the Events Module | ||
The event module provides types that represent what is happening in dbt in `events.types`. These types are intended to represent an exhaustive list of all things happening within dbt that will need to be logged, streamed, or printed. To fire an event, `events.functions::fire_event` is the entry point to the module from everywhere in dbt. | ||
|
||
# Adding a New Event | ||
In `events.types` add a new class that represents the new event. This may be a simple class with no values, or it may be a dataclass with some values to construct downstream messaging. Only include the data necessary to construct this message within this class. You must extend all destinations (e.g. - if your log message belongs on the cli, extend `CliEventABC`) as well as the loglevel this event belongs to. |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
|
||
import dbt.logger as logger # type: ignore # TODO eventually remove dependency on this logger | ||
from dbt.events.history import EVENT_HISTORY | ||
from dbt.events.types import CliEventABC, Event | ||
|
||
|
||
# top-level method for accessing the new eventing system | ||
# this is where all the side effects happen branched by event type | ||
# (i.e. - mutating the event history, printing to stdout, logging | ||
# to files, etc.) | ||
def fire_event(e: Event) -> None: | ||
EVENT_HISTORY.append(e) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Dumb question, because I don't really understand how computers work: Do we need to flush/cycle There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe! This is just a horribly naive approach to in-memory history. I figure we'll cross that bridge when we need to. There are a few tactics we could use to make this more robust, but I'm not totally sure which one to go with yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's another reason not to keep methods (i.e. - There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did some rudimentary testing here, and I really can't get methods to increase the memory footprint of objects. So I'm going to try swapping this around to the OO way of things and see if I can get a similar level of safety there too. |
||
if isinstance(e, CliEventABC): | ||
if e.level_tag() == 'test': | ||
# TODO after implmenting #3977 send to new test level | ||
logger.GLOBAL_LOGGER.debug(logger.timestamped_line(e.cli_msg())) | ||
elif e.level_tag() == 'debug': | ||
logger.GLOBAL_LOGGER.debug(logger.timestamped_line(e.cli_msg())) | ||
elif e.level_tag() == 'info': | ||
logger.GLOBAL_LOGGER.info(logger.timestamped_line(e.cli_msg())) | ||
elif e.level_tag() == 'warn': | ||
logger.GLOBAL_LOGGER.warning()(logger.timestamped_line(e.cli_msg())) | ||
elif e.level_tag() == 'error': | ||
logger.GLOBAL_LOGGER.error(logger.timestamped_line(e.cli_msg())) | ||
else: | ||
raise AssertionError( | ||
f"Event type {type(e).__name__} has unhandled level: {e.level_tag()}" | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
from dbt.events.types import Event | ||
from typing import List | ||
|
||
|
||
# the global history of events for this session | ||
# TODO this is naive and the memory footprint is likely far too large. | ||
EVENT_HISTORY: List[Event] = [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
from abc import ABCMeta, abstractmethod | ||
from dataclasses import dataclass | ||
|
||
|
||
# types to represent log levels | ||
|
||
# in preparation for #3977 | ||
class TestLevel(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Idea for future iterations on this: Implement log level number as well since it's a defined thing in python. This would let us more easily tie in with python tools that utilize log levels (I imagine structlog has some support for it) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good to know! Yeah I imagine we can work these in once we hook structlog up. |
||
def level_tag(self) -> str: | ||
return "test" | ||
|
||
|
||
class DebugLevel(): | ||
def level_tag(self) -> str: | ||
return "debug" | ||
|
||
|
||
class InfoLevel(): | ||
def level_tag(self) -> str: | ||
return "info" | ||
|
||
|
||
class WarnLevel(): | ||
def level_tag(self) -> str: | ||
return "warn" | ||
|
||
|
||
class ErrorLevel(): | ||
def level_tag(self) -> str: | ||
return "error" | ||
|
||
|
||
# The following classes represent the data necessary to describe a | ||
# particular event to both human readable logs, and machine reliable | ||
# event streams. classes extend superclasses that indicate what | ||
# destinations they are intended for, which mypy uses to enforce | ||
# that the necessary methods are defined. | ||
|
||
|
||
# top-level superclass for all events | ||
class Event(metaclass=ABCMeta): | ||
# do not define this yourself. inherit it from one of the above level types. | ||
@abstractmethod | ||
def level_tag(self) -> str: | ||
raise Exception("level_tag not implemented for event") | ||
|
||
|
||
class CliEventABC(Event, metaclass=ABCMeta): | ||
# Solely the human readable message. Timestamps and formatting will be added by the logger. | ||
@abstractmethod | ||
def cli_msg(self) -> str: | ||
raise Exception("cli_msg not implemented for cli event") | ||
|
||
|
||
class ParsingStart(InfoLevel, CliEventABC): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor detail: can we ditch the periods at the end of these returned strings? They aren't sentences. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I copied the messages exactly the way they are printed today. I completely agree, however I think I want to do user-facing message improvements in their own PR so product can give them all a go. I might be being a bit too structured for something as silly as these periods though. |
||
def cli_msg(self) -> str: | ||
return "Start parsing." | ||
|
||
|
||
class ParsingCompiling(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Compiling." | ||
|
||
|
||
class ParsingWritingManifest(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Writing manifest." | ||
|
||
|
||
class ParsingDone(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Done." | ||
|
||
|
||
class ManifestDependenciesLoaded(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Dependencies loaded" | ||
|
||
|
||
class ManifestLoaderCreated(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "ManifestLoader created" | ||
|
||
|
||
class ManifestLoaded(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Manifest loaded" | ||
|
||
|
||
class ManifestChecked(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Manifest checked" | ||
|
||
|
||
class ManifestFlatGraphBuilt(InfoLevel, CliEventABC): | ||
def cli_msg(self) -> str: | ||
return "Flat graph built" | ||
|
||
|
||
@dataclass | ||
class ReportPerformancePath(InfoLevel, CliEventABC): | ||
path: str | ||
|
||
def cli_msg(self) -> str: | ||
return f"Performance info: {self.path}" | ||
|
||
|
||
# since mypy doesn't run on every file we need to suggest to mypy that every | ||
# class gets instantiated. But we don't actually want to run this code. | ||
# making the conditional `if False` causes mypy to skip it as dead code so | ||
# we need to skirt around that by computing something it doesn't check statically. | ||
# | ||
# TODO remove these lines once we run mypy everywhere. | ||
if 1 == 0: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have to admit, I kinda love this hack... |
||
ParsingStart() | ||
ParsingCompiling() | ||
ParsingWritingManifest() | ||
ParsingDone() | ||
ManifestDependenciesLoaded() | ||
ManifestLoaderCreated() | ||
ManifestLoaded() | ||
ManifestChecked() | ||
ManifestFlatGraphBuilt() | ||
ReportPerformancePath(path='') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to eventually replace with
structlog
, when the output is CLI/file logging, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. This is just to show that the new structure works when you run dbt without introducing too much nonsense all at once. It should be a relatively easy swap out in a future PR.