Initial structured logging work with `fire_event` #4137

nathaniel-may · 2021-10-26T18:00:15Z

Description

This PR into the feature branch adds the first real bit of structured logging. The description of the module layout is in the README file. This is the best PR to ask for large structural changes to the general approach if there are any concerns with this way of doing things.

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change

jtcohen6

I really appreciate the readme and annotations. I'd be curious to hear if others have structural comments or suggestions. For my part, I think I have a decent sense of how we plan to use this, and how we can extend it beyond stdout/file logging.

Am I right to think that our next steps are:

going through all the places in dbt where logger is called
adding typed events to events/types
replacing those logger calls with fire_event for the new typed event

At the same time, we need to replicate the functionality currently available in logger (log levels, JSON formatting, secret scrubbing, etc) with our new event-based system.

I'm especially interested in finding ways to safely split up and parallelize all this work, knowing that we've got precious little time left ahead of the planned release for v1.0.0-rc1.
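To make the three steps above concrete, here is a minimal sketch of what replacing a logger call with a typed event might look like. It is not the code in this PR: the `Event` base class, the `MacroFileParse` event, and the message strings are all assumptions made up for illustration, and this `fire_event` just prints instead of going through dbt's logger.

```python
from dataclasses import dataclass
from typing import List


class Event:
    """Hypothetical base class for all typed events (not dbt's real one)."""

    def cli_msg(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class ParsingStart(Event):
    def cli_msg(self) -> str:
        return "Start parsing."  # illustrative message text


@dataclass(frozen=True)
class MacroFileParse(Event):  # hypothetical event that carries data
    path: str

    def cli_msg(self) -> str:
        return f"Parsing {self.path}"


EVENT_HISTORY: List[Event] = []


def fire_event(e: Event) -> None:
    # Record the event, then send its message to stdout.
    EVENT_HISTORY.append(e)
    print(e.cli_msg())


# Before: logger.info("Start parsing.")
# After:
fire_event(ParsingStart())
fire_event(MacroFileParse(path="macros/example.sql"))
```

Each call site would then be migrated independently, which is one way the work could be split up across people.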

core/dbt/events/types.py

core/dbt/events/functions.py

jtcohen6 · 2021-10-27T11:23:31Z

core/dbt/events/functions.py

+# (i.e. - mutating the event history, printing to stdout, logging
+# to files, etc.)
+def fire_event(e: Event) -> None:
+    EVENT_HISTORY.append(e)


Dumb question, because I don't really understand how computers work: Do we need to flush/cycle EVENT_HISTORY at some point? A dbt invocation can have hundreds of thousands of debug-level log lines. Is there a risk that this will grow to use a substantial amount of memory?

Maybe! This is just a horribly naive approach to in-memory history. I figure we'll cross that bridge when we need to. There are a few tactics we could use to make this more robust, but I'm not totally sure which one to go with yet.

It's another reason not to keep methods (e.g. cli_msg) on these datatypes. I'm pretty sure methods make the memory footprint bigger for objects in Python land. (Would need to double check this.)

Did some rudimentary testing here, and I really can't get methods to increase the memory footprint of objects. So I'm going to try swapping this around to the OO way of things and see if I can get a similar level of safety there too.
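The thread doesn't settle on a tactic for bounding EVENT_HISTORY. One possible option, sketched here purely as an assumption, is a fixed-size `collections.deque` that silently drops the oldest events once a cap is reached. The `MAX_HISTORY` name and value are hypothetical, and strings stand in for real event objects.

```python
from collections import deque
from typing import Deque

# Hypothetical cap, kept tiny so the eviction is visible in the demo.
MAX_HISTORY = 3

# A deque with maxlen discards items from the opposite end when full.
EVENT_HISTORY: Deque[str] = deque(maxlen=MAX_HISTORY)


def fire_event(e: str) -> None:
    # Appending to a full deque evicts the oldest entry first.
    EVENT_HISTORY.append(e)


for i in range(5):
    fire_event(f"event-{i}")

print(list(EVENT_HISTORY))
```

The trade-off is that early events are lost, so this would only fit if the history is needed for recent context rather than a full replay.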

jtcohen6 · 2021-10-27T11:24:30Z

core/dbt/events/functions.py

@@ -0,0 +1,61 @@
+
+import dbt.logger as logger  # type: ignore # TODO eventually remove dependency on this logger


The goal is to eventually replace with structlog, when the output is CLI/file logging, right?

That's correct. This is just to show that the new structure works when you run dbt without introducing too much nonsense all at once. It should be a relatively easy swap out in a future PR.

nathaniel-may · 2021-10-27T13:21:05Z

@jtcohen6

Am I right to think that our next steps are:

going through all the places in dbt where logger is called

adding typed events to events/types

replacing those logger calls with fire_event for the new typed event

At the same time, we need to replicate the functionality currently available in logger (log levels, JSON formatting, secret scrubbing, etc) with our new event-based system.

That's exactly correct.

nathaniel-may · 2021-10-27T15:23:02Z

Did some exploration on whether we should take the FP or OO approach to computing messages from event types. The goal here is to know at development time whether we have log messages defined for every event type, without relying on the diligence to write a test every time we add a log statement. The scenarios we want to avoid are: a release where some messages are not written, or a release that raises exceptions when an obscure log line is triggered.

The FP Way

This PR's solution is to use Union types and type-level branching (i.e. isinstance, which works at both runtime and the type level) to guarantee that every branch of the union is handled. assert_never is the trick that forces mypy to fail when we miss a case while writing a function from SomeUnionType -> SomeOtherType. This is guaranteed to work every time. However, these functions will absolutely balloon into huge switch-like statements, which is a little ugly. It can also be difficult to tell what you're missing, because mypy isn't advanced enough to name the missing branch; it only tells you that one is missing at all, with the cryptic message "assert_never" has incompatible type.
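A minimal sketch of the pattern described above, for readers who haven't seen it. The `ParsingFinished` event and the message strings are hypothetical, and `assert_never` is defined locally here rather than imported, so the example doesn't depend on a particular Python version.

```python
from dataclasses import dataclass
from typing import NoReturn, Union


@dataclass(frozen=True)
class ParsingStart:
    pass


@dataclass(frozen=True)
class ParsingFinished:  # hypothetical second event
    pass


CliEvent = Union[ParsingStart, ParsingFinished]


def assert_never(x: NoReturn) -> NoReturn:
    # Only reachable at runtime if a union member was left unhandled.
    raise AssertionError(f"Unhandled type: {type(x).__name__}")


def cli_msg(e: CliEvent) -> str:
    if isinstance(e, ParsingStart):
        return "Start parsing."
    elif isinstance(e, ParsingFinished):
        return "Done parsing."
    else:
        # mypy narrows `e` to NoReturn here only when every member of the
        # union has been handled above; otherwise this call fails to type check.
        assert_never(e)


print(cli_msg(ParsingStart()))
```

Deleting the `ParsingFinished` branch should make mypy reject the `assert_never(e)` call, which is the development-time guarantee being discussed.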

The OO Way

We could instead use an abstract base class (ABC) to define the methods we wish to inherit with code like this:

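from abc import ABCMeta, abstractmethod
from dataclasses import dataclass

class CliEventABC(metaclass=ABCMeta):
    @abstractmethod
    def cli_msg(self) -> str:
        raise Exception("cli_msg not implemented for event")

@dataclass(frozen=True)
class ParsingStart():
    pass

# register ParsingStart as a virtual subclass of CliEventABC
CliEventABC.register(ParsingStart)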

However, this would require that we drop support for Python 3.6, because custom NamedTuple classes disallow also inheriting from a metaclass. Unfortunately, I can't seem to get mypy to fail when we don't define cli_msg here. It doesn't fail when the file is run, either. If someone who knows OO Python a little better can show me how to do this, we could consider the OO option, which would definitely make the code a little neater and simpler to maintain.
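A possible explanation for the missing failure, offered as a sketch rather than a diagnosis of this PR: classes attached with `ABCMeta.register` are virtual subclasses, and Python does not check their abstract methods at all. Inheriting from the ABC directly does enforce them, but only when the class is instantiated. The `Registered` and `Inherited` names below are hypothetical.

```python
from abc import ABCMeta, abstractmethod
from dataclasses import dataclass


class CliEventABC(metaclass=ABCMeta):
    @abstractmethod
    def cli_msg(self) -> str:
        raise Exception("cli_msg not implemented for event")


@dataclass(frozen=True)
class Registered:  # attached via register(), never checked
    pass


CliEventABC.register(Registered)


@dataclass(frozen=True)
class Inherited(CliEventABC):  # real subclass, checked on instantiation
    pass


# A virtual subclass instantiates fine even without cli_msg.
registered = Registered()
is_virtual = isinstance(registered, CliEventABC)

# A real subclass that omits cli_msg raises TypeError when instantiated.
try:
    Inherited()
    inherit_error = None
except TypeError as exc:
    inherit_error = exc

print(is_virtual, type(inherit_error).__name__)
```

Note that even in the inheriting case, defining the class succeeds; the error only appears once something tries to create an instance.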

Additionally, I did some tinkering to figure out if adding methods to these classes increases their memory footprint (see this thread with @jtcohen6 about in-memory event history), but I keep coming up with local results that show that they don't. I don't know enough about Python's runtime model to know how this is possible, but I can't seem to blow out the memory footprint with methods alone.
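One likely reason for that result, sketched under the assumption that the events are ordinary Python classes: a method is stored once on the class, and each instance only stores its own attribute values. The two class names below are hypothetical.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class PlainEvent:
    path: str


@dataclass(frozen=True)
class EventWithMethod:
    path: str

    def cli_msg(self) -> str:
        return f"Parsing {self.path}"


plain = PlainEvent("a.sql")
with_method = EventWithMethod("a.sql")

# Per-instance state holds only the field values, in both cases.
print(vars(plain), vars(with_method))

# The method lives in the class namespace, not on each instance.
print("cli_msg" in vars(EventWithMethod), "cli_msg" in vars(with_method))

# Shallow instance sizes, for comparison on your own interpreter.
print(sys.getsizeof(plain), sys.getsizeof(with_method))
```

So a history of a hundred thousand events would pay for the field values a hundred thousand times, but for each method only once per class.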

leahwicz · 2021-10-27T17:21:55Z

core/dbt/events/types.py

+]
+
+# top-level event type for all events that go to the CLI
+CliEvent = Union[


When do you see CliEvent and Event differing?

I expect there to be different destinations so Event would encompass all events, but CliEvent is only the events that go to the cli as opposed to an event stream, log file, or any other destination we want. This way we can keep the cli super crisp, while putting more details in the log file even at the same log level. That's because each event can have more than one destination, and would just have one message computed for each destination.
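A rough sketch of the one-event, many-destinations idea. This uses a class hierarchy rather than the Union in the diff, and `file_msg`, `ManifestChecked`, `messages_for`, and the message strings are all assumptions made up for illustration.

```python
from abc import ABCMeta, abstractmethod
from dataclasses import dataclass
from typing import Dict


class Event(metaclass=ABCMeta):
    # Every event can be written to the log file (hypothetical method name).
    @abstractmethod
    def file_msg(self) -> str:
        ...


class CliEvent(Event):
    # Only events that should also reach the CLI implement cli_msg.
    @abstractmethod
    def cli_msg(self) -> str:
        ...


@dataclass(frozen=True)
class ParsingStart(CliEvent):
    def cli_msg(self) -> str:
        return "Start parsing."

    def file_msg(self) -> str:
        return "Start parsing. (extra detail for the log file)"


@dataclass(frozen=True)
class ManifestChecked(Event):  # hypothetical file-only event
    def file_msg(self) -> str:
        return "Manifest checked."


def messages_for(e: Event) -> Dict[str, str]:
    # Compute one message per destination the event is eligible for.
    out = {"file": e.file_msg()}
    if isinstance(e, CliEvent):
        out["cli"] = e.cli_msg()
    return out


print(messages_for(ParsingStart()))
print(messages_for(ManifestChecked()))
```

The CLI message can stay short while the file message for the same event, at the same level, carries more detail.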

core/dbt/events/functions.py

core/dbt/events/types.py

nathaniel-may · 2021-10-27T20:46:22Z

I just pushed the change to OO-style python. After pairing with @iknox-fa, it's very likely that in order to match the correctness guarantees of the FP-style code we will have to manually maintain a list of dummy class instantiations until mypy is run on every file. This is because mypy does not check that the concrete classes implement abstract methods unless the class itself is instantiated. If it's not, mypy considers the concrete class definition dead code and does not check it.

That being said, this code is way nicer, so I think we should go with this option since we plan on turning mypy on everywhere à la #3203 and #4089.
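A minimal sketch of the dummy-instantiation trick, assuming the shape of the classes in this PR. The guard `1 == 0` is never true at runtime, but per the comment in the diff it is not something mypy evaluates statically, so the body is still type checked. The message string is illustrative.

```python
from abc import ABCMeta, abstractmethod
from dataclasses import dataclass


class CliEventABC(metaclass=ABCMeta):
    @abstractmethod
    def cli_msg(self) -> str:
        raise Exception("cli_msg not implemented for cli event")


@dataclass(frozen=True)
class ParsingStart(CliEventABC):
    def cli_msg(self) -> str:
        return "Start parsing."


# Never executes at runtime. The instantiation exists only so the type
# checker sees each concrete class being constructed and can complain
# if one of them leaves cli_msg abstract.
if 1 == 0:
    ParsingStart()

print(ParsingStart().cli_msg())
```

The cost is that every new event type needs a matching line in this block until mypy runs on all the call sites that construct events.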

nathaniel-may · 2021-10-28T15:50:55Z

So the tests pass even though I'm using dataclasses from python 3.7. I would expect 3.6 tests to fail, but they don't.

Because I'm planning on making many many event types, I would really like to use dataclasses so we don't have to live in boilerplate city, but if we really need to I can deal with the bloat. Thoughts? (especially @jtcohen6 re: dropping support for 3.6 for ease of development here)

nathaniel-may · 2021-10-28T16:06:55Z

I just learned we can use dataclasses with 3.6 because dbt-core installs a backport module when a user installs dbt-core with Python 3.6. Thanks, @emmyoop, for pointing out @kwigley's wisdom! (Sorry for the false alarm, @jtcohen6.)

core/dbt/events/README.md

iknox-fa · 2021-10-28T22:06:56Z

core/dbt/events/types.py

+# types to represent log levels
+
+# in preparation for #3977
+class TestLevel():


Idea for future iterations on this: Implement log level number as well since it's a defined thing in python. This would let us more easily tie in with python tools that utilize log levels (I imagine structlog has some support for it)

Good to know! Yeah I imagine we can work these in once we hook structlog up.
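A sketch of what attaching numeric levels could look like, using the constants from Python's standard `logging` module. The `level_tag` and `level_no` method names and the `DebugLevel` class are assumptions for illustration, not the code in this PR.

```python
import logging


class DebugLevel:
    def level_tag(self) -> str:
        return "debug"

    def level_no(self) -> int:
        # Reuse the standard library's numeric level.
        return logging.DEBUG


class InfoLevel:
    def level_tag(self) -> str:
        return "info"

    def level_no(self) -> int:
        return logging.INFO


class ParsingStart(InfoLevel):
    def cli_msg(self) -> str:
        return "Start parsing."


logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")

event = ParsingStart()
# Any tool that accepts a numeric level can consume the event directly.
logging.getLogger("events_sketch").log(event.level_no(), event.cli_msg())
```

With the number available on the event, filtering by threshold becomes a plain integer comparison instead of a chain of type checks.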

iknox-fa · 2021-10-28T22:07:57Z

core/dbt/events/types.py

+        raise Exception("cli_msg not implemented for cli event")
+
+
+class ParsingStart(InfoLevel, CliEventABC):


Minor detail: can we ditch the periods at the end of these returned strings? They aren't sentences.

I copied the messages exactly the way they are printed today. I completely agree, however I think I want to do user-facing message improvements in their own PR so product can give them all a go. I might be being a bit too structured for something as silly as these periods though.

iknox-fa · 2021-10-28T22:09:17Z

core/dbt/events/types.py

+# we need to skirt around that by computing something it doesn't check statically.
+#
+# TODO remove these lines once we run mypy everywhere.
+if 1 == 0:


I have to admit, I kinda love this hack...

iknox-fa

LGTM, one minor thing that's not going to bother anyone but me. :)

add event type modeling and fire_event calls

add event type modeling and fire_event calls automatic commit by git-black, original commits: f9ef9da

cla-bot bot added the cla:yes label Oct 26, 2021

nathaniel-may force-pushed the first-pass-structured-logging branch from 13c8c41 to 8f1a2d4 Compare October 26, 2021 18:03

add event type modeling and fire_event calls

8655220

nathaniel-may force-pushed the first-pass-structured-logging branch from 8f1a2d4 to 8655220 Compare October 26, 2021 18:05

nathaniel-may requested review from gshank, iknox-fa, jtcohen6 and leahwicz October 26, 2021 18:06

jtcohen6 reviewed Oct 27, 2021

Nathaniel May added 2 commits October 27, 2021 09:36

update flake8 command with per-file-ignore

96d13ab

flake8 fixes

f8adfea

leahwicz reviewed Oct 27, 2021

core/dbt/events/functions.py

gshank reviewed Oct 27, 2021

core/dbt/events/functions.py

gshank reviewed Oct 27, 2021

core/dbt/events/types.py

Nathaniel May added 2 commits October 27, 2021 16:41

OO style

b130e5b

revert flake8 command change

fe66c7f

nathaniel-may requested review from gshank, jtcohen6 and leahwicz October 27, 2021 20:56

add level distinctions

054c080

nathaniel-may requested review from emmyoop and kwigley October 27, 2021 21:29

Nathaniel May added 4 commits October 27, 2021 17:32

flake8 fixes

cc362be

move comment block

659e1b2

remove subtrees from hierarchy, put level in type.

259a893

use level tags instead of isinstance checks

f1996f3

Nathaniel May added 5 commits October 28, 2021 11:14

remove unused code

3949f65

fix silly bug

57ef51d

add one more event that actually has data

006622a

flake8 fixes

8a4c513

add dummy instance

080d756

emmyoop approved these changes Oct 28, 2021

core/dbt/events/README.md

nathaniel-may mentioned this pull request Oct 28, 2021

Adapter logging interface #4161

Closed

4 tasks

update module readme

c353342

nathaniel-may mentioned this pull request Oct 28, 2021

Add Structured Logging #4055

Merged

21 tasks

iknox-fa reviewed Oct 28, 2021

iknox-fa approved these changes Oct 28, 2021

nathaniel-may merged commit 1015b89 into feature/structured-logging Oct 29, 2021

nathaniel-may deleted the first-pass-structured-logging branch October 29, 2021 13:16

emmyoop pushed a commit that referenced this pull request Nov 8, 2021

Initial structured logging work with fire_event (#4137)

8b53ca7

add event type modeling and fire_event calls

emmyoop pushed a commit that referenced this pull request Nov 8, 2021

Initial structured logging work with fire_event (#4137)

19124db

add event type modeling and fire_event calls

kwigley pushed a commit that referenced this pull request Nov 9, 2021

Initial structured logging work with fire_event (#4137)

1a994eb

add event type modeling and fire_event calls

nathaniel-may pushed a commit that referenced this pull request Nov 9, 2021

Initial structured logging work with fire_event (#4137)

f9ef9da

add event type modeling and fire_event calls

iknox-fa pushed a commit that referenced this pull request Feb 8, 2022

Initial structured logging work with fire_event (#4137)

d1c196c

add event type modeling and fire_event calls automatic commit by git-black, original commits: f9ef9da
