dbt currently produces a run_results.json file at the end of every invocation. To aid in understanding the performance characteristics of dbt projects, dbt should add some additional performance information to this file.
The performance characteristics of dbt runs can largely be understood in two parts:
initialization
resource running
Initialization
At the beginning of every dbt invocation (compile, run, test, seed, archive), dbt needs to complete the following tasks:
bootstrapping (load and parse config files, import adapters, etc)
load and parse all of the resources in a project
dbt should record timing information for both of these steps. Specifically, we care about the start and end time of each step so that we can draw a Gantt chart of what dbt is doing on a timeline.
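As a rough illustration of recording start and end times for the two initialization steps, here is a minimal sketch. The helper name `timed_step` and the record keys (`name`, `started_at`, `completed_at`) are assumptions made for this example, not an existing dbt API, and the step bodies are placeholders.

```python
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def timed_step(records, name):
    """Append a {name, started_at, completed_at} record for the wrapped block.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    started_at = datetime.now(timezone.utc)
    try:
        yield
    finally:
        completed_at = datetime.now(timezone.utc)
        records.append({
            "name": name,
            # Always include microseconds so timestamps have a fixed width.
            "started_at": started_at.isoformat(timespec="microseconds"),
            "completed_at": completed_at.isoformat(timespec="microseconds"),
        })


timing = []
with timed_step(timing, "bootstrap"):
    pass  # placeholder: load and parse config files, import adapters, etc.
with timed_step(timing, "parse"):
    pass  # placeholder: load and parse all of the resources in a project
```

Each record carries both endpoints of a step, which is the information needed to draw one bar of a Gantt chart.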
Resource running
Once the project has been parsed, dbt can begin executing resources. Project execution takes the following form:
1. on-run-start hooks (if applicable)
2. for each selected resource:
   a. pre-hooks (if applicable)
   b. resource execution
   c. post-hooks (if applicable)
3. on-run-end hooks
dbt should record the start/end time for each of these steps, adding them to the resources in run_results.json.
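The execution order and per-step timing described above could be sketched as follows. This is only an illustration under stated assumptions: `run_project`, `timed`, the resource dictionary keys (`name`, `pre_hooks`, `execute`, `post_hooks`), and the output layout are all hypothetical and do not reflect how dbt is actually implemented.

```python
from datetime import datetime, timezone


def utc_now():
    """Current UTC time as an ISO 8601 string with microseconds."""
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def timed(name, fn):
    """Call fn() and return a timing record for it (hypothetical layout)."""
    started_at = utc_now()
    fn()
    return {"name": name, "started_at": started_at, "completed_at": utc_now()}


def run_project(resources, on_run_start=(), on_run_end=()):
    """Run hooks and resources in order, collecting timing for each step.

    resources: list of dicts with assumed keys
    "name", "execute", and optional "pre_hooks" / "post_hooks".
    """
    nodes = []
    # on-run-start hooks are represented as nodes of their own.
    for i, hook in enumerate(on_run_start):
        nodes.append({"node": f"on-run-start-{i}", "timing": [timed("execute", hook)]})
    for resource in resources:
        timing = []
        for hook in resource.get("pre_hooks", []):
            timing.append(timed("pre-hook", hook))
        timing.append(timed("execute", resource["execute"]))
        for hook in resource.get("post_hooks", []):
            timing.append(timed("post-hook", hook))
        nodes.append({"node": resource["name"], "timing": timing})
    # on-run-end hooks are likewise represented as nodes.
    for i, hook in enumerate(on_run_end):
        nodes.append({"node": f"on-run-end-{i}", "timing": [timed("execute", hook)]})
    return nodes
```

Keeping the hook timings inside the resource they belong to means a timeline can show how much of a resource's wall-clock time was spent in hooks versus the resource itself.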
Considerations
Any on-run-start and on-run-end operations should be represented in the nodes list of run_results.json.
The top-level bootstrap/parse timing is mostly intended for internal use, to understand the performance characteristics of different versions of dbt.
The top-level bootstrap/parse records may contain type fields that describe what type of parsing or bootstrapping is happening. Alternatively, we can just use names like "parse - archive" if that is more convenient.
Note the addition of the thread_id field, intended to help visualize parallelism in the dbt run.
Use UTC for all timestamps, with sub-second granularity.
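To make the thread_id and UTC points concrete, here is a small sketch of per-node records produced from a thread pool. The record layout, the `run_node` function, and the node names are assumptions for illustration only; the actual fields dbt writes may differ.

```python
import json
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def run_node(name):
    """Pretend to run a node and return a record with thread and timing info."""
    started_at = datetime.now(timezone.utc)
    # ... the resource would execute here ...
    completed_at = datetime.now(timezone.utc)
    return {
        "node": name,
        # The worker thread's name identifies which lane of the chart to use.
        "thread_id": threading.current_thread().name,
        "timing": [{
            "name": "execute",
            "started_at": started_at.isoformat(timespec="microseconds"),
            "completed_at": completed_at.isoformat(timespec="microseconds"),
        }],
    }


with ThreadPoolExecutor(max_workers=2, thread_name_prefix="Thread") as pool:
    results = list(pool.map(run_node, ["model_a", "model_b", "model_c"]))

print(json.dumps(results, indent=2))
```

Grouping records by thread_id gives one row per thread, and the microsecond-precision UTC timestamps place each node on that row without any time zone ambiguity.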
TODO: identify how these performance characteristics are tracked using Snowplow
Incorporate this data into run_results.json as well
drewbanin changed the title from "Include timing for discrete stages of dbt pipeline in anonymous event tracking" to "Better record keeping for resource timing" on Jan 11, 2019
drewbanin changed the title from "Better record keeping for resource timing" to "Improve record keeping for resource timing" on Jan 11, 2019