Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3137] [Bug] dbt retry: Path-type flags passed to retry should override params from previous invocation #8682

Closed
2 tasks done
jtcohen6 opened this issue Sep 20, 2023 · 3 comments · Fixed by #8803 · May be fixed by #10437
Closed
2 tasks done

[CT-3137] [Bug] dbt retry: Path-type flags passed to retry should override params from previous invocation #8682

jtcohen6 opened this issue Sep 20, 2023 · 3 comments · Fixed by #8803 · May be fixed by #10437
Assignees
Labels
backport 1.6.latest bug Something isn't working Highest Severity critical bug that must be resolved immediately Impact: Orch retry

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Sep 20, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Previous invocation's flags (loaded from args in run_results.json) are always used, even if CLI flags are explicitly passed into the retry command:

self.previous_args = self.previous_state.results.args

# Remove these args when their default values are present, otherwise they'll raise an exception
args_to_remove = {
"show": lambda x: True,
"resource_types": lambda x: x == [],
"warn_error_options": lambda x: x == {"exclude": [], "include": []},
}
for k, v in args_to_remove.items():
if k in self.previous_args and v(self.previous_args[k]):
del self.previous_args[k]
retry_flags = Flags.from_dict(cli_command, self.previous_args)
retry_config = RuntimeConfig.from_args(args=retry_flags)

Expected Behavior

CLI flags passed explicitly into the dbt retry command should override flags read in from the previous invocation.

I could see this getting hairy when we think about the config hierarchy of flags > env vars > user configs (in profiles.yml). I think the precedence order should be:

  • CLI flags passed explicitly into this invocation (dbt retry) — plus --state which may also be passed via env var (!)
  • Flags from previous invocation (including resolution of flags > env vars > user configs)

This assumes that the env vars (save --state) + user configs ought to be the same across both invocations. That is an assumption, but it feels like a reasonable one. What feels bad about this is that we're breaking the 1:1 mapping between CLI flags (--project-dir) and env vars (DBT_PROJECT_DIR), just for retry.

Steps To Reproduce

  1. Create a dbt project in a directory named proj_location_1/ with a failing model:
$ dbt run --project-dir proj_location_1/
17:31:53  Running with dbt=1.7.0-b2
17:31:53  Registered adapter: postgres=1.7.0-b2
17:31:53  Found 1 model, 0 sources, 0 exposures, 0 metrics, 598 macros, 0 groups, 0 semantic models
17:31:53
17:31:53  Concurrency: 5 threads (target='dev')
17:31:53
17:31:53  1 of 1 START sql view model dbt_jcohen.cool_model .............................. [RUN]
17:31:53  1 of 1 ERROR creating sql view model dbt_jcohen.cool_model ..................... [ERROR in 0.06s]
17:31:54
17:31:54  Finished running 1 view model in 0 hours 0 minutes and 0.23 seconds (0.23s).
17:31:54
17:31:54  Completed with 1 error and 0 warnings:
17:31:54
17:31:54    Database Error in model cool_model (models/cool_model.sql)
  column "aslkjfasd" does not exist
  LINE 7:     select aslkjfasd
                     ^
  compiled Code at target/run/my_dbt_project/models/cool_model.sql
17:31:54
17:31:54  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
  1. Move the project location: proj_location_1proj_location_2
$ mv proj_location_1/ proj_location_2/
  1. Run dbt retry pointing to proj_location_2, but dbt retry still uses proj_location_1
$ dbt retry --project-dir proj_location_2/
17:33:26  Running with dbt=1.7.0-b2
17:33:26  Registered adapter: postgres=1.7.0-b2
17:33:26  Warning: The state and target directories are the same: 'target'. This could lead to missing changes due to overwritten state including non-idempotent retries.
17:33:26  Encountered an error:
Path 'proj_location_1/' does not exist.

Relevant log output

No response

Environment

- OS: 13.4.1
- Python: 3.10.11
- dbt: 1.7.0-b2 (main) - but also relevant on v1.6

Which database adapter are you using with dbt?

postgres

Additional Context

No response

@jtcohen6 jtcohen6 added bug Something isn't working backport 1.6.latest labels Sep 20, 2023
@github-actions github-actions bot changed the title [Bug] dbt retry: CLI flags passed explicitly should override flags/args from previous invocation [CT-3137] [Bug] dbt retry: CLI flags passed explicitly should override flags/args from previous invocation Sep 20, 2023
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Sep 20, 2023

CLI flags passed explicitly into this invocation

I am afraid that this will be especially tricky. We have access to the resolved Flags, but it will be much harder to know which were passed in explicitly as CLI flags, versus set as env vars / user config in profiles.yml. (This is available in the click Context via get_parameter_source, but not in our resolved Flags object.)

If we just use the retry command's fully resolved flags to overwrite the previous invocation's fully resolved flags, we will miss any CLI flag overrides that took precedence over default values for the previous command.


--project-dir also seems particularly tricky, because it's used as the input to set other params, such as --log-path. In this case, I think the user just needs to set both the --project-dir and --log-path.


Update: I'm not going to pretend that I like any part of this, but it does appear to kinda sorta get the job done here:

        self.previous_args = self.previous_state.results.args

        # massive hack: allow 'dbt retry --explicit --cli-flags' to override previous flags
        # I do not know if/how this could also support **kwargs passed into dbtRunner.invoke
        new_cli_args = {
            k: v
            for k, v in self.args.__dict__.items()
            if f"--{k.replace('_','-')}" in self.args.invocation_command.split(" ")
        }
        for key, value in new_cli_args.items():
            self.previous_args[key] = value
            
        self.previous_command_name = self.previous_args.get("which")
$ dbt retry --project-dir proj_location_2/
20:22:21  Running with dbt=1.7.0-b2
20:22:21  Registered adapter: postgres=1.7.0-b2
20:22:21  Warning: The state and target directories are the same: 'target'. This could lead to missing changes due to overwritten state including non-idempotent retries.
20:22:21  Found 1 model, 0 sources, 0 exposures, 0 metrics, 598 macros, 0 groups, 0 semantic models
... works ...

@jtcohen6
Copy link
Contributor Author

@peter-bertuglia made a great point. In the event that we're retrying an invocation which had previously enabled deferral, the subsequent invocation will want:

  • the ability to point --defer-state to different file paths in the subsequent (retry) invocation, versus the file paths used for deferral in the previous invocation
  • the ability to set --defer-state via environment variable (DBT_DEFER_STATE) — but only --defer-state, every other flag would need to be passed explicitly

As above, plus:

        # massive hack: override previous args with new CLI args
        ...
        
        # also allow overriding DEFER_STATE via env var (but only this)
        # however: would this cause issues if the command being retried *doesn't* support --defer-state?
        # right now that is only true of 'run-operation'
        if self.args.defer_state:
            self.previous_args["defer_state"] = defer_state

@jtcohen6
Copy link
Contributor Author

After discussing further with @peterallenwebb: These "overrides" are really just needed for "path" configs specific to the file system, because the subsequent retry invocation may be running in a different file system from the original invocation being retried.

So rather than the hackery I had above ("only override configs that are present in the invocation_command"), we should prefer the current invocation's inputs for parameters which are of type Path. This includes project_dir and defer_state.

@peterallenwebb peterallenwebb self-assigned this Sep 21, 2023
@jtcohen6 jtcohen6 changed the title [CT-3137] [Bug] dbt retry: CLI flags passed explicitly should override flags/args from previous invocation [CT-3137] [Bug] dbt retry: Path-type flags passed to retry should override params from previous invocation Sep 21, 2023
@jtcohen6 jtcohen6 added the High Severity bug with significant impact that should be resolved in a reasonable timeframe label Sep 26, 2023
@martynydbt martynydbt added Highest Severity critical bug that must be resolved immediately Impact: Orch and removed High Severity bug with significant impact that should be resolved in a reasonable timeframe labels Sep 27, 2023
@aranke aranke assigned aranke and unassigned QMalcolm Oct 10, 2023
github-actions bot pushed a commit that referenced this issue Oct 10, 2023
aranke added a commit that referenced this issue Oct 11, 2023
…y` (#8811)

(cherry picked from commit 3f7f7de)

Co-authored-by: Kshitij Aranke <kshitij.aranke@dbtlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.6.latest bug Something isn't working Highest Severity critical bug that must be resolved immediately Impact: Orch retry
Projects
None yet
6 participants