Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test priority order of profiles directory configuration #5715

Merged
merged 5 commits into from
Aug 30, 2022

Conversation

dbeatty10
Copy link
Contributor

@dbeatty10 dbeatty10 commented Aug 25, 2022

resolves #5714

Description

Verifies the following reverse priority order to search for profiles.yml:

  1. HOME directory of the user (i.e. ~/.dbt/)
  2. DBT_PROFILES_DIR environment variable
  3. --profiles-dir command-line argument

Checklist

@dbeatty10 dbeatty10 added the Skip Changelog Skips GHA to check for changelog file label Aug 25, 2022
@dbeatty10 dbeatty10 requested a review from a team as a code owner August 25, 2022 21:00
@cla-bot cla-bot bot added the cla:yes label Aug 25, 2022
@ChenyuLInx
Copy link
Contributor

Looks good! You will need to setup pre-commit so that you commit are passing the code quality checks.

Here's the failed check. The command it run was pre-commit run --all-files --show-diff-on-failure

@dbeatty10
Copy link
Contributor Author

@ChenyuLInx it is passing all the automated checks now.

@@ -69,7 +71,10 @@ def run_dbt(args: List[str] = None, expect_pass=True):

print("\n\nInvoking dbt with {}".format(args))
res, success = handle_and_check(args)
assert success == expect_pass, "dbt exit state did not match expected"

if expect_pass is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the reason for this change. "expect_pass" should always be True or False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If expect_pass is a binary option, then the exit code of the dbt <subcommand> <options> will preempt everything else and take priority over subsequent assertions. Having an option to run_dbt that doesn't assert allows the user to take control of the assertion process independent of the exit code.

Why this matters in this case:

  • dbt debug will have a non-zero exit code in certain situations (like if a profile is not found at a particular location).

In these tests, I want to get the output of dbt debug regardless of the exit code and test the content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason that it doesn't work to do expect_pass=False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

pytest recommends thinking of tests being composed of four steps:

  1. Arrange
  2. Act
  3. Assert
  4. Cleanup

Currently, the implementation of dbt_run combines both "Act" and "Assert". I want the option for it to only "Act" and leave the "Assert" for a separate and independent step. This also seems like it would align with the anatomy of a test that pytest is promoting.

Why

This pull request is a prelude to testing any implementation of #5411.

As-is, expect_pass=False might work for the ~9 different dbt_runs that end up being executed. But an implementation of #5411 will introduce another ~3 executions of dbt_runs (actually doing dbt debug) which might actually have a zero exit codes, hence those specific cases would need to be expect_pass=True.

It's greatly simplified if there is the option to separate "Assert" from "Act" within dbt_run.

Additionally, this would give more flexibility to other test implementers (while retaining the current default value of expect_pass=True).

Are there downsides or risks you see in my proposed change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really object to this change. It mostly works this way because that's the way it was in test/integration/base.py and it's just convenient to not have to do an assert for almost every dbt_run.

@@ -94,6 +99,16 @@ def run_dbt_and_capture(args: List[str] = None, expect_pass=True):
return res, stdout


# Use this if you need to capture the standard out in a test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why duplicate the 'run_dbt_and_capture' functions? What does this do different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbt debug contains print() statements which weren't showing up in the output for run_dbt_and_capture.

In my hands-on experiments, run_dbt_and_capture doesn't actually capture standard out -- I think it captures content streamed to the logs.

So I created a variant that includes the output of print() statements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need two different variations of this function. Looking at the original one, it looks like it certainly intended to capture the stdout logs, if you look at the 'capture_stdout_logs' in core/dbt/events/functions.py.

If there's something that it's missing, we should fix it rather than create an almost duplicate. What do you mean by "print()" statements? We shouldn't have any actual print statements to capture, should we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used git grep "print(" ./ to find instances of print().

dbt debug uses print() statements. There are a some other places too, but I didn't examine them in detail.

Do you know how to modify capture_stdout_logs to capture print() statements? Or some other alternative to avoid functions that are near duplicates? I saw how capture_stdout_logs captures logging, but didn't see any obvious way for it to capture standard out also.



@pytest.fixture(scope="class")
def profiles_home_root():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested what happens if there is no profiles.yml at this location? Or the ~/.dbt directory doesn't exist? Because we can't depend on there being such a file in a test environment.

Copy link
Contributor Author

@dbeatty10 dbeatty10 Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did for profiles.yml! I didn't check how dbt debug behaves if ~/.dbt doesn't exist, but that's something I can follow-up on.

The change I made to run_dbt to allow expect_pass=None enabled the tests to run dbt debug and capture the output, even when there are no profiles.yml files.

The beauty of these tests is that they merely confirm that it is looking in the expected directories for profiles.yml -- it is irrelevant to these tests if the profiles.yml actually exist within the user-specified directory or not. Other tests should verify the correct behavior when profiles.yml does/doesn't exist.

@dbeatty10
Copy link
Contributor Author

@gshank is the conversation regarding run_dbt_and_capture_stdout the only outstanding piece for this review? Or are there other portions as well?

In regards to run_dbt_and_capture_stdout, do you see how it can be combined with run_dbt_and_capture? I didn't see any way that was obvious to me, and that's why I created a separate function.

@jtcohen6 jtcohen6 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Aug 30, 2022
@dbeatty10 dbeatty10 requested a review from gshank August 30, 2022 18:08
@gshank
Copy link
Contributor

gshank commented Aug 30, 2022

That's the only issue. I'd really rather not have a separate method in dbt.tests.util, but if you want to move it into the test case, that would be fine. Unfortunately I don't have time right now to investigate how to do it in the same method. Ideally we wouldn't have any print statements at all to deal with. Moving those print statements to logging statements might be the way to go, but I don't know if there are complications there.

@dbeatty10
Copy link
Contributor Author

@gshank Per your recommendation, moved run_dbt_and_capture_stdout into the test case.

1 similar comment
@dbeatty10
Copy link
Contributor Author

@gshank Per your recommendation, moved run_dbt_and_capture_stdout into the test case.

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@dbeatty10 dbeatty10 merged commit 1df713f into main Aug 30, 2022
@dbeatty10 dbeatty10 deleted the dbeatty/test-profiles-dir-priority-order branch August 30, 2022 23:18
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request Sep 16, 2022
* Method for capturing standard out during testing (rather than logs)

* Allow dbt exit code assertion to be optional

* Verify priority order to search for profiles.yml configuration

* Updates after pre-commit checks

* Move `run_dbt_and_capture_stdout` into the test case
josephberni pushed a commit to Gousto/dbt-core that referenced this pull request Sep 16, 2022
* Method for capturing standard out during testing (rather than logs)

* Allow dbt exit code assertion to be optional

* Verify priority order to search for profiles.yml configuration

* Updates after pre-commit checks

* Move `run_dbt_and_capture_stdout` into the test case
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering Skip Changelog Skips GHA to check for changelog file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1086] [Feature] Test priority order of profiles directory configuration
5 participants