Test priority order of profiles directory configuration #5715

Merged: 5 commits, Aug 30, 2022

Changes from 4 commits

19 changes: 17 additions & 2 deletions core/dbt/tests/util.py
@@ -1,11 +1,12 @@
import io
import os
import shutil
import yaml
import json
import warnings
from datetime import datetime
from typing import List
from contextlib import contextmanager
from contextlib import contextmanager, redirect_stdout

from dbt.main import handle_and_check
from dbt.logger import log_manager
@@ -17,6 +18,7 @@
# Test utilities
# run_dbt
# run_dbt_and_capture
# run_dbt_and_capture_stdout
# get_manifest
# copy_file
# rm_file
@@ -69,7 +71,10 @@ def run_dbt(args: List[str] = None, expect_pass=True):

print("\n\nInvoking dbt with {}".format(args))
res, success = handle_and_check(args)
assert success == expect_pass, "dbt exit state did not match expected"

if expect_pass is not None:
Contributor

I don't see the reason for this change. "expect_pass" should always be True or False.

Contributor Author

If expect_pass is a binary option, then the exit code of dbt <subcommand> <options> preempts everything else and takes priority over subsequent assertions. Having an option for run_dbt not to assert lets the user take control of the assertion process independently of the exit code.

Why this matters in this case:

  • dbt debug will have a non-zero exit code in certain situations (like if a profile is not found at a particular location).

In these tests, I want to get the output of dbt debug regardless of the exit code and test the content.
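
For illustration, a minimal sketch of what this enables (hypothetical test name; assumes the expect_pass=None behavior proposed here and the run_dbt_and_capture_stdout helper added in this PR):

from dbt.tests.util import run_dbt_and_capture_stdout


def test_debug_reports_profiles_location():
    # Act: `dbt debug` may exit non-zero (e.g. when no profiles.yml is found),
    # so skip the built-in exit-code assertion
    _, stdout = run_dbt_and_capture_stdout(["debug"], expect_pass=None)

    # Assert: test the content of the output rather than the exit code
    assert "profiles.yml" in stdout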

Contributor

Is there some reason that it doesn't work to do expect_pass=False?

Contributor Author

Overview

pytest recommends thinking of tests as being composed of four steps:

  1. Arrange
  2. Act
  3. Assert
  4. Cleanup

Currently, the implementation of run_dbt combines both "Act" and "Assert". I want the option for it to only "Act" and leave the "Assert" as a separate and independent step (sketched below). This also aligns with the anatomy of a test that pytest promotes.
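
For illustration only, a minimal contrast of the two styles (hypothetical test names; assumes the expect_pass=None option proposed in this PR and dbt's standard project fixture):

from dbt.tests.util import run_dbt


def test_act_and_assert_combined(project):
    # run_dbt both acts and asserts: it fails here if the exit state is unexpected
    run_dbt(["run"], expect_pass=True)


def test_act_then_assert_separately(project):
    # Act only: no built-in assertion on the exit state
    results = run_dbt(["run"], expect_pass=None)
    # Assert as its own, independent step
    assert results is not None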

Why

This pull request is a prelude to testing any implementation of #5411.

As-is, expect_pass=False might work for the ~9 different run_dbt calls that end up being executed. But an implementation of #5411 will introduce another ~3 executions of run_dbt (actually doing dbt debug) which might have a zero exit code, so those specific cases would need to be expect_pass=True.

It's greatly simplified if there is the option to separate "Assert" from "Act" within run_dbt.

Additionally, this would give more flexibility to other test implementers (while retaining the current default value of expect_pass=True).

Are there downsides or risks you see in my proposed change?

Contributor

I don't really object to this change. It mostly works this way because that's the way it was in test/integration/base.py, and it's just convenient to not have to do an assert for almost every call to run_dbt.

        assert success == expect_pass, "dbt exit state did not match expected"

    return res


@@ -94,6 +99,16 @@ def run_dbt_and_capture(args: List[str] = None, expect_pass=True):
    return res, stdout


# Use this if you need to capture the standard out in a test
Contributor

Why duplicate the 'run_dbt_and_capture' function? What does this do differently?

Contributor Author

dbt debug contains print() statements which weren't showing up in the output for run_dbt_and_capture.

In my hands-on experiments, run_dbt_and_capture doesn't actually capture standard out -- I think it captures content streamed to the logs.

So I created a variant that includes the output of print() statements.
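
For reference, a minimal standalone sketch of the standard-library mechanism the new variant relies on, contextlib.redirect_stdout, which does capture plain print() output:

import io
from contextlib import redirect_stdout

# Anything print()ed inside the block goes into the buffer instead of the terminal.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("hello from a print() statement")

assert "hello from a print() statement" in buffer.getvalue()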

Contributor

We don't need two different variations of this function. Looking at the original one, it certainly intended to capture the stdout logs; see 'capture_stdout_logs' in core/dbt/events/functions.py.

If there's something that it's missing, we should fix it rather than create an almost duplicate. What do you mean by "print()" statements? We shouldn't have any actual print statements to capture, should we?

Contributor Author

I used git grep "print(" ./ to find instances of print().

dbt debug uses print() statements. There are some other places too, but I didn't examine them in detail.

Do you know how to modify capture_stdout_logs to capture print() statements? Or some other alternative to avoid functions that are near duplicates? I saw how capture_stdout_logs captures logging, but didn't see any obvious way for it to capture standard out also.

def run_dbt_and_capture_stdout(args: List[str] = None, expect_pass=True):
    stringbuf = io.StringIO()
    with redirect_stdout(stringbuf):
        res = run_dbt(args, expect_pass=expect_pass)
    stdout = stringbuf.getvalue()

    return res, stdout


# Used in test cases to get the manifest from the partial parsing file
# Note: this uses an internal version of the manifest, and in the future
# parts of it will not be supported for external use.
159 changes: 159 additions & 0 deletions tests/functional/profiles/test_profile_dir.py
@@ -0,0 +1,159 @@
import os
import pytest
import yaml
from contextlib import contextmanager
from pathlib import Path

import dbt.flags as flags
from dbt.tests.util import run_dbt_and_capture_stdout, write_file, rm_file


@pytest.fixture(scope="class")
def profiles_yml(profiles_root, dbt_profile_data):
    write_file(yaml.safe_dump(dbt_profile_data), profiles_root, "profiles.yml")
    return dbt_profile_data


@pytest.fixture(scope="class")
def profiles_home_root():
Contributor

Have you tested what happens if there is no profiles.yml at this location? Or the ~/.dbt directory doesn't exist? Because we can't depend on there being such a file in a test environment.

Contributor Author (@dbeatty10, Aug 26, 2022)

I did for profiles.yml! I didn't check how dbt debug behaves if ~/.dbt doesn't exist, but that's something I can follow up on.

The change I made to run_dbt to allow expect_pass=None enabled the tests to run dbt debug and capture the output, even when there are no profiles.yml files.

The beauty of these tests is that they merely confirm that dbt is looking in the expected directories for profiles.yml -- it is irrelevant to these tests whether a profiles.yml actually exists within the user-specified directory or not. Other tests should verify the correct behavior when profiles.yml does/doesn't exist.

    return os.path.join(os.path.expanduser("~"), ".dbt")


@pytest.fixture(scope="class")
def profiles_env_root(tmpdir_factory):
    path = tmpdir_factory.mktemp("profile_env")
    # environment variables are lowercased for some reason in _get_flag_value_from_env within dbt.flags
    return str(path).lower()


@pytest.fixture(scope="class")
def profiles_flag_root(tmpdir_factory):
    return tmpdir_factory.mktemp("profile_flag")


@pytest.fixture(scope="class")
def profiles_project_root(project):
    return project.project_root


@pytest.fixture(scope="class")
def cwd():
    return os.getcwd()


@pytest.fixture(scope="class")
def cwd_parent(cwd):
    return os.path.dirname(cwd)


@pytest.fixture(scope="class")
def cwd_child():
    # pick any child directory of the dbt project
    return Path(os.getcwd()) / "macros"


@pytest.fixture
def write_profiles_yml(request):
    def _write_profiles_yml(profiles_dir, dbt_profile_contents):
        def cleanup():
            rm_file(Path(profiles_dir) / "profiles.yml")

        request.addfinalizer(cleanup)
        write_file(yaml.safe_dump(dbt_profile_contents), profiles_dir, "profiles.yml")

    return _write_profiles_yml


# https://gist.github.com/igniteflow/7267431?permalink_comment_id=2551951#gistcomment-2551951
@contextmanager
def environ(env):
"""Temporarily set environment variables inside the context manager and
fully restore previous environment afterwards
"""
original_env = {key: os.getenv(key) for key in env}
os.environ.update(env)
try:
yield
finally:
for key, value in original_env.items():
if value is None:
del os.environ[key]
else:
os.environ[key] = value


class TestProfiles:
    def dbt_debug(self, project_dir_cli_arg=None, profiles_dir_cli_arg=None):
        # begin with no command-line args or user config (from profiles.yml)
        flags.set_from_args({}, {})
        command = ["debug"]

        if project_dir_cli_arg:
            command.extend(["--project-dir", str(project_dir_cli_arg)])

        if profiles_dir_cli_arg:
            command.extend(["--profiles-dir", str(profiles_dir_cli_arg)])

        # get the output of `dbt debug` regardless of the exit code
        return run_dbt_and_capture_stdout(command, expect_pass=None)

    @pytest.mark.parametrize(
        "project_dir_cli_arg, working_directory",
        [
            # 3 different scenarios for `--project-dir` flag and current working directory
            (None, "cwd"),  # no --project-dir flag and cwd is project directory
            (None, "cwd_child"),  # no --project-dir flag and cwd is a project subdirectory
            ("cwd", "cwd_parent"),  # use --project-dir flag and cwd is outside of it
        ],
    )
    def test_profiles(
        self,
        project_dir_cli_arg,
        working_directory,
        write_profiles_yml,
        dbt_profile_data,
        profiles_home_root,
        profiles_project_root,
        profiles_flag_root,
        profiles_env_root,
        request,
    ):
"""Verify priority order to search for profiles.yml configuration.

Reverse priority order:
1. HOME directory
2. DBT_PROFILES_DIR environment variable
3. --profiles-dir command-line argument

Specification later in this list will take priority over earlier ones, even when both are provided.
"""

        # https://pypi.org/project/pytest-lazy-fixture/ is an alternative to using request.getfixturevalue
        if project_dir_cli_arg is not None:
            project_dir_cli_arg = request.getfixturevalue(project_dir_cli_arg)

        if working_directory is not None:
            working_directory = request.getfixturevalue(working_directory)

        # start in the specified directory
        if working_directory is not None:
            os.chdir(working_directory)

        # default case with profiles.yml in the HOME directory
        _, stdout = self.dbt_debug(project_dir_cli_arg)
        assert f"Using profiles.yml file at {profiles_home_root}" in stdout

        # set DBT_PROFILES_DIR environment variable for the remainder of the cases
        env_vars = {"DBT_PROFILES_DIR": profiles_env_root}
        with environ(env_vars):
            _, stdout = self.dbt_debug(project_dir_cli_arg)
            assert f"Using profiles.yml file at {profiles_env_root}" in stdout

            # This additional case is also within the context manager because we want to verify
            # that it takes priority even when the relevant environment variable is also set

            # set --profiles-dir on the command-line
            _, stdout = self.dbt_debug(
                project_dir_cli_arg, profiles_dir_cli_arg=profiles_flag_root
            )
            assert f"Using profiles.yml file at {profiles_flag_root}" in stdout