Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/3723 greedy flag test selection #3833

Merged

Conversation

joellabes
Copy link
Contributor

resolves #3723

Description

This restores the old behaviour for magic test expansion on an opt-in basis via a new dbt test --greedy flag. It also makes the new behaviour of only testing nodes where all parents are in scope more visible, by logging the tests that were excluded.

Conversation topics

  • I've taken huge guesses at how this should be implemented, based on --store-failures. I don't really understand what I did in the flags zone, so let me know if I'm off base.
  • Should this also apply to dbt ls? I think yes - in my head, ls is an interactive debugger before applying the selection logic and they should be equivalent. If so, should it only be allowed in conjunction with --resource-type test?
  • Should this apply to --selector? I think not, you'd pass --greedy in separately?
  • Should there be a way to suppress the warnings if you know what you're doing? I'm not wed to this, but I can imagine them getting tedious, especially if it warns about a big list of nodes. I don't know where you'd put it though - passing a --i-know-about-greedy-leave-me-alone flag every time is equally tedious. Maybe an environment variable a la DBT_PROFILES_DIR?
  • There aren't any tests yet - please point me in the right direction for that

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.
  • I have updated the documentation site

@cla-bot cla-bot bot added the cla:yes label Aug 30, 2021
core/dbt/graph/selector.py Outdated Show resolved Hide resolved
core/dbt/main.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick comments

  • Really nice work on this one! You've managed to do this quite surgically.
  • I agree that dbt ls should also support the --greedy flag (as should dbt build, dbt build: Parity for flags #3596). However, dbt ls does something kinda funny & kinda good to swallow info-level logging, so that it only outputs resource types. The Some resources were excluded message will still appear, it will just be written to logs/dbt.log instead of stdout.
  • As far as testing this: I added a whole new suite, test/integration/066_test_selection_tests, just for the change in Be less greedy in test selection expansion #3235. So I think you can add a new test case there, passing the greedy flag and saying "jk we want them all"!

longer-form thoughts

It might get somewhat tiresome to keep seeing the info message. The canonical way to do this would be by supporting both --greedy and --no-greedy flags. If the user passes --no-greedy, that reinforces the default behavior and turns off the info logging. Ultimately, a bit of extra logging doesn't feel like a huge deal, and it's much better to show the message than show nothing and confuse lots of people (i.e. what we're doing today). Unless this speaks to you in a big way, I'd be happy to call that out of scope for the current PR, and welcome a new good first issue + contribution when another community member feels strongly :)

Another thought I had: More than passing an on/off flag, it feels like the most-intentional way to "lock in" any type of resource selection should be a yaml selector. Believe it or not, based on the way we've implemented this, it's already possible to set greediness within selectors, and at a finer level of granularity:

selectors:
  - name: all_for_some_not_for_all
    definition:
      union:
        - method: fqn
          value: model_a
          greedy: false
        - method: fqn
          value: model_b
          greedy: true

There's one bit of extra logic we'd need, inside selection_criteria_from_dict:

    @classmethod
    def selection_criteria_from_dict(
        cls, raw: Any, dct: Dict[str, Any], greedy: bool = False
    ) -> 'SelectionCriteria':
        if 'value' not in dct:
            raise RuntimeException(
                f'Invalid node spec "{raw}" - no search value!'
            )
        method_name, method_arguments = cls.parse_method(dct)
        
        # is 'greedy' defined in the yaml selector definition?
        # if so, use it
        if 'greedy' in dct:
            is_greedy = dct['greedy']
        
        # the 'greedy' arg to this method is False by default
        # we could edit cli.py to pass flags.GREEDY into that arg, if we want
        # the --greedy flag to interact with yaml selectors, either by
        # taking precedence or changing the default/fallback value
        else:
            is_greedy = greedy

        parents_depth = _match_to_int(dct, 'parents_depth')
        children_depth = _match_to_int(dct, 'children_depth')
        return cls(
            raw=raw,
            method=method_name,
            method_arguments=method_arguments,
            value=dct['value'],
            childrens_parents=bool(dct.get('childrens_parents')),
            parents=bool(dct.get('parents')),
            parents_depth=parents_depth,
            children=bool(dct.get('children')),
            children_depth=children_depth,
            greedy=is_greedy
        )

Questions:

  • Should yaml selectors interact with the --greedy flag at all, or just always fall back to the default of False?
  • If they do respect the greedy flag, this logic would still prioritize any greedy values set inside the yaml selector, in version control, over the value set by the --greedy flag. Does that seem like the right precedence order? I'm thinking about this like --full-refresh + --store-failures, which change the global default but should be preceded by any resource-specific settings (full_refresh + store_failures configs).
  • If they don't respect the greedy flag at all, and/or if they define greediness for all criteria, could we find a way to turn off the info logging about greediness if the user has specified --selector? By using a yaml selector, the user is in in "full control" mode. This might require a bit of tricky plumbing, for an effect that's ultimately more cosmetic than functional, but it feels promising as an abstraction...

core/dbt/graph/selector.py Outdated Show resolved Hide resolved
core/dbt/main.py Outdated Show resolved Hide resolved
@@ -66,7 +67,7 @@ def parse_union_from_default(
def parse_difference(
include: Optional[List[str]], exclude: Optional[List[str]]
) -> SelectionDifference:
included = parse_union_from_default(include, DEFAULT_INCLUDES)
included = parse_union_from_default(include, DEFAULT_INCLUDES, greedy=flags.GREEDY)
Copy link
Contributor

@jtcohen6 jtcohen6 Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This captures all standard selection (--select, --exclude) perfectly.

To make the --greedy flag work with a yaml --selector too, we'd need to add a similar bit of logic that passes flags.GREEDY into each method called by parse_from_definition. That's fairly straightforward for the first and last conditions, which already accept greedy as an argument:

  • SelectionCriteria.from_single_spec(definition, greedy=flags.GREEDY)
  • SelectionCriteria.selection_criteria_from_dict(definition, dct, greedy=flags.GREEDY) (called by parse_dict_definition) already accept greedy as an argument

We'd need to add greedy arg support for the methods parse_union_definition and parse_intersection_definition, which are called by the second and third conditions.

So: I'm thinking more about yaml selectors, and how they generally don't (and perhaps never should!) intersect with additional CLI-provided selection criteria. Also, there's the fact that they can define greedy for themselves (!)... I'll say more in my overall review comment.

@joellabes
Copy link
Contributor Author

Should yaml selectors interact with the --greedy flag at all?

I don't use them so I'm happy to follow your lead. But I'm a sucker for precedent, so if none of the other CLI-provided selection criteria are respected then I lean no. Especially if I add support for greedy config inside the selectors instead.

Agreed that selectors trump flags if we did add support though.

could we find a way to turn off the info logging about greediness if the user has specified --selector?

I'll have a look at whether I can do something like --selector value is not None around that logging area, unless there's a cleaner way you can think of.

@joellabes
Copy link
Contributor Author

OK I think I have ls going. I've added a couple of tests as starters and I know they're failing, but am struggling to interpret the output so if you can give me some pointers that'd be great 🙏

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick fixes for the tests, and a few more comments :)

core/dbt/graph/selector.py Outdated Show resolved Hide resolved
filtered_nodes = self.filter_selection(selected_nodes)

if indirect_only:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a neat move to wrap this logging logic in another method, defined higher up, and then call it down here, similar to alert_non_existence.

I also bring that up because I think the expect_exists property (attached to BaseSelectionGroup) could be a neat example for what we want to do around a raise_greedy_warning, which would raise a warning for CLI-provided selection criteria, but not for yaml selector-provided selection criteria that define greedy: true|false explicitly. The alert_non_existence method is called if spec.expect_exists here.

@joellabes
Copy link
Contributor Author

joellabes commented Sep 9, 2021

@jtcohen6 I've just had a stab at collating all of the discussion topics here, to make sure I've understood everything

Still to do:

Not to do:
Make greedy flag available in conjunction with --selector
❌ Add a --no-greedy flag to suppress the (now slimmer) warnings

Does that sound right to you?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Sep 9, 2021

@joellabes Nice! That looks right to me. Thanks for keeping track of it all.

At this point, I think the current PR is just about merge-able as is. If you can get all the code checks + unit tests passing, once you're happy with the integration test coverage for the --greedy behavior, I'd love to get this in (in its current form) for v0.21.0-rc1.

Then, we can open and scope a separate issue to support greediness + yaml selectors. That will require some plumbing, but I do think the needed changes will be fairly specific + self-contained:

  • Functional: Ensure greedy is supported by each conditional branch of parse_from_definition
  • Cosmetic: Add a new BaseSelectionGroup property, raise_greedy_warning. Ensure that selectors set raise_greedy_warning: False. Turn off logging if it's False (i.e. don't call alert_unused_nodes).

Also, I'd be okay with that functionality coming in a patch release or future version (v1.0).

Does that sound ok to you?

@cla-bot
Copy link

cla-bot bot commented Sep 17, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot cla-bot bot removed the cla:yes label Sep 17, 2021
@joellabes
Copy link
Contributor Author

@jtcohen6 that split sounds great. Just to give you an update, I've been out of action this last week because it was my last few days at EP. Now I'm about to head off on holiday for a few days with no computer.

I'm totally happy to close off these loose ends when I come back, but if you want to get the current work into .21 then I also won't be offended if you jump in.

@cla-bot
Copy link

cla-bot bot commented Sep 18, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@jtcohen6 jtcohen6 force-pushed the feature/3723-greedy-flag-test-selection branch from 1936349 to abc1ae5 Compare September 18, 2021 16:02
@cla-bot
Copy link

cla-bot bot commented Sep 18, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

2 similar comments
@cla-bot
Copy link

cla-bot bot commented Sep 18, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot
Copy link

cla-bot bot commented Sep 18, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@jtcohen6
Copy link
Contributor

@joellabes I think I resolved the outstanding TODOs, including support for the greedy property in yaml selectors. I did not add a --no-greedy flag. If someone comes along who feels strongly, we can open a good first issue for it.

Congrats on wrapping up @ EP, and hope you enjoy your holiday! The CLA bot is having a tough time with the fact that you changed your email (I think?). I've manually checked to confirm that, yes indeed, you've signed our CLA :)

I think this PR is just about good to go. I hope to get it in for v0.21.0-rc1.

@jtcohen6
Copy link
Contributor

Aiii, looks like I need the CLA to pass in order to merge.

@joellabes Apologies in advance: I'm going to rebase this PR, and scrub your now-defunct email from the commits. In no way do I mean to take credit for the hard work you've put in here.

Add --greedy flag to subparser

Add greedy flag, override default behaviour when parsing union

Add greedy support to ls (I think?!)

That was suspiciously easy

Fix flake issues

Try adding tests with greedy support

Remove trailing whitespace

Fix return type for select_nodes

forgot to add greedy arg

Remove incorrectly expected test

Use named param for greedy

Pull alert_unused_nodes out into its own function

rename resources -> tests

Add expand_selection explanation of --greedy flag

Add greedy support to yaml selectors. Update warning, tests

Fix failing tests

Fix tests. Changelog

Fix flake8
@jtcohen6 jtcohen6 force-pushed the feature/3723-greedy-flag-test-selection branch from 126ed56 to e276d29 Compare September 18, 2021 17:41
@jtcohen6 jtcohen6 merged commit 6563d09 into dbt-labs:develop Sep 18, 2021
jtcohen6 added a commit that referenced this pull request Sep 18, 2021
Add --greedy flag to subparser

Add greedy flag, override default behaviour when parsing union

Add greedy support to ls (I think?!)

That was suspiciously easy

Fix flake issues

Try adding tests with greedy support

Remove trailing whitespace

Fix return type for select_nodes

forgot to add greedy arg

Remove incorrectly expected test

Use named param for greedy

Pull alert_unused_nodes out into its own function

rename resources -> tests

Add expand_selection explanation of --greedy flag

Add greedy support to yaml selectors. Update warning, tests

Fix failing tests

Fix tests. Changelog

Fix flake8

Co-authored-by: Joel Labes c/o Jeremy Cohen <jeremy@dbtlabs.com>
@joellabes
Copy link
Contributor Author

Ahhhh sorry! That final commit was actually some week-old unstaged work I found while cleaning out my laptop, but I'd already disassociated my work email from the account 😬 I'll fix that up when I'm back. Thanks for getting this over the line! (Especially since it was all the cleanup of my messes)

iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
Add --greedy flag to subparser

Add greedy flag, override default behaviour when parsing union

Add greedy support to ls (I think?!)

That was suspiciously easy

Fix flake issues

Try adding tests with greedy support

Remove trailing whitespace

Fix return type for select_nodes

forgot to add greedy arg

Remove incorrectly expected test

Use named param for greedy

Pull alert_unused_nodes out into its own function

rename resources -> tests

Add expand_selection explanation of --greedy flag

Add greedy support to yaml selectors. Update warning, tests

Fix failing tests

Fix tests. Changelog

Fix flake8

Co-authored-by: Joel Labes c/o Jeremy Cohen <jeremy@dbtlabs.com>

automatic commit by git-black, original commits:
  6563d09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants