-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/3723 greedy flag test selection #3833
Feature/3723 greedy flag test selection #3833
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick comments
- Really nice work on this one! You've managed to do this quite surgically.
- I agree that
dbt ls
should also support the--greedy
flag (as shoulddbt build
, dbt build: Parity for flags #3596). However,dbt ls
does something kinda funny & kinda good to swallow info-level logging, so that it only outputs resource types. TheSome resources were excluded
message will still appear, it will just be written tologs/dbt.log
instead of stdout. - As far as testing this: I added a whole new suite,
test/integration/066_test_selection_tests
, just for the change in Be less greedy in test selection expansion #3235. So I think you can add a new test case there, passing the greedy flag and saying "jk we want them all"!
longer-form thoughts
It might get somewhat tiresome to keep seeing the info message. The canonical way to do this would be by supporting both --greedy
and --no-greedy
flags. If the user passes --no-greedy
, that reinforces the default behavior and turns off the info logging. Ultimately, a bit of extra logging doesn't feel like a huge deal, and it's much better to show the message than show nothing and confuse lots of people (i.e. what we're doing today). Unless this speaks to you in a big way, I'd be happy to call that out of scope for the current PR, and welcome a new good first issue
+ contribution when another community member feels strongly :)
Another thought I had: More than passing an on/off flag, it feels like the most-intentional way to "lock in" any type of resource selection should be a yaml selector. Believe it or not, based on the way we've implemented this, it's already possible to set greediness within selectors, and at a finer level of granularity:
selectors:
- name: all_for_some_not_for_all
definition:
union:
- method: fqn
value: model_a
greedy: false
- method: fqn
value: model_b
greedy: true
There's one bit of extra logic we'd need, inside selection_criteria_from_dict
:
@classmethod
def selection_criteria_from_dict(
cls, raw: Any, dct: Dict[str, Any], greedy: bool = False
) -> 'SelectionCriteria':
if 'value' not in dct:
raise RuntimeException(
f'Invalid node spec "{raw}" - no search value!'
)
method_name, method_arguments = cls.parse_method(dct)
# is 'greedy' defined in the yaml selector definition?
# if so, use it
if 'greedy' in dct:
is_greedy = dct['greedy']
# the 'greedy' arg to this method is False by default
# we could edit cli.py to pass flags.GREEDY into that arg, if we want
# the --greedy flag to interact with yaml selectors, either by
# taking precedence or changing the default/fallback value
else:
is_greedy = greedy
parents_depth = _match_to_int(dct, 'parents_depth')
children_depth = _match_to_int(dct, 'children_depth')
return cls(
raw=raw,
method=method_name,
method_arguments=method_arguments,
value=dct['value'],
childrens_parents=bool(dct.get('childrens_parents')),
parents=bool(dct.get('parents')),
parents_depth=parents_depth,
children=bool(dct.get('children')),
children_depth=children_depth,
greedy=is_greedy
)
Questions:
- Should yaml selectors interact with the
--greedy
flag at all, or just always fall back to the default ofFalse
? - If they do respect the
greedy
flag, this logic would still prioritize anygreedy
values set inside the yaml selector, in version control, over the value set by the--greedy
flag. Does that seem like the right precedence order? I'm thinking about this like--full-refresh
+--store-failures
, which change the global default but should be preceded by any resource-specific settings (full_refresh
+store_failures
configs). - If they don't respect the
greedy
flag at all, and/or if they define greediness for all criteria, could we find a way to turn off the info logging about greediness if the user has specified--selector
? By using a yaml selector, the user is in in "full control" mode. This might require a bit of tricky plumbing, for an effect that's ultimately more cosmetic than functional, but it feels promising as an abstraction...
core/dbt/graph/cli.py
Outdated
@@ -66,7 +67,7 @@ def parse_union_from_default( | |||
def parse_difference( | |||
include: Optional[List[str]], exclude: Optional[List[str]] | |||
) -> SelectionDifference: | |||
included = parse_union_from_default(include, DEFAULT_INCLUDES) | |||
included = parse_union_from_default(include, DEFAULT_INCLUDES, greedy=flags.GREEDY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This captures all standard selection (--select
, --exclude
) perfectly.
To make the --greedy
flag work with a yaml --selector
too, we'd need to add a similar bit of logic that passes flags.GREEDY
into each method called by parse_from_definition
. That's fairly straightforward for the first and last conditions, which already accept greedy
as an argument:
SelectionCriteria.from_single_spec(definition, greedy=flags.GREEDY)
SelectionCriteria.selection_criteria_from_dict(definition, dct, greedy=flags.GREEDY)
(called byparse_dict_definition
) already acceptgreedy
as an argument
We'd need to add greedy
arg support for the methods parse_union_definition
and parse_intersection_definition
, which are called by the second and third conditions.
So: I'm thinking more about yaml selectors, and how they generally don't (and perhaps never should!) intersect with additional CLI-provided selection criteria. Also, there's the fact that they can define greedy
for themselves (!)... I'll say more in my overall review comment.
I don't use them so I'm happy to follow your lead. But I'm a sucker for precedent, so if none of the other CLI-provided selection criteria are respected then I lean no. Especially if I add support for Agreed that selectors trump flags if we did add support though.
I'll have a look at whether I can do something like |
OK I think I have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick fixes for the tests, and a few more comments :)
test/integration/066_test_selection_tests/test_selection_expansion.py
Outdated
Show resolved
Hide resolved
test/integration/066_test_selection_tests/test_selection_expansion.py
Outdated
Show resolved
Hide resolved
filtered_nodes = self.filter_selection(selected_nodes) | ||
|
||
if indirect_only: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a neat move to wrap this logging logic in another method, defined higher up, and then call it down here, similar to alert_non_existence
.
I also bring that up because I think the expect_exists
property (attached to BaseSelectionGroup
) could be a neat example for what we want to do around a raise_greedy_warning
, which would raise a warning for CLI-provided selection criteria, but not for yaml selector-provided selection criteria that define greedy: true|false
explicitly. The alert_non_existence
method is called if spec.expect_exists
here.
@jtcohen6 I've just had a stab at collating all of the discussion topics here, to make sure I've understood everything Still to do:
Not to do: Does that sound right to you? |
@joellabes Nice! That looks right to me. Thanks for keeping track of it all. At this point, I think the current PR is just about merge-able as is. If you can get all the code checks + unit tests passing, once you're happy with the integration test coverage for the Then, we can open and scope a separate issue to support greediness + yaml selectors. That will require some plumbing, but I do think the needed changes will be fairly specific + self-contained:
Also, I'd be okay with that functionality coming in a patch release or future version (v1.0). Does that sound ok to you? |
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
|
@jtcohen6 that split sounds great. Just to give you an update, I've been out of action this last week because it was my last few days at EP. Now I'm about to head off on holiday for a few days with no computer. I'm totally happy to close off these loose ends when I come back, but if you want to get the current work into .21 then I also won't be offended if you jump in. |
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
|
1936349
to
abc1ae5
Compare
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
|
2 similar comments
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
|
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Joel Labes.
|
@joellabes I think I resolved the outstanding TODOs, including support for the Congrats on wrapping up @ EP, and hope you enjoy your holiday! The CLA bot is having a tough time with the fact that you changed your email (I think?). I've manually checked to confirm that, yes indeed, you've signed our CLA :) I think this PR is just about good to go. I hope to get it in for v0.21.0-rc1. |
Aiii, looks like I need the CLA to pass in order to merge. @joellabes Apologies in advance: I'm going to rebase this PR, and scrub your now-defunct email from the commits. In no way do I mean to take credit for the hard work you've put in here. |
Add --greedy flag to subparser Add greedy flag, override default behaviour when parsing union Add greedy support to ls (I think?!) That was suspiciously easy Fix flake issues Try adding tests with greedy support Remove trailing whitespace Fix return type for select_nodes forgot to add greedy arg Remove incorrectly expected test Use named param for greedy Pull alert_unused_nodes out into its own function rename resources -> tests Add expand_selection explanation of --greedy flag Add greedy support to yaml selectors. Update warning, tests Fix failing tests Fix tests. Changelog Fix flake8
126ed56
to
e276d29
Compare
Add --greedy flag to subparser Add greedy flag, override default behaviour when parsing union Add greedy support to ls (I think?!) That was suspiciously easy Fix flake issues Try adding tests with greedy support Remove trailing whitespace Fix return type for select_nodes forgot to add greedy arg Remove incorrectly expected test Use named param for greedy Pull alert_unused_nodes out into its own function rename resources -> tests Add expand_selection explanation of --greedy flag Add greedy support to yaml selectors. Update warning, tests Fix failing tests Fix tests. Changelog Fix flake8 Co-authored-by: Joel Labes c/o Jeremy Cohen <jeremy@dbtlabs.com>
Ahhhh sorry! That final commit was actually some week-old unstaged work I found while cleaning out my laptop, but I'd already disassociated my work email from the account 😬 I'll fix that up when I'm back. Thanks for getting this over the line! (Especially since it was all the cleanup of my messes) |
Add --greedy flag to subparser Add greedy flag, override default behaviour when parsing union Add greedy support to ls (I think?!) That was suspiciously easy Fix flake issues Try adding tests with greedy support Remove trailing whitespace Fix return type for select_nodes forgot to add greedy arg Remove incorrectly expected test Use named param for greedy Pull alert_unused_nodes out into its own function rename resources -> tests Add expand_selection explanation of --greedy flag Add greedy support to yaml selectors. Update warning, tests Fix failing tests Fix tests. Changelog Fix flake8 Co-authored-by: Joel Labes c/o Jeremy Cohen <jeremy@dbtlabs.com> automatic commit by git-black, original commits: 6563d09
resolves #3723
Description
This restores the old behaviour for magic test expansion on an opt-in basis via a new
dbt test --greedy
flag. It also makes the new behaviour of only testing nodes where all parents are in scope more visible, by logging the tests that were excluded.Conversation topics
--store-failures
. I don't really understand what I did in the flags zone, so let me know if I'm off base.dbt ls
? I think yes - in my head,ls
is an interactive debugger before applying the selection logic and they should be equivalent. If so, should it only be allowed in conjunction with--resource-type test
?--selector
? I think not, you'd pass--greedy
in separately?--i-know-about-greedy-leave-me-alone
flag every time is equally tedious. Maybe an environment variable a laDBT_PROFILES_DIR
?Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.