[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

z3z1ma · 2023-01-12T23:04:28Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Synopsis

Support a new selector method which leverages fnmatch from the python stdlib. This function matches a string against a pattern that uses unix wildcard syntax. Unix wildcard syntax includes wildcards (*), ? (one of any character), and [a-z][1-9] range expressions. Using this syntax against the models fqn allows an innumerable amount of more creative --select uses.

It is entirely opt-in. No existing behavior is changed. You can use this by prefixing wildcard to a select like this:

wildcard:jaffle_*.staging.*_customer

Unlocks

This is a significant improvement to test direct selection. you can now test specific columns across an entire section of the graph.

One example:

dbt test -s 'wildcard:*salesforce*ccm_cloud_spend*'

This will run all tests that contains salesforce and ccm_cloud_spend in the fqn. No tag wrangling required.

The same benefits apply to any workflow where users need to select something with more flexibility than what is currently possible. Some complex selectors may also be able to be simplified.

Run everything upstream of any model ending with _arr

dbt run -s '+wildcard:*_arr'

Conclusion

There are theoretically no downsides since its a net new feature that is fully opt-in.

Describe alternatives you've considered

This is painful to write, but here:

dbt list > dbt_resources
dbt test --select $(grep '*expression_is_true*') $(grep '*expect_column_count*') $(grep '*intermediate.folder_[a-z]_*')

What I have proposed is far more dynamic than anything I can conjure up.

Who will this benefit?

Everyone who needs more control in how they select their resources. Especially benefiting direct test selection which is really painful right now; but we could rattle out use cases all day given the flexibility of this selector.

Are you interested in contributing this feature?

Yes

Anything else?

dbt docs PR
dbt-labs/docs.getdbt.com#2702

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2023-01-17T15:36:30Z

This is a cool idea @z3z1ma ! And I'm pleasantly surprised how short and sweet the implementation in #6599 looks.

Questions for you and @jtcohen6:

what would the trade-offs be of the wildcard stand-alone selector method you are suggesting versus adding this capability to all the existing selector methods?
would would be the trade-offs of enabling regular expressions instead of Unix-style glob patterns?

z3z1ma · 2023-01-17T16:24:11Z

@dbeatty10 my take!

what would the trade-offs be of the wildcard stand-alone selector method you are suggesting versus adding this capability to all the existing selector methods?

Originally I would have liked to update the default fqn selector (QualifiedNameSelectorMethod). To me it was less flexible and powerful to interpret a "selector glob", as its referred to, as meaning we want to match all resources which share the preceding segments. I'll share the existing fqn snippet here for inline context.

    for i, selector_part in enumerate(node_selector.split(".")):
        # if we hit a GLOB, then this node is selected
        if selector_part == SELECTOR_GLOB:
            return True
        elif flat_fqn[i] == selector_part:
            continue
        else:
            return False

This is not as flexible as the fnmatch approach. Problem is, as one could expect, is that it would have been a breaking change because existing users who figure out how the selector glob works will invariably rely on it. See Hyrum's Law

As far as adding it elsewhere... My point above remains relevant on the QualifiedNameSelectorMethod (default) selector. But for all the others I am happy to spitball thoughts.

SourceSelectorMethod: This selector could use fnmatch without breaking the existing API for users. The SELECTOR_GLOB is used as a wildcard in a way that is similar to how one would with fnmatch. Again, less flexible; but in this particular case, if I plugged in fnmatch, existing selectors using the glob would work as expected. source:salesforce.* or source:*.table_a would be equivalent using the existing implementation or fnmatch. We would actually reduce the code footprint too.

Proof of existing test cases passing with minimal changes.

ExposureSelectorMethod: Same sentiment as above. The code is almost identical.

MetricSelectorMethod: Same as above.

TagSelectorMethod: This selector never used the SELECTOR_GLOB, so adding fnmatch there would also be a non breaking change and a solid value add IMO.

FileSelectorMethod: Same as tag selector, no SELECTOR_GLOB so it would be net-new functionality to support it and also a solid value add IMO.

PackageSelectorMethod: Same as above.

PathSelectorMethod: Already uses Path.glob so no change.

would would be the trade-offs of enabling regular expressions instead of Unix-style glob patterns?

The unix-style wildcard syntax is significantly simpler than regex. Furthermore I cannot think of any selection criteria that could not be expressed with unix-style wildcard syntax yet I could think of innumerable "weird" things and issues to troubleshoot for users if they are trying to use regular expressions. No reason for matching groups, etc. There is more mileage here in simplicity.

It is also worth mentioning . has meaning in regex (matches anything) which is different than how a user might intuit reading it when FQNs use . separators. This means these are equivalent but we can tell which is better.

dbt ls -s 'regex:jaffle_.*\..*\.customer'
dbt ls -s 'wildcard:jaffle_*.*.customer'

z3z1ma · 2023-01-17T17:04:52Z

After caffeinating. I am thinking that even changing the qualified name selector (the default) to use fnmatch would actually not be a breaking change since the glob will work identically once encountered as selecting all resources sharing the preceding segments. 😄

dbeatty10 · 2023-01-17T18:24:47Z

Awesome analysis @z3z1ma !

Based on your findings, I'd be inclined to add your fnmatch implementation to all the pre-existing selector methods: it seems like it might be a nice enhancement to existing selector methods without introducing a breaking change or the more complicated syntax of full-blown regular expressions.

@jtcohen6 you may have a different read here or things we haven't yet considered -- will be interested to hear your thoughts and feedback.

z3z1ma · 2023-01-19T00:09:43Z

@dbeatty10

Support added for all the "string matching" based selectors:

MetricSelectorMethod
FileSelectorMethod
PackageSelectorMethod
QualifiedNameSelectorMethod
SourceSelectorMethod
TagSelectorMethod
TestNameSelectorMethod

There is no logical purpose for the fnmatch in any of the other methods. But the above can all be augmented.

Still a non-breaking change and net new functionality for all of these, possibilities include:

metric:arr_*
file:stg_salesforce__*.sql
package:internal_pkg_*
source:salesforce.*
tag:*_pii
test_name:singular_test_suite_[1-9]

Once @jtcohen6 reviews, I'll remove my Wildcard selector since the functionality is embedded in the default selector. I will rename the test case which comprises more advanced patterns, test_select_advanced_fqn or some such. @jtcohen6 I think you will be pleasantly surprised to see how we maintain backwards compatibility while very naturally extending the capabilities present in the default selector.

jtcohen6 · 2023-01-19T12:17:50Z

@z3z1ma This is awesome !! You're right, I am pleasantly surprised with how naturally this change fits into the existing code, and the existing user experience (fqn:*, source:*, exposure:*).

Fully agree with:

Using unix-style wildcards and not full-on regex
Embedding this capability within existing selection methods. There's no need for a totally new one, this can just become part of our selection syntax, alongside other meaningful characters (+, @, ,)

I'm going to remove Refinement here, and mark the PR ready_for_review by someone on the appropriate engineering team.

resolves #6598

resolves dbt-labs#6598

z3z1ma added enhancement New feature or request triage labels Jan 12, 2023

github-actions bot changed the title ~~[Feature] Unix-style wildcard fqn selector method via fnmatch~~ [CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch Jan 12, 2023

z3z1ma mentioned this issue Jan 12, 2023

[Feature] Add unix-style fqn wildcard selector method #6599

Merged

6 tasks

dbeatty10 added Team:Language awaiting_response and removed triage labels Jan 17, 2023

github-actions bot added triage and removed awaiting_response labels Jan 17, 2023

jtcohen6 self-assigned this Jan 17, 2023

dbeatty10 added the Refinement Maintainer input needed label Jan 17, 2023

jtcohen6 added Team:Execution and removed triage Team:Language Refinement Maintainer input needed labels Jan 19, 2023

jtcohen6 removed their assignment Jan 19, 2023

jtcohen6 added node selection Functionality and syntax for selecting DAG nodes help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors labels Jan 19, 2023

dbeatty10 mentioned this issue Feb 8, 2023

[CT-2059] [Feature] Syntax to restrict selection to the current package #6891

Closed

3 tasks

aranke closed this as completed in #6599 Mar 5, 2023

aranke pushed a commit that referenced this issue Mar 5, 2023

[Feature] Add unix-style fqn wildcard selector method (#6599)

24ca76e

resolves #6598

acurtis-evi pushed a commit to acurtis-evi/dbt-core that referenced this issue Mar 7, 2023

[Feature] Add unix-style fqn wildcard selector method (dbt-labs#6599)

50b40dc

resolves dbt-labs#6598

dbeatty10 mentioned this issue Apr 13, 2023

[CT-2407] [Bug] dbt can't find snapshots using their filenames if mixed cased but only if inside a subfolder #7346

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

z3z1ma commented Jan 12, 2023 •

edited

Loading

dbeatty10 commented Jan 17, 2023

z3z1ma commented Jan 17, 2023 •

edited

Loading

z3z1ma commented Jan 17, 2023 •

edited

Loading

dbeatty10 commented Jan 17, 2023

z3z1ma commented Jan 19, 2023 •

edited

Loading

jtcohen6 commented Jan 19, 2023

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

Comments

z3z1ma commented Jan 12, 2023 • edited Loading

Is this your first time submitting a feature request?

Describe the feature

Synopsis

Unlocks

Conclusion

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

dbeatty10 commented Jan 17, 2023

z3z1ma commented Jan 17, 2023 • edited Loading

z3z1ma commented Jan 17, 2023 • edited Loading

dbeatty10 commented Jan 17, 2023

z3z1ma commented Jan 19, 2023 • edited Loading

jtcohen6 commented Jan 19, 2023

z3z1ma commented Jan 12, 2023 •

edited

Loading

z3z1ma commented Jan 17, 2023 •

edited

Loading

z3z1ma commented Jan 17, 2023 •

edited

Loading

z3z1ma commented Jan 19, 2023 •

edited

Loading