Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

Closed
3 tasks done
z3z1ma opened this issue Jan 12, 2023 · 6 comments · Fixed by #6599
Closed
3 tasks done

[CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch #6598

z3z1ma opened this issue Jan 12, 2023 · 6 comments · Fixed by #6599
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes

Comments

@z3z1ma
Copy link
Contributor

z3z1ma commented Jan 12, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Synopsis

Support a new selector method which leverages fnmatch from the python stdlib. This function matches a string against a pattern that uses unix wildcard syntax. Unix wildcard syntax includes wildcards (*), ? (one of any character), and [a-z][1-9] range expressions. Using this syntax against the models fqn allows an innumerable amount of more creative --select uses.

It is entirely opt-in. No existing behavior is changed. You can use this by prefixing wildcard to a select like this:

wildcard:jaffle_*.staging.*_customer

Unlocks

This is a significant improvement to test direct selection. you can now test specific columns across an entire section of the graph.

One example:

dbt test -s 'wildcard:*salesforce*ccm_cloud_spend*'

This will run all tests that contains salesforce and ccm_cloud_spend in the fqn. No tag wrangling required.

The same benefits apply to any workflow where users need to select something with more flexibility than what is currently possible. Some complex selectors may also be able to be simplified.

Run everything upstream of any model ending with _arr

dbt run -s '+wildcard:*_arr'

Conclusion

There are theoretically no downsides since its a net new feature that is fully opt-in.

Describe alternatives you've considered

This is painful to write, but here:

dbt list > dbt_resources
dbt test --select $(grep '*expression_is_true*') $(grep '*expect_column_count*') $(grep '*intermediate.folder_[a-z]_*')

What I have proposed is far more dynamic than anything I can conjure up.

Who will this benefit?

Everyone who needs more control in how they select their resources. Especially benefiting direct test selection which is really painful right now; but we could rattle out use cases all day given the flexibility of this selector.

Are you interested in contributing this feature?

Yes

Anything else?

dbt docs PR
dbt-labs/docs.getdbt.com#2702

@z3z1ma z3z1ma added enhancement New feature or request triage labels Jan 12, 2023
@github-actions github-actions bot changed the title [Feature] Unix-style wildcard fqn selector method via fnmatch [CT-1811] [Feature] Unix-style wildcard fqn selector method via fnmatch Jan 12, 2023
@dbeatty10
Copy link
Contributor

This is a cool idea @z3z1ma ! And I'm pleasantly surprised how short and sweet the implementation in #6599 looks.

Questions for you and @jtcohen6:

  • what would the trade-offs be of the wildcard stand-alone selector method you are suggesting versus adding this capability to all the existing selector methods?
  • would would be the trade-offs of enabling regular expressions instead of Unix-style glob patterns?

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jan 17, 2023

@dbeatty10 my take!

what would the trade-offs be of the wildcard stand-alone selector method you are suggesting versus adding this capability to all the existing selector methods?

Originally I would have liked to update the default fqn selector (QualifiedNameSelectorMethod). To me it was less flexible and powerful to interpret a "selector glob", as its referred to, as meaning we want to match all resources which share the preceding segments. I'll share the existing fqn snippet here for inline context.

    for i, selector_part in enumerate(node_selector.split(".")):
        # if we hit a GLOB, then this node is selected
        if selector_part == SELECTOR_GLOB:
            return True
        elif flat_fqn[i] == selector_part:
            continue
        else:
            return False

This is not as flexible as the fnmatch approach. Problem is, as one could expect, is that it would have been a breaking change because existing users who figure out how the selector glob works will invariably rely on it. See Hyrum's Law

As far as adding it elsewhere... My point above remains relevant on the QualifiedNameSelectorMethod (default) selector. But for all the others I am happy to spitball thoughts.

SourceSelectorMethod: This selector could use fnmatch without breaking the existing API for users. The SELECTOR_GLOB is used as a wildcard in a way that is similar to how one would with fnmatch. Again, less flexible; but in this particular case, if I plugged in fnmatch, existing selectors using the glob would work as expected. source:salesforce.* or source:*.table_a would be equivalent using the existing implementation or fnmatch. We would actually reduce the code footprint too.

Proof of existing test cases passing with minimal changes.
image

ExposureSelectorMethod: Same sentiment as above. The code is almost identical.

MetricSelectorMethod: Same as above.

TagSelectorMethod: This selector never used the SELECTOR_GLOB, so adding fnmatch there would also be a non breaking change and a solid value add IMO.

FileSelectorMethod: Same as tag selector, no SELECTOR_GLOB so it would be net-new functionality to support it and also a solid value add IMO.

PackageSelectorMethod: Same as above.

PathSelectorMethod: Already uses Path.glob so no change.


would would be the trade-offs of enabling regular expressions instead of Unix-style glob patterns?

The unix-style wildcard syntax is significantly simpler than regex. Furthermore I cannot think of any selection criteria that could not be expressed with unix-style wildcard syntax yet I could think of innumerable "weird" things and issues to troubleshoot for users if they are trying to use regular expressions. No reason for matching groups, etc. There is more mileage here in simplicity.

It is also worth mentioning . has meaning in regex (matches anything) which is different than how a user might intuit reading it when FQNs use . separators. This means these are equivalent but we can tell which is better.

dbt ls -s 'regex:jaffle_.*\..*\.customer'
dbt ls -s 'wildcard:jaffle_*.*.customer'

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jan 17, 2023

After caffeinating. I am thinking that even changing the qualified name selector (the default) to use fnmatch would actually not be a breaking change since the glob will work identically once encountered as selecting all resources sharing the preceding segments. 😄

@jtcohen6 jtcohen6 self-assigned this Jan 17, 2023
@dbeatty10 dbeatty10 added the Refinement Maintainer input needed label Jan 17, 2023
@dbeatty10
Copy link
Contributor

Awesome analysis @z3z1ma !

Based on your findings, I'd be inclined to add your fnmatch implementation to all the pre-existing selector methods: it seems like it might be a nice enhancement to existing selector methods without introducing a breaking change or the more complicated syntax of full-blown regular expressions.

@jtcohen6 you may have a different read here or things we haven't yet considered -- will be interested to hear your thoughts and feedback.

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jan 19, 2023

@dbeatty10

Support added for all the "string matching" based selectors:

MetricSelectorMethod
FileSelectorMethod
PackageSelectorMethod
QualifiedNameSelectorMethod
SourceSelectorMethod
TagSelectorMethod
TestNameSelectorMethod

There is no logical purpose for the fnmatch in any of the other methods. But the above can all be augmented.


Still a non-breaking change and net new functionality for all of these, possibilities include:

metric:arr_*
file:stg_salesforce__*.sql
package:internal_pkg_*
source:salesforce.*
tag:*_pii
test_name:singular_test_suite_[1-9]

Once @jtcohen6 reviews, I'll remove my Wildcard selector since the functionality is embedded in the default selector. I will rename the test case which comprises more advanced patterns, test_select_advanced_fqn or some such. @jtcohen6 I think you will be pleasantly surprised to see how we maintain backwards compatibility while very naturally extending the capabilities present in the default selector.

@jtcohen6
Copy link
Contributor

@z3z1ma This is awesome !! You're right, I am pleasantly surprised with how naturally this change fits into the existing code, and the existing user experience (fqn:*, source:*, exposure:*).

Fully agree with:

  • Using unix-style wildcards and not full-on regex
  • Embedding this capability within existing selection methods. There's no need for a totally new one, this can just become part of our selection syntax, alongside other meaningful characters (+, @, ,)

I'm going to remove Refinement here, and mark the PR ready_for_review by someone on the appropriate engineering team.

@jtcohen6 jtcohen6 added Team:Execution and removed triage Team:Language Refinement Maintainer input needed labels Jan 19, 2023
@jtcohen6 jtcohen6 removed their assignment Jan 19, 2023
@jtcohen6 jtcohen6 added node selection Functionality and syntax for selecting DAG nodes help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors labels Jan 19, 2023
acurtis-evi pushed a commit to acurtis-evi/dbt-core that referenced this issue Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants