[Feature] Include sources in dbt list -s "fqn:*" #9692

Open
3 tasks done
dbeatty10 opened this issue Feb 28, 2024 · 4 comments
Labels
  • Impact: CA
  • list: related to the dbt list command
  • stale: Issues that have gone stale

Comments

@dbeatty10
Contributor

dbeatty10 commented Feb 28, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

User story

As a developer on a dbt project, I sometimes want to define a selector in terms of "include everything except for ..." so that it is easy to write and includes precisely the desired nodes.

Known examples

One use case is defining a series of selectors that partition a dbt project. To make sure that everything is covered, the final selector would be defined as "everything that isn't one of the previously defined selectors".
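A minimal sketch of the partitioning idea, using plain Python sets rather than dbt's selector machinery. The selector names (`nightly`, `seeds`) and the helper `remainder_selector` are hypothetical and exist only for illustration; the node names are taken from the example project in this issue.

```python
from typing import Dict, Set


def remainder_selector(all_nodes: Set[str], selectors: Dict[str, Set[str]]) -> Set[str]:
    """Return every node that no previously defined selector claims."""
    claimed: Set[str] = set().union(*selectors.values())
    return all_nodes - claimed


# "Everything" is only useful here if it really contains every node,
# sources included.
all_nodes = {
    "my_project.my_model",
    "my_project.my_seed",
    "source:my_project.my_src.my_seed",
}

# Hypothetical, previously defined selectors.
selectors = {
    "nightly": {"my_project.my_model"},
    "seeds": {"my_project.my_seed"},
}

rest = remainder_selector(all_nodes, selectors)
print(sorted(rest))
```

If "everything" silently omits sources, the remainder selector omits them too, and the partition no longer covers the project.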

Proposed solution

The easiest way to fulfill the user story above is to have a selection method that will select "all nodes". The most natural way to do that would be via "fqn:*" (as long as all node / resource types are included).

Describe the feature

When running dbt list -s "fqn:*", include all sources in the output.

For example, suppose I have project files like those described in dbt-labs/docs.getdbt.com#4492 (comment).

If I have the following source definition within models/_sources.yml, then I'd expect to be able to use the fqn method to select it.

sources:
  - name: my_src
    database: "{{ target.database }}"
    schema: "{{ target.schema }}"
    tables:
      - name: my_seed

Describe alternatives you've considered

Currently, the fqn method does not include sources:

dbt list -s "fqn:*"

Output:

01:09:56  Running with dbt=1.7.8
01:09:57  Registered adapter: postgres=1.7.8
01:09:57  Found 1 seed, 1 snapshot, 2 models, 1 analysis, 1 test, 1 source, 1 exposure, 1 metric, 401 macros, 1 group, 1 semantic model
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
my_project.not_null_my_model_id

However, they are included in the output of this command:

dbt list --resource-types all

Output:

01:10:31  Running with dbt=1.7.8
01:10:32  Registered adapter: postgres=1.7.8
01:10:32  Found 1 seed, 1 snapshot, 2 models, 1 analysis, 1 test, 1 source, 1 exposure, 1 metric, 401 macros, 1 group, 1 semantic model
my_project.analysis.my_analysis
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
source:my_project.my_src.my_seed
my_project.not_null_my_model_id

Who will this benefit?

Here's an example of creating a default selector that is meant to include everything except certain models:

#9678 (comment)

The user would like to use fqn:* to start with "everything" and then add specific exclusions from there.

Are you interested in contributing this feature?

No response

Anything else?

See also: #9693

Related internal Slack thread: https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1709217641798779

@aranke
Member

aranke commented Feb 29, 2024

Potential fix:

It looks like sources are not included in the search strategy for the FQN selector:

def search(self, included_nodes: Set[UniqueId], selector: str) -> Iterator[UniqueId]:
    """Yield all nodes in the graph that match the selector.

    :param str selector: The selector or node name
    """
    non_source_nodes = list(self.non_source_nodes(included_nodes))
    for node, real_node in non_source_nodes:
        if self.node_is_match(selector, real_node.fqn, real_node.is_versioned):
            yield node

I can change this to all_nodes which should include sources.
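To illustrate the effect of that change, here is a toy model of the search, not dbt-core's actual code. `ToyNode`, the `include_sources` flag, and the use of `fnmatch` on the dotted FQN are all assumptions made for this sketch; dbt's real `node_is_match` logic is more involved.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Dict, Iterator, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a manifest node."""
    resource_type: str
    fqn: List[str]


def search(nodes: Dict[str, ToyNode], selector: str, include_sources: bool) -> Iterator[str]:
    """Yield the unique IDs whose dotted FQN matches the glob selector."""
    for unique_id, node in nodes.items():
        # Mimics iterating over non-source nodes only when the flag is off.
        if node.resource_type == "source" and not include_sources:
            continue
        if fnmatch(".".join(node.fqn), selector):
            yield unique_id


nodes = {
    "model.my_project.my_model": ToyNode("model", ["my_project", "my_model"]),
    "source.my_project.my_src.my_seed": ToyNode("source", ["my_project", "my_src", "my_seed"]),
}

without_sources = list(search(nodes, "*", include_sources=False))
with_sources = list(search(nodes, "*", include_sources=True))
print(without_sources)
print(with_sources)
```

In this toy version the only difference between the two calls is which nodes are iterated over; the matching itself is unchanged.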

@graciegoheen
Contributor

graciegoheen commented Mar 11, 2024

From internal Slack:

Sources have never been included in fqn:*, because they are selected as source:* instead. Only models/seeds/snapshots/tests are included by fqn.

@jtcohen6 do you know why ^?

That’s why the “default” node selection is so verbose.

Starting in 1.7, docs generate respects the node selection. So if there is a default yaml selector defined, that will now apply to the docs generate step too.

More context here

@dbeatty10
Contributor Author

I think we should add sources (and analyses) to fqn:*. Reasons below.

Research

I've only been able to find two resource types that are not included by dbt list -s "fqn:*":

  1. sources
  2. analyses

Reprex

  1. Start with these project files
  2. Run dbt list -s "fqn:*"
    • 👉 Notice that exposures, semantic_models, and metrics are included (but sources and analyses are not)
  3. Then run dbt list --resource-types all
    • Notice that sources and analyses are included
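The comparison in the reprex can be sketched as a set difference over the two command outputs. The listings below are copied from the outputs earlier in this issue (log lines omitted); the helper name `missing_from_fqn` is hypothetical.

```python
from typing import List

# Output of: dbt list -s "fqn:*"
FQN_OUTPUT = """
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
my_project.not_null_my_model_id
"""

# Output of: dbt list --resource-types all
ALL_OUTPUT = """
my_project.analysis.my_analysis
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
source:my_project.my_src.my_seed
my_project.not_null_my_model_id
"""


def missing_from_fqn(fqn_output: str, all_output: str) -> List[str]:
    """Return the entries listed by the second command but not the first."""
    return sorted(set(all_output.split()) - set(fqn_output.split()))


missing = missing_from_fqn(FQN_OUTPUT, ALL_OUTPUT)
print(missing)
```

For this project, the difference is exactly one analysis and one source.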

Additional context

Quoting @jtcohen6 from #8589 (comment):

It does feel like there's a real opportunity for refactoring here. It feels odd that sources/exposures/semantic_models/metrics are "pointer" node types, as opposed to the "logical" node types (models/seeds/snapshots/tests/analyses), and only those are included by the fqn:* selection.

But I think that's all out of scope for something we want to backport to v1.6!

Including sources within fqn:*

Pros

It seems like we can add sources (and analyses) to fqn:* without users losing any flexibility:

Cons

Are there any negative consequences to including sources in fqn:*?

I don't know of any, but I could be overlooking something.

Follow-up refactoring opportunity

If we make it so that fqn:* includes all resource types, then we might also be able to simplify this:

DEFAULT_INCLUDES: List[str] = ["fqn:*", "source:*", "exposure:*", "metric:*", "semantic_model:*"]

to this:

DEFAULT_INCLUDES: List[str] = ["fqn:*"]

As it currently stands, "exposure:*", "metric:*", "semantic_model:*" might already be unnecessary.
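A rough way to check that claim against the example project: for each non-fqn entry in `DEFAULT_INCLUDES`, see whether it contributes anything that the `fqn:*` listing above did not already contain. This sketch assumes that the prefix of a listed line (for example `source:`) identifies what the corresponding selection method would match, which is a simplification; the helper `extra_beyond_fqn` is hypothetical.

```python
from typing import Dict, List

DEFAULT_INCLUDES: List[str] = ["fqn:*", "source:*", "exposure:*", "metric:*", "semantic_model:*"]

# Lines returned by: dbt list -s "fqn:*" (from the output in this issue)
FQN_LINES = [
    "exposure:my_project.my_exposure",
    "metric:my_project.my_metric",
    "my_project.metricflow_time_spine",
    "my_project.my_model",
    "my_project.my_seed",
    "semantic_model:my_project.my_semantic_model",
    "my_project.my_snapshot.my_snapshot",
    "my_project.not_null_my_model_id",
]

# Lines returned by: dbt list --resource-types all
ALL_LINES = FQN_LINES + [
    "my_project.analysis.my_analysis",
    "source:my_project.my_src.my_seed",
]


def extra_beyond_fqn(includes: List[str], fqn_lines: List[str], all_lines: List[str]) -> Dict[str, List[str]]:
    """For each non-fqn include, list what it adds on top of fqn:*."""
    seen = set(fqn_lines)
    extras: Dict[str, List[str]] = {}
    for include in includes:
        method = include.split(":", 1)[0]
        if method == "fqn":
            continue
        prefix = method + ":"
        extras[include] = sorted(
            line for line in all_lines if line.startswith(prefix) and line not in seen
        )
    return extras


extras = extra_beyond_fqn(DEFAULT_INCLUDES, FQN_LINES, ALL_LINES)
print(extras)
```

Under these assumptions, only `source:*` adds anything for this project, which is consistent with the observation that the other three entries might already be redundant.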

@graciegoheen graciegoheen removed this from the v1.8 milestone Mar 21, 2024
@dbeatty10 dbeatty10 added the list related to the dbt list command label Apr 18, 2024
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Oct 16, 2024

3 participants