Pass an (optional) list of table-names to generate_source #50

Closed
rahulj51 opened this issue Nov 21, 2021 · 3 comments · Fixed by #51
Labels
enhancement New feature or request

Comments

@rahulj51
Contributor

Describe the feature

Sometimes, when a schema has many tables, one wants generate_source to generate source definitions for only a selected set of tables. This is not currently supported.

Provide an optional parameter to generate_source called table_names that takes a list of table names. If provided, generate_source will generate definitions for only those tables.
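
As a rough illustration of how the proposed parameter might be used from an automated pipeline, here is a minimal Python sketch that builds a `dbt run-operation generate_source` invocation. The helper name `build_generate_source_command` and the example schema and table names are hypothetical; the sketch assumes the macro accepts `schema_name` alongside the proposed `table_names` argument.

```python
import json
import shlex


def build_generate_source_command(schema_name, table_names=None):
    """Return the argv list for a `dbt run-operation generate_source` call.

    Hypothetical helper: it only builds the command, it does not run dbt.
    """
    args = {"schema_name": schema_name}
    if table_names:
        # Only pass table_names when a selection was requested, so that the
        # macro falls back to its default (all tables) otherwise.
        args["table_names"] = list(table_names)
    # JSON is used for --args because it is also valid YAML.
    return ["dbt", "run-operation", "generate_source", "--args", json.dumps(args)]


# Example schema and table names are placeholders.
cmd = build_generate_source_command("raw_jaffle_shop", ["customers", "orders"])
print(shlex.join(cmd))
```

Leaving `table_names` out entirely when no selection is given keeps the call backward compatible with the existing behavior.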

Describe alternatives you've considered

No alternatives exist except for manually removing the unwanted lines from the generated definitions.

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.
No.

Who will this benefit?

What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
In cases where a schema contains hundreds of tables but only a few of them are of interest.

Are you interested in contributing this feature?

Yes.

@rahulj51 rahulj51 added the enhancement New feature or request label Nov 21, 2021
@pixie79

pixie79 commented Jan 27, 2022

Can we get this merged? - #51

It appears to be very useful, especially for automated pipelines. I could use it to add new sources as part of a loading pipeline. Hopefully dbt is happy with multiple files describing the source; otherwise, we would need to merge them somehow.

@rahulj51
Contributor Author

rahulj51 commented Mar 8, 2022

Hi, any chance of this making it into a release?

@dbeatty10
Contributor

Thank you @rahulj51 for implementing this feature 👏 #51 is merged and will be included in the next release.

@pixie79 thank you for affirming the value for your use-case and bringing up the topic of merging YAML files.

There are two main options for describing the source:

  1. Multiple files
  2. A single file

I think 1) is a legit option since dbt allows multiple YAML files with source info as long as the model names aren't duplicated.

However, 2) is probably the most common case, and as @pixie79 correctly pointed out, the generated definitions would then need to be merged somehow.

The merging solution I'm most attracted to is some kind of Python script that can "full outer join" two YAML files together (and/or support other types of join behavior). The thing that gives me pause:

  • The two YAML libraries I've used before (ruamel.yaml and PyYAML) each left something to be desired and would possibly cause such a script to exhibit buggy behavior for the first N iterations (where N is an uncomfortably large number).

Until the magical day that such a script exists, the only remaining option is to do the merge by hand.
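
For what it's worth, the "full outer join" idea can be sketched on already-parsed data, independent of which YAML library does the loading and dumping. The snippet below is only a sketch under stated assumptions: `merge_sources` is a hypothetical helper, the inputs are plain dicts shaped like a dbt sources file (`version`, `sources`, `tables`), and the "first file wins on conflicts" rule is an arbitrary choice, not something decided in this thread.

```python
from copy import deepcopy


def merge_sources(left, right):
    """Full-outer-join two parsed source documents on source name, then table name.

    Hypothetical helper; on any conflict the entry from `left` is kept.
    """
    merged = {}  # source name -> merged source dict (insertion order preserved)
    for doc in (left, right):
        for source in doc.get("sources", []):
            target = merged.setdefault(source["name"], {"name": source["name"], "tables": []})
            # Carry over source-level keys (e.g. database, schema); first file wins.
            for key, value in source.items():
                if key != "tables":
                    target.setdefault(key, deepcopy(value))
            known = {table["name"] for table in target["tables"]}
            for table in source.get("tables", []):
                if table["name"] not in known:
                    target["tables"].append(deepcopy(table))
                    known.add(table["name"])
    for source in merged.values():
        # Re-insert "tables" so it is the last key when the result is dumped.
        source["tables"] = source.pop("tables")
    return {"version": left.get("version", 2), "sources": list(merged.values())}


a = {"version": 2, "sources": [{"name": "raw", "tables": [{"name": "customers"}]}]}
b = {"version": 2, "sources": [{"name": "raw", "schema": "raw_data",
                                "tables": [{"name": "customers", "description": "dup"},
                                           {"name": "orders"}]}]}
merged = merge_sources(a, b)
print([t["name"] for t in merged["sources"][0]["tables"]])  # ['customers', 'orders']
```

This does not address the comment-preservation and formatting concerns with the YAML libraries themselves; it only shows the join logic, which would still need a loader and a dumper around it.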
