[CT-520] [Enhancement] Schema files with repeated top level keys (e.g. `models`) results in only the last key being parsed
#5114
Here's the relevant issue and a fix we could try which extends
Add the extended class to our
dbt-core/core/dbt/clients/yaml_helper.py, line 57 in 37b8b65
Like so:
except (yaml.scanner.ScannerError, yaml.YAMLError, ValueError) as e:
Then our
(Maybe a warning instead of an error, for starters? Then we could switch it to error-level in a future minor version.)
@jeremyyeo thanks for this amazing write-up and suggested solution! I agree with @jtcohen6 that a warning would be a good place to start here. It would need to be caught separately from the
Then define that new exception in
Additionally, it would be useful to add a test that checks we trigger a warning as expected. If/when we convert to an error, the test would need to be updated. All of our duplication tests live in 025_duplicate_model_tests, but we are currently in the process of converting all of our tests to a more intuitive and consistent framework. The new test would live in
If you or another community member want to work on this, I would be happy to help give some direction on writing the new test and answer any other questions there may be.
At this point we've reverted the original pull request because we didn't have time to figure out a solution for anchor overrides, which were causing error messages for duplicate keys, even though anchor overrides are legal syntax and should not be considered duplicate keys.
It's still possible that we could come up with a solution, but it requires more research. Some possibilities are: 1) fixing this in PyYAML and opening a pull request there; 2) supporting a different YAML library that does correctly handle duplicate keys (ruamel has the same issue with spurious errors for anchor overrides); 3) expanding the code that looked for duplicate keys so that it handles anchor overrides correctly.
One possibility is yaml.parse, which does provide information on anchors and such. A fix using this would quite possibly make more sense as a PyYAML PR though. See https://stackoverflow.com/questions/64460249/how-to-remove-duplicate-keys-in-yaml-file-automatically. Because yaml.parse returns information on anchors, the linked code could possibly be modified to skip duplicates for anchor overrides.
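One way option 3 might look, as a rough sketch rather than the reverted dbt-core change: it uses `yaml.compose` (not the `yaml.parse` event stream mentioned above) to inspect the node graph before PyYAML flattens `<<` merges, so an anchor override is not counted as a duplicate. The function name `find_duplicate_keys` and the sample YAML are hypothetical, and complex (non-scalar) keys are ignored.

```python
import yaml

MERGE_TAG = "tag:yaml.org,2002:merge"  # tag PyYAML's resolver assigns to the `<<` key


def find_duplicate_keys(text):
    """Return (key, line) pairs for scalar keys repeated within one mapping.

    Lines are 1-based. Keys brought in through a `<<` merge are not counted
    against the mapping that merges them, so anchor overrides are not reported.
    """
    duplicates = []
    root = yaml.compose(text, Loader=yaml.SafeLoader)  # node graph only, nothing constructed
    stack = [root] if root is not None else []
    visited = set()  # aliases point at shared node objects; visit each one once
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        if isinstance(node, yaml.MappingNode):
            seen = set()
            for key_node, value_node in node.value:
                stack.append(value_node)
                if not isinstance(key_node, yaml.ScalarNode):
                    continue  # complex keys are out of scope for this sketch
                if key_node.tag == MERGE_TAG:
                    continue  # the merge key itself is never a duplicate
                ident = (key_node.tag, key_node.value)
                if ident in seen:
                    duplicates.append((key_node.value, key_node.start_mark.line + 1))
                seen.add(ident)
        elif isinstance(node, yaml.SequenceNode):
            stack.extend(node.value)
    return sorted(duplicates, key=lambda d: d[1])


# Hypothetical inputs: a repeated top-level key, and a legal anchor override.
repeated = "models:\n  - name: my_model_a\nmodels:\n  - name: my_model_b\n"
override = "base: &base\n  materialized: view\nderived:\n  <<: *base\n  materialized: table\n"

print(find_duplicate_keys(repeated))  # [('models', 3)]
print(find_duplicate_keys(override))  # []
```

Because the function only returns what it finds, a caller could emit a warning first and switch to raising an error in a later release, as suggested earlier in the thread.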
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Hi @gshank, I just re-opened this issue as it's still unresolved, and it would be really helpful if we could implement your suggestion. Here is an additional example of how this could impact users: If you have a
With the above, you basically lose table
Expected Behavior:
Is there an existing issue for this?
Current Behavior
If you have a schema file with repeated top level keys:
Only the last `models` key will apply to the project when you execute dbt tasks that depend on them (like `test`, `docs generate`). We should raise an exception for such cases.
Expected Behavior
Perhaps we should raise an error in this case instead of silently just including the last `models` key of the schema file in our project's manifest.
Steps To Reproduce
1. `schema.yml` file to your project along with 2 simple models:
2. `dbt run` to build our models.
3. `dbt test` and observe that we only tested `my_model_b`.
4. `dbt docs generate && dbt docs serve` and observe that only `my_model_b` has a description (`my_model_a` has no description).
Relevant log output
Environment
What database are you using dbt with?
postgres
Additional Context
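The silent "last key wins" behavior can be reproduced with PyYAML alone, outside of dbt. The snippet below is a minimal sketch: the schema text is a hypothetical stand-in (the original `schema.yml` from this issue is not reproduced here), and only the model names `my_model_a` and `my_model_b` come from the steps above.

```python
import yaml

# Hypothetical schema file with the top-level `models` key repeated.
SCHEMA = """\
version: 2
models:
  - name: my_model_a
    description: first model
models:
  - name: my_model_b
    description: second model
"""

# safe_load raises no error or warning; the second `models` key replaces the first.
parsed = yaml.safe_load(SCHEMA)
model_names = [model["name"] for model in parsed["models"]]
print(model_names)  # ['my_model_b']
```

The first `models` block never reaches the parsed dictionary, which is consistent with `my_model_a` ending up with no tests and no description.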