Duplicated keys in parameters or catalog should raise an error or a warning #825
Comments
Hi @Galileo-Galilei, thanks for raising this issue. It's a bit of a funny one, and I agree that it would be a pain to debug. Currently YAML parsing is delegated to the Looking at the YAML spec, there has been some evolution on the topic between 1.1 and 1.2 in the semantics of whether keys 'should' or 'must' be unique. In PyCharm the syntax highlighter flags this as an issue, but it will still be marked as valid on this online YAML validator. We'd welcome a pull request improving the UX of this - we use
Thank you for the quick reply @datajoely. I quickly browsed the web and surprisingly it seems that That said, I don't have time right now to dive into Should we keep the issue open for reference, or do you prefer that I close it since it will likely not be fixed?
I think it's something that we will likely not fix on the Kedro core side, so I think it's best to close and perhaps raise an issue on the
I understand you will not support any specific trick in the core library to fix it soon, but it may be something to keep in mind when you work on the configuration refactoring described here, because it will affect some of the features described in this thread (like enabling a non-destructive merge of parameters to supercharge nested structures in the parameters.yml file), don't you think?
Description
When a configuration file (e.g. parameters.yml) has duplicated entries within the same environment (e.g. base/), Kedro keeps only the last entry it encounters without raising any warning.
Context
I was struggling to debug a big Kedro project written by someone else, with several big parameters files (the project had hundreds of parameters). Regardless of whether this fits into "best practices", I noticed the following (weird, and likely incorrect) behaviour of Kedro: some parameters had duplicated entries (mainly because there were so many of them and the file was poorly organized, so these duplicated entries were a mistake by the original developer), and only the last entry was loaded.
I had a hard time debugging this, because when I changed some parameters in the config file, nothing changed when running the pipeline (and there was no warning at all). My mistake was that I was changing the first key, but a duplicated key elsewhere in the file (or even in another parameters_second_file.yml) was overriding it.
Note that this will happen with all files loaded with the ConfigLoader, and I faced the same issue with credentials: the project had two entries with the same credentials name, Kedro loaded the second one, which was unfortunately incorrect, and I struggled a lot to figure out why I could not connect to the database, because Kedro kept complaining about wrong credentials while the file I was looking at had the right ones.
Steps to Reproduce
Enter "configloader dupkeys" and type enter 3 times.
In conf/base/parameters, create a duplicated entry:
Expected Result
When the ConfigLoader loads a file with duplicated keys in the same environment, it should either:
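As an illustration of the "raise an error" behaviour requested in the issue title, here is a minimal sketch of a strict YAML loader. It assumes PyYAML is the parser in play, which this thread does not confirm, and `UniqueKeyLoader` is a hypothetical name, not part of Kedro or PyYAML.

```python
import yaml
from yaml.constructor import ConstructorError

MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that rejects duplicated mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _ in node.value:
            # "<<" merge keys are expanded by PyYAML itself; skip them here.
            if key_node.tag == MERGE_TAG:
                continue
            # Assumes hashable (typically scalar) keys.
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise ConstructorError(
                    None, None,
                    f"duplicated key {key!r}",
                    key_node.start_mark,
                )
            seen.add(key)
        return super().construct_mapping(node, deep=deep)


# Illustrative content only, not the example from the original issue.
DUPLICATED = "learning_rate: 0.1\nlearning_rate: 0.01\n"

# Default behaviour: the last entry silently wins.
silent = yaml.safe_load(DUPLICATED)

# Strict behaviour: the duplicate is reported with its position in the file.
try:
    yaml.load(DUPLICATED, Loader=UniqueKeyLoader)
    strict_error = None
except ConstructorError as exc:
    strict_error = exc
```

Because the check lives in `construct_mapping`, it also applies to nested mappings. Emitting a warning instead of raising would only require replacing the `raise` with a call to `warnings.warn`.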
Actual Result
The first key is ignored without any warning. This is not harmful in the toy example above, but becomes a debugging nightmare in projects with a lot of config files and a lot of keys in these config files.
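For the multi-file case mentioned in the Context section, a small standalone check can list which top-level keys are defined in more than one file. This is a sketch only: it works on already-parsed dictionaries, does not use Kedro's API, and `find_keys_defined_in_several_files` is a hypothetical helper. Whether the ConfigLoader already performs a similar check is not established in this thread.

```python
from collections import defaultdict
from typing import Any, Dict, List


def find_keys_defined_in_several_files(
    configs: Dict[str, Dict[str, Any]]
) -> Dict[str, List[str]]:
    """Map each top-level key found in more than one file to those files."""
    owners = defaultdict(list)
    for filename, content in configs.items():
        for key in content:
            owners[key].append(filename)
    return {key: files for key, files in owners.items() if len(files) > 1}


# Illustrative, already-parsed file contents (values are made up).
parsed = {
    "parameters.yml": {"learning_rate": 0.01, "epochs": 10},
    "parameters_second_file.yml": {"learning_rate": 0.1},
}
duplicates = find_keys_defined_in_several_files(parsed)
```

Running a check like this before merging the files turns a silent override into an explicit list of conflicting keys and the files that define them.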
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
Kedro version used (pip show kedro or kedro -V): tried with 0.16.5 and 0.17.4
Python version used (python -V): 3.6.8