Psych does not complain about duplicate keys #79
Comments
Well, Psych maintains the behavior that Syck had for this. Until now, nobody has had a problem with the way it works. I don't mind changing it, but it will not be backwards compatible.
Got it. I don't suppose Psych already has a place for configuration options, does it? I don't see one. Would that be a desirable approach? Then, for example, a Rails app in development mode could be more strict/fragile about the YAML than a production deployment.
It's been about 2 years since this was brought up. Let's do it.
So according to this tweet[0], we should add an option to enforce this behaviour (maybe the option could be called 'strict'). In my opinion, with this option enabled we should raise a …
Psych has a number of interfaces that we would want this behaviour added to (cribbed from the docs[2]):
Right now none of these methods take an options hash. We can extend their signatures with an optional options hash. I think this is the right course of action; does anybody have any other suggestions? Also, consider that YAML generation already takes an options hash (in …).
[0] - https://twitter.com/tenderlove/status/468781805940641793
Basically, I'm proposing changing this:
Psych.parse(yaml, filename = nil)
to this:
Psych.parse(yaml, filename = nil, options = {})
(and likewise for all the other methods listed above).
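A minimal sketch of what the proposed signature could look like, assuming a hypothetical `StrictYAML` wrapper and a hypothetical `:strict` option (neither exists in Psych). Because the options hash is a trailing optional argument, existing calls such as `parse(yaml)` and `parse(yaml, "file.yml")` keep working. The duplicate check here is streaming: a `Psych::Handler` subclass sees parser events one at a time and keeps a set of keys for each open mapping. It assumes Psych 3.1 or later for the `filename:` keyword on `Psych.parse`.

```ruby
require "psych"
require "set"

# Hypothetical wrapper: neither StrictYAML nor its :strict option exists in Psych.
module StrictYAML
  class DuplicateKeyError < StandardError; end

  # Receives parser events one at a time and remembers, for every mapping
  # that is currently open, the scalar keys it has seen so far.
  class DuplicateKeyHandler < Psych::Handler
    Frame = Struct.new(:mapping, :keys, :expect_key)

    def initialize
      super
      @stack = []
    end

    def start_mapping(_anchor, _tag, _implicit, _style)
      advance_parent
      @stack.push(Frame.new(true, Set.new, true))
    end

    def end_mapping
      @stack.pop
    end

    def start_sequence(_anchor, _tag, _implicit, _style)
      advance_parent
      @stack.push(Frame.new(false, nil, false))
    end

    def end_sequence
      @stack.pop
    end

    def scalar(value, _anchor, _tag, _plain, _quoted, _style)
      frame = @stack.last
      # Set#add? returns nil when the key was already present.
      if frame && frame.mapping && frame.expect_key && !frame.keys.add?(value)
        raise DuplicateKeyError, "duplicate key: #{value.inspect}"
      end
      advance_parent
    end

    def alias(_anchor)
      advance_parent
    end

    private

    # Inside a mapping, child nodes alternate key, value, key, value, ...
    def advance_parent
      frame = @stack.last
      frame.expect_key = !frame.expect_key if frame && frame.mapping
    end
  end

  # Same leading arguments as Psych.parse plus a trailing options hash,
  # so existing calls keep working unchanged.
  def self.parse(yaml, filename = nil, options = {})
    if options[:strict]
      handler = DuplicateKeyHandler.new
      Psych::Parser.new(handler).parse(yaml, filename || "<unknown>")
    end
    Psych.parse(yaml, filename: filename)
  end
end
```

In strict mode the input is parsed twice, which keeps the sketch independent of Psych internals at some cost in speed. Only scalar keys are compared, by their raw text, so `1` and `"1"` count as the same key; complex keys are ignored.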
@Nitrodist 👍 I think it's good. For a major release, I think we should change the filename to be an option like …
We're having a hacknight tomorrow in Toronto. I'm hoping to work on a solution with @nwjsmith and have it ready by Thursday. 👐
@Nitrodist What was the outcome of your hacking night? 😄
@coin3d I was able to get it to 'work', but its efficiency left much to be desired. This is what I had:

```diff
diff --git lib/psych/handlers/document_stream.rb lib/psych/handlers/document_stream.rb
index e429993..bf04896 100644
--- lib/psych/handlers/document_stream.rb
+++ lib/psych/handlers/document_stream.rb
@@ -17,6 +17,18 @@ module Psych
         @last.implicit_end = implicit_end
         @block.call pop
       end
+
+      def end_mapping
+        mapping = pop
+        keys = []
+        mapping.children.each_slice(2) do |(key_scalar, _)|
+          next if key_scalar.is_a?(Psych::Nodes::Sequence) or key_scalar.is_a?(Psych::Nodes::Alias) or key_scalar.is_a?(Psych::Nodes::Mapping)
+          key = key_scalar.value
+          raise Psych::Exception, "Same key exists on this level" if keys.include? key
+          keys << key
+        end
+        mapping
+      end
     end
   end
 end
diff --git test/psych/test_hash.rb test/psych/test_hash.rb
index dac7f8d..b3ecbc0 100644
--- test/psych/test_hash.rb
+++ test/psych/test_hash.rb
@@ -21,6 +21,16 @@ module Psych
       assert_equal X, x.class
     end
 
+    def test_error_on_same_key
+      assert_raises(Psych::Exception) do
+        Psych.load <<-EOF
+-
+  same_key: 'value'
+  same_key: 'value'
+        EOF
+      end
+    end
+
     def test_self_referential
       @hash['self'] = @hash
       assert_cycle(@hash)
```
I would have liked a warning or error message like this... For now, I use https://github.com/adrienverge/yamllint to check for duplicate keys.
What is the state of this? I don't really mind or care about the default, but there should be a way to raise an error on duplicate keys. The consequences of a duplicate key are rather harsh, since the earlier value is simply overwritten entirely.
If you use a hash instead of a list then the performance shouldn't suffer nearly as much:
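A minimal sketch of that idea applied to the diff above: a hypothetical `DuplicateKeyTreeBuilder` (not part of Psych) overrides `end_mapping`, calls `super` so the stock `Psych::TreeBuilder` bookkeeping still runs, and records seen keys in a `Hash`. Each lookup is then O(1) on average, instead of the `Array#include?` scan that made the original check quadratic in the size of each mapping. `load_without_duplicates` is likewise a hypothetical helper; it relies on `Node#to_ruby`, which is not a safe load, so it should not be used on untrusted input.

```ruby
require "psych"

# Hypothetical TreeBuilder subclass (not part of Psych) that rejects
# mappings repeating a scalar key, tracking seen keys in a Hash.
class DuplicateKeyTreeBuilder < Psych::TreeBuilder
  def end_mapping
    mapping = super # TreeBuilder#end_mapping pops and returns the finished node
    seen = {}
    mapping.children.each_slice(2) do |key, _value|
      # As in the diff, only scalar keys are compared; complex keys are skipped.
      next unless key.is_a?(Psych::Nodes::Scalar)
      raise Psych::Exception, "duplicate key #{key.value.inspect}" if seen.key?(key.value)
      seen[key.value] = true
    end
    mapping
  end
end

# Hypothetical helper: build the tree with the checking builder, then
# convert the first document to Ruby objects (an unrestricted conversion).
def load_without_duplicates(yaml)
  builder = DuplicateKeyTreeBuilder.new
  Psych::Parser.new(builder).parse(yaml, "<unknown>")
  document = builder.root.children.first
  document ? document.to_ruby : nil
end
```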
Any takers for testing and a PR?
Is this likely to be implemented any time soon?
@tenderlove you mentioned backwards compatibility...
I just wasted 2 days investigating this very issue after a developer forgot to rename a job that had been copy/pasted. Any thoughts on just printing a message, as @wteuber suggests? It would save a lot of time, especially when multiple developers are working on automation.
I have helped avoid this issue programmatically by writing and introducing a normalized YAML format in the projects I am involved in. Maybe you'll find this helpful, too: https://github.com/Sage/yaml_normalizer
A Travis CI user has had a problem because of the current behavior: https://travis-ci.community/t/build-no-longer-installing-apt-packages-ignoring-config/4306
http://www.yamllint.com/ has the same problem: it doesn't report duplicate keys, and instead silently deletes all but the last one and reports the YAML as "valid".
Since duplicate keys will likely remain supported by default permanently, I think there should be a guarantee about the order in which they are processed. It seems to use a "first come, first served" model rather than an override model, i.e. the first key encountered is the one that's used, and subsequent duplicates are discarded rather than acting as overrides. Would it make sense to document this and add a test to ensure it remains stable? While I realise it's not strictly YAML-compliant, it's actually useful for merging YAML from multiple sources, so if it's in the library anyway, it may as well work deterministically.
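A minimal sketch of such a stability test, written with Minitest (an assumption; the class and test names are hypothetical). For a plain mapping without merge keys, Psych keeps the last value, which is what the issue description reports, so the sketch pins last-value-wins; if a given Psych version behaves differently, the test will say so.

```ruby
require "minitest/autorun"
require "psych"

# Hypothetical regression test (not part of Psych's suite) pinning which value
# survives when a plain mapping, without merge keys, repeats a key.
class TestDuplicateKeyOrder < Minitest::Test
  def test_later_duplicate_replaces_earlier_one
    assert_equal({ "key" => "second" }, Psych.load("key: first\nkey: second\n"))
  end

  def test_duplicates_are_resolved_per_mapping
    yaml = "outer:\n  key: first\n  key: second\nkey: top\n"
    assert_equal({ "outer" => { "key" => "second" }, "key" => "top" }, Psych.load(yaml))
  end
end
```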
That this is not an error is problematic, especially in the context of merge keys. GitLab (which uses Ruby for YAML) seems to allow this because of this bug: when multiple merged-in mappings had the same key (but different values), it was accepted, but I found it impossible to trace what the result means. I.e., is … the same as …?
FWIW, PyYAML overwrites a key's value with any value for the same key that occurs later in the mapping in the document text.
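Which value wins with merge keys can be checked directly. A minimal sketch with an invented CI-style config (the key names are hypothetical): per the YAML 1.1 merge-key spec, keys written explicitly in a mapping take precedence over merged ones, and in a merge sequence such as `[*a, *b]` earlier mappings take precedence over later ones. It assumes Psych 3.1 or later, where `Psych.safe_load` accepts `aliases: true` (aliases are rejected by default).

```ruby
require "psych"

# Invented config: `job` merges two anchored mappings that both define `image`.
yaml = <<~YAML
  base_a: &a
    image: ruby
    stage: test
  base_b: &b
    image: node
  job:
    <<: [*a, *b]
    stage: build
YAML

# Aliases must be enabled explicitly for safe_load.
config = Psych.safe_load(yaml, aliases: true)

# Inspect the merged job to see which `image` and `stage` survived.
puts config["job"].inspect
```

With Psych's merge handling, `job` ends up with `image` from `base_a` and its own explicit `stage: build`.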
I strongly recommend implementing this, because I ran into a problem caused by the current behavior.
Related issue: I encountered the following issue.
Reasons to implement
Like pretty much everyone who has arrived here, I have also been bitten by this default behavior. Working with a large Rails YAML translation file, I didn't realize I had declared a key that was already present further down in the same file, and I couldn't understand why it wasn't loading properly. It took me a long time debugging the ActiveModel and i18n source code directly until I finally realized my mistake 😕 I'd love to see this feature implemented, either as a warning or an error, and IMHO Rails should even default to raising an error for i18n translation files in the future.
Psych does not complain or raise an error when it encounters a duplicate key. Instead, it silently overwrites the previous value of that key. This is in contrast to the YAML spec, which requires the keys of a mapping to be unique.
Agree/disagree that this is a problem?