
Check for duplicate dictionary keys #72

Merged: 1 commit merged into PyCQA:master on Jul 24, 2016

Conversation

@geokala (Contributor) commented Jun 11, 2016

No description provided.

@geokala force-pushed the dict-duplicate-keys-check branch 9 times, most recently from 223f86e to 0c2a6e6 on June 11, 2016 at 18:56
return tuple(self.convert_to_value(i) for i in item.elts)
elif isinstance(item, ast.Num):
return item.n
elif isinstance(item, ast.Name):
Member

I'm pretty sure this fails with instance properties and other attributes of a name.
Could you add it to the test suite, and discard names if I am correct about them?

@geokala (Contributor, Author), Jun 12, 2016

Just to make sure I am understanding correctly:
e.g.

class Test(object):
    pass
a = Test()
b = Test()
a.something = 'yes'
{
  a.something: 1,
  a.something: 1,
}

If your assumption is correct, are you expecting this to fail by not detecting a duplicate, or to fail with a traceback?
I'll give it a test assuming you meant the above for the moment (a good test to have anyway), and if I misunderstood just let me know and I'll deal with it either later today or tomorrow evening.

Apologies if I'm misunderstanding something trivial; this is the first time I've done much with the AST.

Member

I expect it to find duplicate 'a' even if different attributes of a are used

Contributor (Author)

It didn't seem to find either, as it was an ast.Attribute object. Those are handled now, however.
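(For anyone following the AST details: a quick interpreter check, separate from the patch, showing why a key such as a.something is not an ast.Name.)

>>> import ast
>>> tree = ast.parse("{a.something: 1, a.something: 2}", mode="eval")
>>> for key in tree.body.keys:
...     print(type(key).__name__)
...
Attribute
Attribute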

Contributor (Author)

(Note, as this still appears in its own thread on the PR: these are no longer handled; see the remaining comments.)

Member

(Thanks. I wish there were a way to force a PR thread to be marked as done. I find GitHub very confusing for in-depth code review with multiple incremental revisions; the result seems to be that committers tend to merge early and do fixups afterwards, creating churn and often bugs.)

@geokala force-pushed the dict-duplicate-keys-check branch from 0c2a6e6 to c01aecd on June 12, 2016 at 17:31
@geokala (Contributor, Author) commented Jun 12, 2016

I just added some extra logic for dealing with instance attributes, then realised that I'm also missing logic for subscript lookups used as keys (e.g. {aaa['something']: 1, aaa['something']: 1}), as well as tests for the new API elements (e.g. the VariableKey class and the convert_to_value method).

I'll add these over the next couple of days.

@asmeurer (Contributor) commented:

Attribute and getitem lookups are probably venturing too far into false-positive territory. It's possible for a.b and a[b] to return different things on each invocation.

@geokala (Contributor, Author) commented Jun 13, 2016

True, but is that likely to be done deliberately in more than a tiny fraction of cases?

The only case I can see them returning different things on each invocation is where the attribute is implemented as a getter that returns something different each time (and similarly for __getitem__).
Am I being blind in considering this extremely rare (and probably a code smell)?
(Bearing in mind that this shouldn't catch calls to methods such as a.something() being duplicated.)
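
(For illustration, a minimal hypothetical example of the kind of non-pure getter being discussed, where the "same" key expression evaluates differently on each access; the names here are invented.)

import itertools

class Flaky(object):
    _ids = itertools.count()

    @property
    def something(self):
        # A non-pure getter: a different value on every attribute access.
        return next(self._ids)

a = Flaky()
print({a.something: 1, a.something: 1})  # {0: 1, 1: 1} -- two distinct keys, so not really a duplicate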

@jayvdb (Member) commented Jun 13, 2016

A good sanity check is running pyflakes, with your new error, over cpython, where it should only occur in tests, and probably not even there.
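
(A sketch of one way to run such a check, assuming a cpython checkout in ./cpython and using pyflakes' api module; the exact entry points used here are my reading of the library, not part of this PR.)

import sys
from pyflakes.api import checkRecursive
from pyflakes.reporter import Reporter

# Recursively check every .py file under the checkout and report warnings,
# including the new duplicate-key one, on stdout.
warning_count = checkRecursive(['cpython/Lib'], Reporter(sys.stdout, sys.stderr))
print('%d warnings' % warning_count)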

@asmeurer (Contributor) commented:

Non-pure @property attributes aren't as rare as you would hope.

@geokala (Contributor, Author) commented Jun 14, 2016

Okay, I'll remove that logic from it shortly then, and get the PR updated. On the bright side, that simplifies the code!

@geokala force-pushed the dict-duplicate-keys-check branch from c01aecd to f164297 on June 14, 2016 at 18:56
@geokala (Contributor, Author) commented Jun 14, 2016

Updated.

Based on the existing tests, not testing VariableKey and UnhandledKey would be consistent (though perhaps I should add at least a couple of tests for VariableKey anyway, or should it be trusted that test_other will catch them?).

There also aren't currently tests for convert_to_value. I'm not sure whether there should be, or whether, once again, it should be expected that test_other will catch them.

@jayvdb (Member) commented Jun 15, 2016

Your current approach to testing is fine; the utility funcs and classes are getting exercised.

Could you add a test that doesn't raise any errors, with a few valid scenarios?
Also, funcs, classes and lambdas can be keys.
I don't see a test covering a tuple as a key.
Also worth adding a test that 1 and 1.0 are not dup keys, which looks good in your patch but could easily be broken in the future.

Probably also nice to special-case ... as a key that can appear multiple times.

And is there any reason to not extend this test to cover duplicate items in sets?

@sigmavirus24 (Member):

And is there any reason to not extend this test to cover duplicate items in sets?

You mean something like:

s = set([
   'foo',
   'bar',
   'foo',
])

I think that's less warning-worthy than

d = {
    'foo': 1,
    'bar': 2,
    'foo': 3,
}

Given that sets have the expected, well-defined behaviour there, while duplicate keys in a dict silently overwrite earlier values, which is almost never what the author intended.

@geokala (Contributor, Author) commented Jun 15, 2016

Agreed. If I initialise a set with duplicate values I expect it to discard the duplicates; if I do it with a dictionary then I get unexpected effects (until I track the issue down by hand).

@geokala (Contributor, Author) commented Jun 15, 2016

@jayvdb: I'll probably address your other comments this evening; I should have an update in then.

@sigmavirus24 (Member):

Probably also nice to special-case ... as a key that can appear multiple times.

Wait, why? That doesn't work, as ... is a singleton:

>>> {...: 'foo'}
{Ellipsis: 'foo'}
>>> {...: 'foo', ...: 'bar'}
{Ellipsis: 'bar'}

@geokala (Contributor, Author) commented Jun 15, 2016

Out of interest, what is the utility of allowing ... to appear as a dictionary key multiple times?

I can't think of a situation in which it'd be useful off the top of my head.

Happy to special-case it, just a bit perplexed as to why anyone would do it in the first place.

@sigmavirus24 (Member):

Out of interest, what is the utility of allowing ... to appear as a dictionary key multiple times?

I agree. I'm not sure I see the value. I'm open to an explanation for why it should be special-cased though.

@geokala force-pushed the dict-duplicate-keys-check branch from f164297 to 87429d5 on June 15, 2016 at 20:01
@geokala (Contributor, Author) commented Jun 15, 2016

Could you add a test that doesn't raise any errors, with a few valid scenarios?

I've added a few tests that don't raise errors, and a test to see if we pick up duplicate keys in a literal dictionary inside a lambda.

Also, funcs, classes and lambdas can be keys.

Functions and classes are ast.Names.
Lambdas, unless assigned to a variable (in which case duplicates are already caught), don't 'count' as duplicates; see:
{ lambda x: None: 1, lambda x: None: 1, }
This actually gives a dictionary with two items.

I don't see a test covering a tuple as a key.

https://github.com/pyflakes/pyflakes/pull/72/files#diff-d18f55ece21d21fb172322f7db700d6aR31

Also worth adding a test that 1 and 1.0 are not dup keys, which looks good in your patch but could easily be broken in the future.

Those are actually duplicates (tested on Python 2 and Python 3 with {1.0: 1, 1: 1}) and they are picked up correctly.
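
(A quick interpreter check of both points above, outside the patch itself.)

>>> # Two syntactically identical lambdas are distinct objects,
>>> # so they really do give a two-item dict.
>>> len({(lambda x: None): 1, (lambda x: None): 1})
2
>>> # 1 and 1.0 compare and hash equal, so they collapse to one key;
>>> # the first key object is kept and the last value wins.
>>> {1.0: 1, 1: 2}
{1.0: 2}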

@geokala force-pushed the dict-duplicate-keys-check branch from 87429d5 to a93d3ce on June 15, 2016 at 20:06
@jayvdb (Member) commented Jun 16, 2016

Wrt ..., it is a legal expression in Python 3; its use as a placeholder for code is accepted.
https://mail.python.org/pipermail/python-3000/2008-January/011793.html
Using ... is clear intent by the author (to do the wrong thing), but it is valid syntax and Python doesn't attach any meaning to it, so IMO pyflakes shouldn't report an error if it is used as a placeholder twice in the same dict:

a = {
    1: 1,
    2: 2,
    3: 3,
    ...: ...,
    'a': 'a',
    'b': 'b',
    'c': 'c',
    ...: ...,
}

There is only a very small chance we're helping anyone by making that an error. If someone is using ... in a dict, they shouldn't be expecting Python syntax checkers to assist them. It can't be a typo as far as I can see, as there is no valid syntax which is lexically similar to .... They intended to use it, their intent indicates the code is not production code, and it is valid to use it, so why error?

If pyflakes is going to error when ... is used in that dict, we should error whenever it appears in the dict (not just the first time), or possibly anywhere in code except in slices. But that is assuming a meaning for ..., and shouldn't be done by pyflakes IMO (though it would be very useful, similar to flake8_print and other checks that verify code is ready for promotion).

This is in contrast to the other, similar constant NotImplemented, which is not a placeholder; it is reasonable for it to appear in a dict in real production code, and only one such dict entry would be valid.

@geokala (Contributor, Author) commented Jul 16, 2016

@jayvdb: Sorry, I must've misunderstood. Previous behaviour (up until the change I just pushed) was to raise only one complaint per set of duplicated keys with differing values (e.g. { 1: 1, 1: 2 } would raise one complaint). I've modified it now so it should behave as you expect.

@lamby: I've also put in the change you and @sigmavirus24 suggested above (well, you suggested, with addendum, etc., etc.). It could actually be changed to == 1 (it can't ever be zero for the comparison it's doing), but I'm not sure if that gives any real improvement.

differing_values = True
break

if differing_values:
Another one for inverting the logic and using continue?

Member

The whole differing_values section can be reduced to a single if [any|all](...): continue, which reduces the complexity of the method by ~16%.
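
(A generic sketch of the suggested shape, with invented names rather than the actual checker code, assuming a per-key list of values.)

def report_repeated_keys(key_values, report):
    # key_values maps each literal key to the list of values it was given.
    for key, values in key_values.items():
        # Rather than setting a differing_values flag in an inner loop and
        # breaking out, collapse the check into one expression and skip the
        # key early when every value is the same.
        if all(value == values[0] for value in values[1:]):
            continue
        report(key)

report_repeated_keys({'a': [1, 2], 'b': [3, 3]}, print)  # prints: a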

@lamby commented Jul 16, 2016

but I'm not sure if that gives any real improvement.

Really? Reducing the indentation makes it far, far easier to follow, IMHO.

@jayvdb (Member) commented Jul 16, 2016

@geokala, thanks for including reporting of all the lines with duplicates, and adding the correct line numbers.

Could you please use %r for the literals? It will add quotes around the strings, which is quite necessary if the string contains a space, or worse, newline characters.

I'd also really like the reported literal to be the same as it appears in the source, rather than the key 0x0020 appearing in the report output as 32. If I see a literal in an error report, I expect to be able to grep the source file for that literal. The perfect UI IMO would emit ... "0x0020" (32) ..., as then it is grepable, and two different ways of writing the same number are shown to be the same number.
To do this, it looks like we need to use dict_node.keys[i].col_offset and parse for the end of the key (and give up if we somehow reach the corresponding dict_node.values[i].col_offset?). Or is there another way I'm not seeing? Maybe this should be a followup patch.
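
(For illustration, the difference %r makes to a report line; the message text here is invented, not the patch's actual wording.)

>>> key = 'two words\nsecond line'
>>> print('duplicate key %s in dict' % key)
duplicate key two words
second line in dict
>>> print('duplicate key %r in dict' % key)
duplicate key 'two words\nsecond line' in dict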

re complexity/indents/etc:

$ python -m mccabe pyflakes/checker.py | grep DICT
("925:1: 'Checker.DICT'", 12)

That makes it the fourth most complex method, and the second most complex node handler method, which is not representative of the actual complexity of the task. I've noted above one way to reduce that to 10, and I suspect one or two more generators/comprehensions could be added so that the complex parts can be loaded into the brain as a single chunk.

@geokala force-pushed the dict-duplicate-keys-check branch from 29263cb to 401b012 on July 18, 2016 at 20:16
@geokala (Contributor, Author) commented Jul 18, 2016

@lamby: Apologies, I was unclear; I meant that changing it to == 1 rather than <= 1, while serving the same purpose here (and functioning the same way currently), probably doesn't gain us anything (and may open up future bug possibilities if the logic is slightly changed). The original change, yes, no complaints. -smiles-

@jayvdb: Sorry, I must've missed that one; I recall intending to do it. Either way, it should be good now.
I think the reported-literal issue for hex (and similar) is going to be worthy of a followup patch, yes. As you suggest, it is somewhat non-trivial (unless someone knows a way).
I'll do another pass now to try to get the complexity below 10.

@geokala force-pushed the dict-duplicate-keys-check branch from 401b012 to 31c1bcf on July 18, 2016 at 20:27
@geokala (Contributor, Author) commented Jul 18, 2016

@jayvdb: Change made per your suggestion of any(x) for differing_values.
Complexity is now 10.

I'm not sure it can reasonably be reduced further (in a way that doesn't sacrifice readability), but that may be down to my skill level and/or perceptions.

@jayvdb (Member) commented Jul 19, 2016

Take a look at jayvdb@3b29b50 for some ideas on how to cut complexity to 5.

@sigmavirus24 (Member):

@jayvdb I don't know of a specific guideline for what the complexity of the code should be. If there is one, we should add it to Travis CI and document it for future contributors. At this point, playing complexity golf with @geokala seems rude. If you have the complexity that low in your own fork, feel free to submit it as a pull request after this one. I think @geokala has been more than patient with us here.

@sigmavirus24 (Member):

Since we're just golfing over complexity, I'm going to run this over some more code-bases today and if all seems good, I'm going to merge this.

If there are reviews other than complexity golfing, please do leave them. I think this pull request has lived for quite long enough. Other style nits should be submitted by people after merge.

@geokala (Contributor, Author) commented Jul 19, 2016

@sigmavirus24 Sounds fair. I think @jayvdb's changes were quite elegant (I should've guessed collections would have something to simplify what I was trying to do there), but if you're happy to run this over some code-bases and settle the complexity question later, that's cool.

Also, yes, I do think deciding on a complexity guideline and testing for it (though maybe with an easy and obvious way to add exceptions) might be a good idea. I'm fairly sure flake8 supports this natively (I think I actually turned it on at the start of a project in a previous workplace).

Either way, if there are any other issues, I'm happy to address them!

@jayvdb (Member) commented Jul 19, 2016

My review comments are not just style nits. They are also performance improvements.

@sigmavirus24 (Member):

@jayvdb there is one review comment here unrelated to complexity that @geokala has kindly addressed. Unless there are more comments that are not being displayed by GitHub, all your existing comments relate to complexity.

@sigmavirus24 (Member):

Also @geokala if you want to adopt @jayvdb's changes, go ahead. IMO, you don't have to keep playing complexity golf though. You were asked to make it no higher than 10 and did so.

@geokala (Contributor, Author) commented Jul 19, 2016

I'm happy to adopt them, though it'll probably be late tonight before I get to them.

@jayvdb (Member) commented Jul 19, 2016

Ian, I think you have misread my earlier comment about '10'. If you look briefly at my diff you will see it contains performance improvements, implementing the additional generators/comprehensions I indicated earlier would help get the complexity below 10, down to a reasonable level. The current version reimplements a Counter twice, using list scans. It is best practice when doing code reviews not to worry about that kind of review comment until the patch is functionally OK. I put a diff up because that was easier than trying to explain how to fix it, which would have been a longer and more aggravating process for everyone involved, as it involved rearranging the code significantly. But the code isn't mine, and I wouldn't want to submit a rewrite after this and have git blame attribute it to me. If geokala can put a little more polish on their patch, the merged code should be stable until we work on the outstanding problem of printing the same representation as was used in the source.

@jayvdb (Member) commented Jul 20, 2016

While looking at pyflakes output, it occurred to me that this feature is basically identical to the existing RedefinedWhileUnused, which emits "filename.py:875: redefinition of unused 'xxx' from line 863", except that it is a prior dict key's value that is unused rather than a variable's value.

If there are any complaints after this is released that this is another case of pyflakes violating its contract with users and producing an error on valid code, RedefinedWhileUnused is a simple way to demonstrate that there is precedent for it.

Also, if we wanted to emit only one error line for a pair of repeated keys, using the same suffix as RedefinedWhileUnused, ... from line x, referring to the prior and different use of the key, would be the way to do it. Note: I'm not asking for, or advocating for, it to be modified; I'm just saying that would be reasonable IMO.
The current approach of emitting two messages for a pair of repeated keys is a bit verbose, but it is my preference as it works better with flake8 --show-source, since both lines with the different values are shown.

@geokala force-pushed the dict-duplicate-keys-check branch from 31c1bcf to aa39997 on July 20, 2016 at 19:38
@geokala (Contributor, Author) commented Jul 20, 2016

Hmm... collections has no Counter in 2.6... @jayvdb, do we have a good alternative other than using a backport?

@jayvdb (Member) commented Jul 21, 2016

Sure. We only need a dict with incrementing values, as we don't need any of the fancy Counter features.
Best to build it as a fallback for when the real Counter doesn't exist, but it could also be a normal (default)dict whose values are incremented in a for loop.
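
(A minimal sketch of the fallback idea, assuming only plain counting is needed; this is not the code that was merged.)

try:
    from collections import Counter
except ImportError:
    # collections.Counter only arrived in Python 2.7, so fall back to a
    # plain dict that simply increments a count per key.
    def Counter(iterable):
        counts = {}
        for item in iterable:
            counts[item] = counts.get(item, 0) + 1
        return counts

print(Counter(['a', 'b', 'a']))  # {'a': 2, 'b': 1} (Counter({'a': 2, 'b': 1}) with the real class)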

@geokala force-pushed the dict-duplicate-keys-check branch from aa39997 to fcf4c6b on July 24, 2016 at 14:08
@geokala (Contributor, Author) commented Jul 24, 2016

@jayvdb: Okay, I've put in a counter method and just used it, rather than adding extra complexity to decide whether to use it.

@sigmavirus24: copying you in on this in case you want to give it another look and/or you're both happy to merge?

@jayvdb merged commit 152ca18 into PyCQA:master on Jul 24, 2016
@jayvdb (Member) commented Jul 24, 2016

Nice work. Thank you so much for sticking with it.

@geokala (Contributor, Author) commented Jul 25, 2016

No worries; thanks to all of you for your comments!
