Check for duplicate dictionary keys #72
Branch updated from 223f86e to 0c2a6e6.
```python
    return tuple(self.convert_to_value(i) for i in item.elts)
elif isinstance(item, ast.Num):
    return item.n
elif isinstance(item, ast.Name):
```
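For readers new to the AST, here is a minimal sketch of what a helper like convert_to_value does for literal keys. It assumes Python 3.8+, where number and string literals parse as ast.Constant rather than the older ast.Num used in the hunk above. It also deliberately treats every non-literal key, names included, as unique. It is an illustration, not the PR's implementation.

```python
import ast


def convert_to_value(node):
    """Hypothetical sketch: map a literal dict-key node to a plain value.

    Assumes Python 3.8+, where literals parse as ast.Constant (the hunk
    under review targets older Pythons and checks ast.Num instead).
    """
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Tuple):
        # Tuple keys compare element-wise, so convert each element.
        return tuple(convert_to_value(elt) for elt in node.elts)
    # Names, calls, attributes, ... get a fresh object, so in this sketch
    # they never compare equal to any other key.
    return object()


tree = ast.parse("{(1, 'a'): 1, 'b': 2, f(): 3}", mode="eval")
keys = [convert_to_value(k) for k in tree.body.keys]
```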
I'm pretty sure this fails with instance properties and other attributes of a name.
Could you add it to the test suite, and discard names if I am correct about them?
Just to make sure I am understanding correctly:
e.g.
```python
class Test(object):
    pass

a = Test()
b = Test()
a.something = 'yes'
{
    a.something: 1,
    a.something: 1,
}
```
If your assumption is correct, are you expecting this to fail (not detect a duplicate) or fail (traceback)?
I'll give it a test assuming that you meant the above for the moment (good test to have anyway), and if I misunderstood just let me know and I'll deal with it either later today, or tomorrow evening.
Apologies if I'm misunderstanding something trivial; this is the first time I've done much with the AST.
I expect it to find a duplicate 'a' even if different attributes of a are used.
It didn't seem to find either, as it was an ast.Attribute object. Those are handled now, however.
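A quick way to confirm the node type is to parse the example from this thread. This sketch assumes Python 3 and only inspects the tree; it does not run the checker.

```python
import ast

source = """
{
    a.something: 1,
    a.something: 1,
}
"""

# The bare literal parses to an expression statement wrapping an ast.Dict.
dict_node = ast.parse(source).body[0].value
key_types = [type(k).__name__ for k in dict_node.keys]
# Each key is an ast.Attribute whose .value is the ast.Name for `a`.
base_names = [k.value.id for k in dict_node.keys]
attrs = [k.attr for k in dict_node.keys]
print(key_types, base_names, attrs)
```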
(note as this still appears in its own thread on the PR: These are no longer handled- see remaining comments)
(Thx. I wish there was a way to force a PR thread to be marked as done. I find GitHub very confusing for in-depth code reviewing with multiple incremental revisions being made. The result seems to be that committers tend to merge early and do fixups after, creating churn and often bugs.)
Branch updated from 0c2a6e6 to c01aecd.
I just added some extra logic for dealing with instance attributes, then realised that I'm also missing logic for dictionary keys (e.g. {aaa['something']: 1, aaa['something']: 1}), and missing tests for the new API elements (e.g. the VariableKey class and the convert_to_values method). I'll add these over the next couple of days.
Attribute and getitem lookups are probably venturing too far into false-positive territory. It's possible for a.b and a[b] to return different things on each invocation.
True, but is that likely to be done deliberately in more than a tiny fraction of cases? The only case I can see where they return different things on each invocation is where the attribute is specified as a getter and is set to return something different each time (and similarly for
A good sanity check is running pyflakes, with your new error, over CPython, where it should only occur in tests, and probably not even there.
Non-pure @property attributes aren't as rare as you would hope.
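As an illustration of the concern, here is a class whose property returns a new value on each access. Ticker is a made-up name for this example. Two textually identical attribute keys end up as two distinct dictionary entries, so flagging them would be a false positive.

```python
import itertools


class Ticker:
    """Hypothetical class whose property is not pure."""

    def __init__(self):
        self._counter = itertools.count()

    @property
    def value(self):
        # Returns the next integer on every access.
        return next(self._counter)


t = Ticker()
d = {
    t.value: 'first',   # key evaluates to 0
    t.value: 'second',  # key evaluates to 1
}
print(d)  # {0: 'first', 1: 'second'}
```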
Okay, I'll remove that logic from it shortly then, and get the PR updated. On the bright side, that simplifies the code!
Branch updated from c01aecd to f164297.
Updated. Based on the existing tests, not testing VariableKey and UnhandledKey would be consistent (though perhaps I should add at least a couple of tests for VariableKey anyway, or should it be trusted that test_other will catch them?). There also aren't currently tests for convert_to_value. I'm not sure whether there should be tests for it, or whether once again test_other should be expected to catch problems there.
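VariableKey and UnhandledKey are the names used in this PR, but the bodies below are only a sketch of one plausible design, not the merged code. Names compare by identifier. Anything unrecognised falls back to identity equality, so it never matches another key. The make_key helper is hypothetical.

```python
import ast


class VariableKey:
    """Sketch of a key standing in for a bare variable name.

    Two keys compare equal when they refer to the same name, so
    {x: 1, x: 2} can be flagged without knowing x's runtime value.
    """

    def __init__(self, item):
        self.name = item.id

    def __eq__(self, other):
        return isinstance(other, VariableKey) and other.name == self.name

    def __hash__(self):
        return hash(self.name)


class UnhandledKey:
    """Sketch of a key for expressions the checker does not understand.

    Uses default identity equality, so it never matches another key.
    """


def make_key(node):
    # Hypothetical dispatcher: names get a VariableKey, everything else is unhandled.
    if isinstance(node, ast.Name):
        return VariableKey(node)
    return UnhandledKey()


tree = ast.parse("{x: 1, x: 2, f(): 3, f(): 4}", mode="eval")
keys = [make_key(k) for k in tree.body.keys]
```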
Your current approach for testing is fine; the utility funcs and classes are getting exercised. Could you add a test that doesn't raise any errors, with a few valid scenarios? Probably also nice to special case And is there any reason not to extend this test to cover duplicate items in sets?
You mean something like:
```python
s = set([
    'foo',
    'bar',
    'foo',
])
```
I think that's less warning-worthy than:
```python
d = {
    'foo': 1,
    'bar': 2,
    'foo': 3,
}
```
Given sets have defined behaviour and there is undefined behaviour in the dictionary case.
Agreed: if I initialise a set with duplicate values I expect it to discard the duplicates. If I do it with a dictionary then I get unexpected effects (until I track down the issue by hand).
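The difference is easy to see at runtime. Strictly speaking, the Python language reference does define the dict case: when a key is repeated in a dict display, the last value given wins. The hazard is therefore silent data loss rather than an error.

```python
# Duplicates passed to set() are simply collapsed...
s = set([
    'foo',
    'bar',
    'foo',
])

# ...whereas in a dict literal the later value silently replaces the earlier one.
d = {
    'foo': 1,
    'bar': 2,
    'foo': 3,
}

print(sorted(s))  # ['bar', 'foo']
print(d)          # {'foo': 3, 'bar': 2}
```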
@jayvdb: I'll probably address your other comments this evening and should have an update in then.
Wait, why? That doesn't work, as:
```python
>>> {...: 'foo'}
{Ellipsis: 'foo'}
>>> {...: 'foo', ...: 'bar'}
{Ellipsis: 'bar'}
```
Out of interest, what is the utility of allowing ... to appear as a dictionary key multiple times? I can't think of a situation in which it'd be useful off the top of my head. Happy to special-case it, just a bit perplexed as to why anyone would do it in the first place.
I agree. I'm not sure I see the value. I'm open to an explanation for why it should be special-cased though.
Branch updated from f164297 to 87429d5.
I've added a few tests that don't raise errors, and a test to see if we pick up duplicate keys in a literal dictionary inside a lambda.
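One way to make sure dict literals nested inside lambdas, or inside any other expression, are covered is to walk the whole tree. The function below is a hypothetical standalone sketch using ast.walk, limited to ast.Constant keys (Python 3.8+). pyflakes itself dispatches to per-node handler methods rather than walking the tree like this.

```python
import ast


def find_duplicate_keys(source):
    """Hypothetical sketch: return (line, key) for each repeated constant key.

    ast.walk visits every node, so dict literals nested inside lambdas,
    functions or other expressions are checked too. Only the second and
    later occurrences of a key are reported here.
    """
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for `**mapping` entries; non-constants are skipped.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                problems.append((key.lineno, key.value))
            seen.add(key.value)
    return problems


print(find_duplicate_keys("f = lambda: {'a': 1, 'a': 2}"))  # [(1, 'a')]
```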
Functions and classes are ast.Names.
Those are actually duplicates (tested in Python 2 and Python 3 with {1.0: 1, 1: 1}) and are picked up correctly.
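The reason is that 1 and 1.0 are equal and hash identically, so Python treats them as the same key. A checker that compares converted keys with ordinary equality and hashing gets this case right without any special handling. Note that the dict keeps the first key object and the last value.

```python
# 1 and 1.0 compare equal and hash identically, so they are the same dict key.
assert 1 == 1.0 and hash(1) == hash(1.0)

d = {1.0: 1, 1: 1}
print(d)  # {1.0: 1}
```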
Branch updated from 87429d5 to a93d3ce.
wrt to
There is only a very small chance we're helping anyone with that being an error. If someone is using If pyflakes is going to error when This is in contrast to the other similar constant
@jayvdb: Sorry, I must've misunderstood. Previous behaviour (up until the change I just pushed) was to raise only one complaint per set of duplicated keys with differing values (e.g. {1: 1, 1: 2} would raise one complaint). I've modified it now so it should behave as you expect. @lamby: I've also put in the change you and @sigmavirus24 suggested above (well, you suggested, with addendum, etc, etc). It could actually be changed to == 1 (it can't ever be zero for the comparison it's doing), but I'm not sure if that gives any real improvement.
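Here is a hypothetical sketch of the behaviour described in that comment. Every occurrence of a duplicated key gets its own message with its own line number, and keys whose values are all identical are skipped. The message text, the function name and the restriction to constant keys and values are assumptions for illustration. The real pyflakes messages may differ.

```python
import ast
from collections import defaultdict


def report_duplicates(source):
    """Hypothetical sketch: one message per occurrence of a duplicated key.

    {1: 1, 1: 1} is redundant but harmless, so it is not reported. Only
    entries whose key and value are both constants are considered.
    """
    dict_node = ast.parse(source, mode="eval").body
    occurrences = defaultdict(list)  # key value -> [(line, value), ...]
    for key, value in zip(dict_node.keys, dict_node.values):
        if isinstance(key, ast.Constant) and isinstance(value, ast.Constant):
            occurrences[key.value].append((key.lineno, value.value))

    messages = []
    for key, entries in occurrences.items():
        distinct_values = {v for _, v in entries}
        if len(entries) < 2 or len(distinct_values) < 2:
            continue  # not duplicated, or duplicated with identical values
        for line, _ in entries:
            messages.append(
                f"line {line}: dictionary key {key!r} repeated with different values"
            )
    return messages


messages = report_duplicates("{\n    1: 1,\n    1: 2,\n    2: 3,\n}")
```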
```python
        differing_values = True
        break

if differing_values:
```
Another one for inverting the logic and using continue?
The whole differing_values section can be reduced to a single if [any|all](): continue, which reduces the complexity of the method by ~16%.
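To illustrate the suggestion, the sketch below contrasts a flag-and-break check with an any() expression. The flag-and-break version is a reconstruction in the spirit of the hunk above, not the PR's actual code. The any() version is then used with an early continue, so the reporting code stays one indentation level shallower. The mapping passed to keys_to_report is a hypothetical key-to-values structure.

```python
def values_differ_nested(values):
    # Flag-and-break style: compare every pair, stop at the first difference.
    differing_values = False
    for i, first in enumerate(values):
        for second in values[i + 1:]:
            if first != second:
                differing_values = True
                break
        if differing_values:
            break
    return differing_values


def values_differ_flat(values):
    # Same test as one expression: does any value differ from the first one?
    return any(v != values[0] for v in values[1:])


def keys_to_report(groups):
    """groups: hypothetical mapping of key -> list of values seen for it."""
    reported = []
    for key, values in groups.items():
        if not values_differ_flat(values):
            continue  # nothing to report, so no nested block is needed
        reported.append(key)
    return reported
```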
Really? Reducing indentation makes it far, far easier to follow IMHO.
@geokala, thanks for including reporting of all the lines with duplicates, and for adding the correct line numbers. Could you please use I'd also really like the reported literal to be the same as appears in the source, rather than key re complexity/indents/etc:
That makes it the fourth most complex method, and the second most complex node handler method, which is not representative of the actual complexity of the task. I've noted above one way to reduce that to 10, and I suspect one or two more generators/comprehensions could be added to make the complex parts easy to load into the brain as a single chunk.
Branch updated from 29263cb to 401b012.
@lamby: Apologies, I was unclear: I meant that changing it to == 1, rather than <= 1, while filling the same function here (and functioning the same way currently), probably doesn't gain us anything (and may open up future bug possibilities if the logic is slightly changed). The original change, yes, no complaints. -smiles- @jayvdb: Sorry, I must've missed that one; I recall intending to do it. Either way, it should be good now.
Branch updated from 401b012 to 31c1bcf.
@jayvdb: Change made per your suggestion of any(x) for differing_values. I'm not sure if it can be reasonably reduced in complexity now (in a way that doesn't sacrifice readability), but that may be down to my skill level and/or perceptions.
Take a look at jayvdb@3b29b50 for some ideas on how to cut complexity to 5.
@jayvdb I don't know of a specific guideline for what the complexity of the code should be. If there is one, we should add it to Travis CI and document it for future contributors. At this point, playing complexity golf with @geokala seems rude. If you have the complexity that low in your own fork, feel free to submit it as a pull request after this one. I think @geokala has been more than patient with us here.
Since we're just golfing over complexity, I'm going to run this over some more code-bases today and if all seems good, I'm going to merge this. If there are reviews other than complexity golfing, please do leave them. I think this pull request has lived for quite long enough. Other style nits should be submitted by people after merge.
@sigmavirus24 Sounds fair. I think @jayvdb's changes were quite elegant (I should've guessed collections would have something to simplify what I was trying to do there), but if you're happy running over this and then determining complexity intent later, that's cool. Also, yes, I do think determining a complexity guideline and testing for it (though maybe with an easy and obvious way to add exceptions) might be a good idea. I'm fairly sure flake8 supports this natively (as I think I actually turned it on at the start of a project in a previous workplace). Either way, if there are any other issues, I'm happy to address them!
My review comments are not just style nits. They are also performance improvements.
I'm happy to adopt them, though it'll probably be late tonight before I get to them.
Ian, I think you have misread my comment earlier about '10'. If you look briefly at my diff, you will see it is performance improvements implementing the additional generators/comprehensions I had indicated earlier would help get the complexity below 10, down to a reasonable level. The current version reimplements a Counter twice, using list scans. It is best practice when doing code reviews to not worry about those kinds of review comments until the patch is functionally OK. I put a diff up because that was easier than trying to explain how to fix it, which would have been a longer and more aggravating process for everyone involved, as it involved rearranging the code significantly. But the code isn't mine, and I wouldn't want to submit a rewrite after this and have the git blame attribute me. If geokala can put a little more polish on their patch, the merged code should be stable until we work on the outstanding problem of printing the same representation as was used in the source.
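Here is a sketch of the Counter-based idea, assuming keys have already been converted to hashable values. A single Counter pass replaces repeated list scans such as calling keys.count(k) for every key, which turns quadratic work into linear work. The function name duplicated is hypothetical.

```python
from collections import Counter


def duplicated(keys):
    """Return the keys that occur more than once, in first-seen order.

    One Counter pass is O(n); calling keys.count(k) for every k
    (the list-scan approach) is O(n**2).
    """
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


print(duplicated(['a', 'b', 'a', 'c', 'b', 'a']))  # ['a', 'b']
```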
While looking at pyflakes output, it occurred to me that this feature is basically identical to the existing If there are any complaints after this is released that this is another case of Also, if we wanted to emit only one error line for a pair of repeated keys, using the same suffix of
Branch updated from 31c1bcf to aa39997.
Hmm... collections has no Counter in 2.6... @jayvdb, do we have a good alternative other than using a backport?
Sure. We only need a dict with the value incrementing, as we don't need any of the fancy Counter features.
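Here is a minimal stand-in along those lines, assuming only counting is needed. The name counter is a guess for illustration, and the body is a sketch rather than the PR's code. A plain dict with dict.get avoids both the backport and Counter's extra features.

```python
def counter(items):
    """Hypothetical 2.6-friendly stand-in for collections.Counter.

    Only counts occurrences; none of Counter's extra methods are provided.
    """
    results = {}
    for item in items:
        results[item] = results.get(item, 0) + 1
    return results


keys = ['x', 'y', 'x']
repeated = [key for key, n in counter(keys).items() if n > 1]
```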
Branch updated from aa39997 to fcf4c6b.
@jayvdb: Okay, I've put in a counter method and just used it, rather than putting extra complexity in to determine whether to use it. @sigmavirus24 (copying you on this if you want to give it another look and/or if you're both happy to merge?)
Nice work. Thank you so much for sticking with it.
No worries, thanks to all of you for your comments!