Fix issue #5868: TypeError in move_wheel_files(). #5883
# can be strings in some rows and integers in others.
def sorted_outrows(outrows):
    """Return the given "outrows" in sorted order."""
    return sorted(outrows, key=lambda row: tuple(str(x) for x in row))
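For context, here is a minimal sketch of the failure this helper avoids, assuming Python 3 and rows of the form (path, hash, size) where the size is an integer in some rows and a string in others. The sample rows are invented for illustration and are not taken from the issue.

```python
def sorted_outrows(outrows):
    """Return the given "outrows" in sorted order."""
    return sorted(outrows, key=lambda row: tuple(str(x) for x in row))


# Hypothetical rows: two entries share a path and hash, but one has an
# integer size and the other has the empty string.
rows = [
    ("b.py", "sha256=bbb", 20),
    ("a.py", "sha256=aaa", 10),
    ("a.py", "sha256=aaa", ""),
]

# A plain sort has to compare 10 with "" to order the two "a.py" rows,
# which Python 3 refuses to do.
try:
    sorted(rows)
    plain_sort_failed = False
except TypeError:
    plain_sort_failed = True

# The key-based sort compares string forms only; the rows are returned unchanged.
result = sorted_outrows(rows)
```

Only the sort key is stringified here, so the integer sizes in the returned rows stay integers.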
Would it be possible to coerce everything to string when outrows is appended to, instead of needing to deal with mixed types?
There was an interesting discussion at the original issue after I wrote this PR:
#5868
So I think I actually want to "withdraw" this now. :) Or at least rethink it first as I think some decisions need to be made. It might be better to discuss at that issue.
I may close this or mark as "WIP" in the meantime.
@uranusjr In thinking more about this, I'm starting to think that what I originally proposed is okay. There are two reasons: (1) Coercing everything to a string on append seems more brittle because you need to add that logic each place you are appending, which can be multiple spots (or remember to use a common helper function when appending). (2) Coercing everything to a string seems to violate the spirit of PEP 376. That PEP says the third element should be a size (i.e. integer). Thus I think it would be better / safer to leave the rows themselves alone, and confine the coercion to the sort operation (which is just a cosmetic thing anyways).
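To make the trade-off in point (2) concrete, here is a rough sketch comparing the two approaches. The helper `append_coerced` is hypothetical and only stands in for "coerce on append"; it is not code from pip.

```python
def append_coerced(outrows, row):
    """Hypothetical helper: stringify every field before storing the row."""
    outrows.append(tuple(str(x) for x in row))


# Approach A: coerce on append. The stored size becomes a string.
coerced = []
append_coerced(coerced, ("a.py", "sha256=aaa", 10))

# Approach B: leave rows alone and coerce only inside the sort key.
untouched = [("a.py", "sha256=aaa", 10)]
sorted_rows = sorted(untouched, key=lambda row: tuple(str(x) for x in row))

size_after_append_coercion = coerced[0][2]
size_after_sort_coercion = sorted_rows[0][2]
```

With approach B the size remains an integer in the stored row, which is the property the comment above argues for.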
I think sorting on the file path should be enough? We should not be getting two lines for the same file.
(And add a warning/error if we end up with duplicate lines)
(Sorry for the multiple/numerous duplicated comments ^^)
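A duplicate check along the lines suggested here could look roughly like the following. This is only a sketch under the assumption that the first element of each row is the path; the function name `duplicate_paths` is made up, and how to report the result (warning or error) is left open.

```python
from collections import Counter


def duplicate_paths(outrows):
    """Return the paths that appear in more than one row, in sorted order."""
    counts = Counter(row[0] for row in outrows)
    return sorted(path for path, n in counts.items() if n > 1)


# Hypothetical rows with one repeated path.
rows = [
    ("a.py", "sha256=aaa", 10),
    ("b.py", "sha256=bbb", 20),
    ("a.py", "sha256=zzz", 99),
]
dupes = duplicate_paths(rows)
```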
Thanks, @xavfernandez. I definitely support at least adding a warning, but I think that should be done as part of a separate issue and PR so as not to expand the scope. I meant for this PR only to prevent the sort operation from crashing.
Re: sorting by only the first element, it's true that using all elements might almost never matter, but is there any harm? Being able to guarantee determinism even in unlikely edge cases or error cases seems like a good thing. If we add validation later to ensure the file names will be unique, we can always adjust the sort operation then.
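The determinism point can be illustrated with a small sketch. Because Python's sort is stable, sorting on the path alone leaves duplicate-path rows in whatever order they arrived, while sorting on all elements does not depend on input order. The function names and rows below are made up for the example.

```python
def sort_by_path(outrows):
    """Sort on the first element (the path) only."""
    return sorted(outrows, key=lambda row: row[0])


def sort_by_all(outrows):
    """Sort on the string form of every element, as in this PR."""
    return sorted(outrows, key=lambda row: tuple(str(x) for x in row))


# Two hypothetical rows with the same path but different hash and size.
dup_a = ("a.py", "sha256=aaa", 10)
dup_b = ("a.py", "sha256=zzz", 99)

order_1 = [dup_a, dup_b]
order_2 = [dup_b, dup_a]

# Path-only sorting preserves the incoming order of the duplicates.
path_only_differs = sort_by_path(order_1) != sort_by_path(order_2)

# Sorting on all elements gives the same result for both inputs.
all_fields_agree = sort_by_all(order_1) == sort_by_all(order_2)
```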
It's fine I guess, maybe with a comment explaining the expected format (name, hash, size) and the fact that we are OK with sorting integers as strings (since normally the sorting only happens on name).
The current implementation + a comment explaining nuances (maybe with a pointer to this issue and/or #5868) should be enough IMO.
Sounds like a good solution, @xavfernandez and @uranusjr. Thanks. I'll draft up a comment and repost.
I'm closing this since there was more discussion at issue #5868 after I wrote this, and I think some decisions need to be resolved first. This PR can always be reopened.
@uranusjr Regarding this PR, one thing that's holding me back is whether it's really better not to crash. Do we know if preventing a crash here would cause an even harder-to-diagnose problem later on in the chain, because of duplicate entries in the RECORD file? Your anti-Postel's principle comment would say that it's better to crash, right?
I think the problem is that the spec doesn’t forbid duplicate entries. If that is to be allowed, pip will need to handle it. I agree potential duplicate entries could lead to harder-to-diagnose problems, but first we’ll need to amend the dist-info and wheel specs to describe whether duplicates may exist, and how they should be treated (if they are allowed). The wheel spec would also need to say whether it can contain certain entries, or specify the installer’s behaviour (ignored/overridden) if those paths exist.
@uranusjr Good point about the spec. Thanks. Regarding this PR, do you approve of it? Is there any downside of it in your opinion? One possible downside of what you suggested in PR #5890 is that it doesn't provide determinism in as many cases (e.g. in the duplicate entry edge case), which was the reason for sorting in the first place. Since duplicate names are an edge case that could be worth testing, it seems like it would be good to have determinism there as well (and it seems it couldn't hurt).
I feel the implementation is good enough if the intention is purely to make ordering deterministic. Otherwise it could be better to sort the last item as integer… maybe?
Yes, that's something that occurred to me, too. But then it adds the complication of what to do with the empty string (
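For illustration only, one way the "sort the last item as integer" idea could handle the empty string is to give it its own slot ahead of all numeric sizes. The thread does not settle on this; `size_key` and the ordering choice are assumptions made for the sketch.

```python
def size_key(size):
    """Hypothetical key: the empty string sorts ahead of any numeric size."""
    if size == "":
        return (0, 0)
    return (1, int(size))


def sort_with_numeric_size(outrows):
    """Sort on path and hash as strings, and on size as a number."""
    return sorted(outrows, key=lambda row: (row[0], row[1], size_key(row[2])))


# Hypothetical rows that differ only in size.
rows = [
    ("a.py", "sha256=aaa", 100),
    ("a.py", "sha256=aaa", "9"),
    ("a.py", "sha256=aaa", ""),
]

numeric_order = [row[2] for row in sort_with_numeric_size(rows)]
string_order = [
    row[2] for row in sorted(rows, key=lambda row: tuple(str(x) for x in row))
]
```

The two orderings differ because "100" sorts before "9" as a string but after it as a number. This only matters when rows share a path and hash, which is the edge case under discussion.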
I added an expanded comment to the patch, as suggested. Let me know if it looks okay.
src/pip/_internal/wheel.py
    or the empty string.
    """
    # Normally, there should only be one row per path, so the second and
    # third elements of each row don't normally come into play when sorting.
Two “normally” in this sentence.
Do you have a suggestion of how it should be rephrased? The meaning doesn't seem correct to me if either one is removed.
Maybe “Normally, there should …, and the second and third in this case …”?
K
4359d37 to e3f1fdf
This fixes #5868.