-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling RECORD files with duplicate entries #5913
Comments
IMO, PEP 376 is clear: "The So based on my reading of the PEPs, duplicate filenames in I'm OK with pip reporting an error if there are duplicates. But I won't object if someone wants to argue for continuing with some sort of warning. Any argument for continuing should probably consider the security implications, though (the user might not end up with what they expect). If we dynamically generate a file that clashes with a file that's supplied in the wheel, my view is that we should do the same (I tend towards reporting an error, but I'm OK with someone arguing for overwrite-and-continue). Regarding "Should RECORD contain anything outside of the to-be-installed site-packages" I'm not sure what you're thinking of here. Can you clarify? One possible sticking point is case insensitive vs case sensitive filesystems. If a wheel contains |
Regarding the “outside of the to-be-installed site-packages” part—a package can contain a RECORD rule like But on second thought, this is probably needed since a Python package really can write to anywhere. I’ll remove that part from the list. |
In terms of making forward progress, it might be simplest to decide first how to handle paths that are equal as strings, since that would be a lot easier. Then how to handle paths that match case-insensitively could be decided on later. The case insensitivity issue seems a lot trickier to me and less clear. |
Agreed. Sorry if I wasn't clear - case insensitivity is definitely a "consider it later" problem :-) |
So the next thing to consider is, what should pip do? I agree it makes sense to generate an error. It could make sense to use a warning instead (especially if the duplicate rows have matching hashes), but it’s easier to err on the restrictive side, and relax the rules if necessary. I am not very familiar with this section of pip, but it doesn’t seem to perform any error handling right now. Would it be enough to simply raise an exception when this happens, and improve from there? |
I don't know much about this area either. Is pip only reading RECORD files, or also writing new ones? In the case of reading, without knowing much, it doesn't seem like pip should error out if the hashes (and sizes) are the same (or compatible, like if one is empty). However, if the hashes and/or sizes are conflicting, then it seems like erroring out would be a good thing (because pip has no way of knowing which row to choose). |
AIUI, the only case that's actually been seen of this happening is a bad wheel file generated by poetry, which has since been superseded (and the poetry author agreed that the wheel was wrong). So an exception/traceback sounds sufficient at least for now. Let's not spend too much time on a mostly-theoretical issue. Regarding where pip handles RECORD files, I'm not that familiar with this area either, but I think we do the following:
I'd stick to fixing the case that the bad poetry wheel triggered, and worry about anything else when it happens. If we're saying that duplicates are invalid according to the PEPs, then it's not like there's an urgent need to tidy up any potential failures where that assumption is violated. |
More specifically, pip never writes RECORD files in wheels - that's the build backend's job. The only place where pip would ever write a record file is when installing, to the PEP 376 "database of installed packages" (dist-info directory in site-packages). |
How does pip use the information in the RECORD file when reading it? Like, does it do anything with the hashes and sizes, etc, and what does it do with the paths that are listed (as opposed to paths not listed)? |
pip/src/pip/_internal/wheel.py Lines 516 to 537 in 116c3b1
If my understanding of the code is correct, pip does not consult the wheel’s RECORD files, but do it the other way around: it copies all files in the wheel, and add/fix lines into the installed RECORD if they are missing/different from the wheel’s. |
So how were we getting an error from the bad poetry wheel? I'm confused now... |
The bad Poetry wheel’s RECORD contains a row duplicating one of
|
Also the error it causes is during writing, not reading. pip sorts rows before it writes them (line 535); duplicate rows cause the previous sorting function to fail. |
In terms of this issue, I'm guessing we all agree that pip should always only write valid RECORD files (and to error out if it can't). However, when reading an existing RECORD file in order to write one, it seems like there are two types of errors that pip might encounter:
For (2), I feel like pip should error out. But how do people feel about (1)? Also, was the Poetry error of type (1) or (2)? |
I think it can be categorised as (1). Since pip favours the generated script over the one included in wheel, it can ignore the row from the wheel’s RECORD. |
I concur with @uranusjr's suggestion. |
From #5868 (comment) and #5883 (comment)
The problem: RECORD may contain rows poiting to the same path, potentially with different sizes and hashes.
Points to discuss (may be expanded or modified):
console_scripts
) be treated differently if they duplicate existing rows?The text was updated successfully, but these errors were encountered: