Fix certain hash mismatches in case of non-wheel installations #7594
Conversation
It will have a significant impact.
After thinking about the issue and trying a few things, I suspect the solution is much simpler: don't move the _populate_hashes_dict(), but make it conditional:

    if archive.suffix != ".whl":
        ...
    elif link.is_wheel:
        self._populate_hashes_dict(archive, package)

We only want to populate the dict with (and check the hash of) the wheel if it was downloaded from somewhere and not built from an sdist by poetry.
By the way, that's not true. Example: Cache after installing tomli from wheel:
Cache after installing tomli from sdist afterwards:
Artifact cache folders are per link. A downloaded sdist and a downloaded wheel have different links and thus are stored in different folders. The self-built wheel is put in the same folder as the sdist.
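A minimal, self-contained sketch of that layout idea, assuming (not quoting) how a cache folder is derived from a link: the folder is keyed on a hash of the link URL, so the sdist link and the wheel link map to different folders, while a locally built wheel ends up next to the sdist it was built from.

```python
# Illustrative sketch only -- not poetry's actual cache implementation.
# Key point: the cache folder is keyed on the *link*, so a downloaded sdist
# and a downloaded wheel never share a folder, but a wheel that poetry builds
# itself lands in the sdist's folder.
import hashlib
from pathlib import Path

def cache_dir_for_link(cache_root: Path, link_url: str) -> Path:
    digest = hashlib.sha256(link_url.encode()).hexdigest()
    # nested two-character folders, similar in spirit to pip/poetry caches
    return cache_root / digest[:2] / digest[2:4] / digest[4:6] / digest[6:]

root = Path("artifacts")
sdist_dir = cache_dir_for_link(root, "https://files.pythonhosted.org/.../tomli-2.0.1.tar.gz")
wheel_dir = cache_dir_for_link(root, "https://files.pythonhosted.org/.../tomli-2.0.1-py3-none-any.whl")
assert sdist_dir != wheel_dir  # different links -> different cache folders
built_wheel = sdist_dir / "tomli-2.0.1-py3-none-any.whl"  # self-built wheel next to the sdist
```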
Wow, my bad. I checked the file hashes for the filename but missed comparing the path, so it was actually the same file I was looking at both times. I even saw the URL logic for the cache path generation 😅. So much for jumping to conclusions. This makes your solution work and we get the caching for generated wheels back 🥳. Though I am not sure if it's a good idea to skip validation for everything besides wheels. Should poetry not validate the downloaded and (cached) tar.gz file if there is a hash from the repo available? Or is that done somewhere else that I did not see? I provided a solution which still includes validation of non-wheel archives if possible. But then I am not sure which hash should be recorded in this case (the original e.g. tar.gz hash, the generated whl hash, or none)?
Good point. I've noticed this too, but would have put it off until later if you hadn't brought it up. I think you are on the right track, so we can also tackle it now. 😄 It does not make sense to record the hash of the generated wheel because this hash is not known by anybody else and the wheel is only intermediate. We always have to record the original url and hash (tar.gz in this case). If I see it correctly, there is still one flaw: the hash of the tar.gz is only recorded if there is not yet a cached wheel. In order to fix this, I assume we need two versions of … Then, we can use … (see poetry/src/poetry/installation/executor.py, line 696 at f27308f) and afterwards always call …
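To make the point above concrete, here is a hedged sketch of recording the hash of the originally downloaded archive regardless of whether a cached self-built wheel already exists. All names here (sha256_of, record_downloaded_hash) are hypothetical illustrations, not poetry's API.

```python
# Sketch only: always record the hash of the artifact that was actually
# downloaded (e.g. the tar.gz), never the hash of a wheel built locally.
# record_downloaded_hash() is a hypothetical helper, not a poetry function.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_downloaded_hash(hashes: dict[str, str], downloaded_archive: Path) -> None:
    # runs before any wheel build or cached-wheel lookup, so the tar.gz hash
    # is recorded even when a previously built wheel is reused from the cache
    hashes[downloaded_archive.name] = f"sha256:{sha256_of(downloaded_archive)}"
```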
Ok, that looks like a good idea. I don't have the time today, but will have a look tomorrow to get an implementation set up and improve the tests to cover it.
Force-pushed from 993361d to 21d6675
…herwise we would overwrite an already existing cached wheel. This could then make the installation fail due to mismatched hashes. Changed Executor._download_link() to validate package hashes via _populate_hashes_dict before building a new wheel in case we use a non-wheel archive.
Force-pushed from a547904 to 87181dc
@radoering I did find some time already and the implementation, including tests, is complete as far as I can see. Let me know if there is still something missing and how to proceed from here :).
Co-authored-by: Randy Döring <30527984+radoering@users.noreply.github.com>
* fix hash mismatch during installation with `no-binary` option for packages that provide suitable wheels
* do not skip hash check for sdists
* write hash for sdists in direct_url.json

(cherry picked from commit 26fbfd6)
Hello, would anything related to this fix cause this side effect: a _WheelFileValidationError? This is confirmed to happen with 1.4.1 but not with 1.4.0. Thanks!
@quantum-byte I am pretty sure this causes a bug in a real-world scenario where the cache is empty (i.e. no tar.gz and no .whl cached).
I will continue the discussion in the issue I opened; would love your help on solving it there 🙏 Issue: #8320
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Addressed Issue:
Installing a package from its tar.gz (sdist) archive can fail with a sha256 hash mismatch. This is due to the original wheel archive from the registry being overwritten in the poetry artifact cache by the wheel newly built from the sdist (correction: the overwrite does not actually happen, but the wrong hash is checked; see the comments above) when installing from the tar.gz archive. The newly built wheel archive has only minimal changes (bdist version etc.), but that is enough to cause the sha256 hash mismatch.

Changes to fix the issue (a rough sketch of this flow follows below):
* Adjusted Chef.prepare() to always write to a temporary directory. Otherwise we would overwrite an already existing cached wheel, which could then make the installation fail due to mismatched hashes.
* Changed Executor._download_link() to validate package hashes via _populate_hashes_dict before building a new wheel in case we use a non-wheel archive.
* This eliminates the caching of the built wheel for non-wheel archives, but I did not see an easy way to keep this caching and fix the issue. Do you think this will impact performance in a meaningful way?

Pull Request Check List
Resolves: Unclear - I have not created a specific issue for my problem, but it could address some issues related to hash mismatches.
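As mentioned above, here is a rough, self-contained sketch of the described flow. All names (prepare_archive, build_wheel_from_sdist, HashMismatchError) are illustrative assumptions, not poetry's actual API; the real Chef.prepare() and Executor._download_link() differ in shape.

```python
# Hedged sketch of the two changes: (1) validate the downloaded archive's hash
# before building anything, (2) build wheels into a temporary directory so a
# cached downloaded wheel can never be overwritten by a freshly built one.
import hashlib
import tempfile
from pathlib import Path

class HashMismatchError(Exception):
    pass

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def build_wheel_from_sdist(sdist: Path, output_dir: Path) -> Path:
    # stand-in for the real build step (Chef.prepare in poetry); out of scope here
    raise NotImplementedError

def prepare_archive(archive: Path, expected_sha256: str | None) -> Path:
    # (1) validate the artifact we actually downloaded (wheel or sdist)
    if expected_sha256 is not None and sha256_of(archive) != expected_sha256:
        raise HashMismatchError(f"{archive.name} does not match the repository hash")
    # (2) non-wheel archives get built into a temp dir, never into the cache folder
    if archive.suffix != ".whl":
        build_dir = Path(tempfile.mkdtemp(prefix="poetry-build-"))
        archive = build_wheel_from_sdist(archive, build_dir)
    return archive
```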