hash check when installing packages from cache #3301
poetry/installation/chef.py
Outdated
cache_dir = self.get_cache_directory_for_link(link)

archive_types = ["whl", "tar.gz", "tar.bz2", "bz2", "zip"]
links = []
for archive_type in archive_types:
    for archive in cache_dir.glob("*.{}".format(archive_type)):
        links.append(Link(archive.as_uri()))
A test needs to be added to https://github.com/python-poetry/poetry/blob/master/tests/installation/test_chef.py to cement the contract that a bad link hash is not accepted.
done
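A test of this kind might look like the sketch below. It does not use Poetry's actual classes from chef.py; `cached_archive_is_valid` is a hypothetical stand-in for the check under review, and the file name is made up. The test only pins down the contract that a cached file whose hash differs from the expected one is rejected.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_archive_is_valid(archive: Path, hash_name: str, expected: str) -> bool:
    """Hypothetical stand-in: True only if the file's digest matches `expected`."""
    digest = hashlib.new(hash_name, archive.read_bytes()).hexdigest()
    return digest == expected


def test_bad_link_hash_is_not_accepted() -> None:
    with tempfile.TemporaryDirectory() as tmp_dir:
        archive = Path(tmp_dir) / "demo-0.1.0.tar.gz"
        archive.write_bytes(b"original contents")
        expected = hashlib.sha256(b"original contents").hexdigest()

        # An intact file passes the check.
        assert cached_archive_is_valid(archive, "sha256", expected)

        # Simulate corruption (e.g. a truncated download) and check again.
        archive.write_bytes(b"orig")
        assert not cached_archive_is_valid(archive, "sha256", expected)
```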
fixes: #3326
I think it would be better to have Poetry remove the broken file from the cache instead of leaving that to the user. The cache is the software's responsibility, not the user's.
In this pull request, Poetry itself removes the broken files. When it loads a package, Poetry warns that the file in the cache is corrupted and then overwrites the file itself.
Needs a documentation change, I think.
Some documentation should be added, but otherwise this looks good.
Is this something I should do?
Hi. When it comes to documentation, contributors are expected to add the necessary changes to the docs. Also, please merge with the master branch and resolve the conflicts.
src/poetry/installation/chef.py
Outdated
if link.hash:
    real_hash = get_file_hash(archive, link.hash_name)
    if real_hash != link.hash:
        logger = logging.getLogger(__name__)
Loggers are usually declared at the top of the file as globals. Also, you need to add the logger to loggers in src.poetry.console.application, on line 234.
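The first half of this suggestion could be sketched as follows. This is only an illustration of the module-level logger convention; `warn_corrupted` is a hypothetical helper, not a function from the PR, and the registration in the console application is not shown because its exact form is not quoted in this thread.

```python
import logging

# Declared once at module level, rather than inside the function that logs.
logger = logging.getLogger(__name__)


def warn_corrupted(filename: str) -> None:
    """Hypothetical helper: emit the cache warning through the shared logger."""
    logger.warning("Cache of %s is corrupted.", filename)
```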
src/poetry/installation/chef.py
Outdated
real_hash = get_file_hash(archive, link.hash_name)
if real_hash != link.hash:
    logger = logging.getLogger(__name__)
    logger.warning(f"cache of {link.filename} is corrupted")
- logger.warning(f"cache of {link.filename} is corrupted")
+ logger.warning(f"Cache of {link.filename} is corrupted. It won't be used.")
Leaving only a warning that the cache is corrupted might confuse the user.
Changed to "Cache of {} is corrupted. The file will be reloaded."
src/poetry/utils/helpers.py
Outdated
with open(filepath, "rb") as f:

    res_hash = hashlib.new(hash_name)

    while True:
        buffer = f.read(block_size)
        if not buffer:
            break

        res_hash.update(buffer)

    return res_hash.hexdigest()
Is there a need for that many empty lines?
fixed
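For reference, a compact version of the helper discussed in this thread could look like the sketch below. The function name and the `filepath`, `hash_name`, and `block_size` parameters come from the snippet under review; the default values are assumptions, since the original signature is not shown here.

```python
import hashlib
from pathlib import Path


def get_file_hash(
    filepath: Path,
    hash_name: str = "sha256",  # assumed default, not taken from the PR
    block_size: int = 65536,  # assumed default, not taken from the PR
) -> str:
    """Hash a file block by block so large archives are never fully in memory."""
    res_hash = hashlib.new(hash_name)
    with open(filepath, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for buffer in iter(lambda: f.read(block_size), b""):
            res_hash.update(buffer)
    return res_hash.hexdigest()
```

Reading in blocks gives the same digest as hashing the whole content at once, which is what the test in tests/utils/test_helpers.py relies on.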
tests/utils/test_helpers.py
Outdated
def test_get_file_hash():

    with tempfile.TemporaryDirectory() as tmp_dir:

        test_data = [b"", b"12345", b"123" * 8000]

        for i, data in enumerate(test_data):

            tmp_file = Path(tmp_dir) / f"file{i}"
            tmp_file.write_bytes(data)

            real_hash = hashlib.md5(data).hexdigest()
            assert get_file_hash(tmp_file, "md5") == real_hash
The same as above, I don't think you need that many empty lines.
fixed
@@ -42,10 +46,6 @@ def is_wheel(self, archive: Path) -> bool:
        return archive.suffix == ".whl"

    def get_cached_archive_for_link(self, link: Link) -> Link:
        # If the archive is already a wheel, there is no need to cache it.
        if link.is_wheel:
            return link
Also, wheels were downloaded on every poetry install, yet they were still saved to the cache folders, so the cache existed but was useless. That is why I removed these lines.
@Secrus should I do anything else?
@0xDEC0DE and @sdispater worked on a similar issue, maybe they can approve.
Makes sense. One small suggestion, inline.
f"Cache of {link.filename} is corrupted. The file will be"
" reloaded."
Perhaps including the expected+actual hashes in the error messages might help people spot weird problems?
e.g., if a bunch of cached files were truncated to 0 bytes, they'd all log with the same hash, and users would know something wacky had happened.
Or it might be pointless noise. Who knows?
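To make the suggestion concrete, here is a minimal sketch of such a message. `corrupted_cache_message` and the file names are hypothetical, and the expected digests are placeholders; the point is only that two different cached files truncated to 0 bytes report the same actual digest, which would stand out in the log.

```python
import hashlib


def corrupted_cache_message(filename: str, expected: str, actual: str) -> str:
    """Hypothetical formatter for the warning, including both digests."""
    return (
        f"Cache of {filename} is corrupted (expected hash {expected},"
        f" got {actual}). The file will be reloaded."
    )


# Every empty file has the same digest, whatever its name.
empty_digest = hashlib.sha256(b"").hexdigest()
msg_a = corrupted_cache_message("a-1.0.whl", "expected-digest-of-a", empty_digest)
msg_b = corrupted_cache_message("b-2.0.whl", "expected-digest-of-b", empty_digest)
```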
Cherry-picked from PR python-poetry#3301. Without these changes, Poetry fails if a file in the cache is corrupt, and the cache needs to be cleaned up manually.
I think this PR is no longer relevant (and should be closed) since, as far as I understand, the current Poetry version already checks hashes for every file, cached or not. But perhaps the author can salvage something. Two things come to mind: (1) I didn't find a dedicated test in the current code that verifies a cached hash mismatch, and (2) the nice log messages for a cache mismatch could still be introduced.
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
When installing packages, Poetry uses a cache to avoid redownloading files.
If a cached package file is corrupted (due to a failed download, etc.), Poetry will not work until you manually delete this file from the cache directory.
This PR adds a hash check when installing packages from the cache: if the hash is incorrect, the file will be redownloaded.
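The behaviour described above can be sketched roughly as follows. This is not the code from the PR: `get_archive`, `file_digest`, and the `download` callable are hypothetical names used for illustration, and the real change lives in Poetry's own installation code.

```python
import hashlib
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def file_digest(path: Path, hash_name: str) -> str:
    """Digest of the whole file (fine for a sketch; real code may read in blocks)."""
    return hashlib.new(hash_name, path.read_bytes()).hexdigest()


def get_archive(
    cached: Path,
    hash_name: str,
    expected_hash: str,
    download: Callable[[Path], None],
) -> Path:
    """Return a verified archive, redownloading it if the cached copy is bad.

    `download` is a hypothetical callable that writes a fresh copy to the path.
    """
    if cached.exists():
        if file_digest(cached, hash_name) == expected_hash:
            return cached  # cache hit with a matching hash
        logger.warning(
            "Cache of %s is corrupted. The file will be reloaded.", cached.name
        )
        cached.unlink()  # drop the broken file so the user does not have to

    download(cached)
    if file_digest(cached, hash_name) != expected_hash:
        raise RuntimeError(f"Hash mismatch for freshly downloaded {cached.name}")
    return cached
```

A valid cached file is returned without calling `download`; a corrupted one is deleted and fetched again, so the user never has to clean the cache by hand.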
Pull Request Check List