Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash check when installing packages from cache #3301

Closed
wants to merge 1 commit into from

Conversation

ombschervister
Copy link

@ombschervister ombschervister commented Oct 28, 2020

When installing packages, poetry uses cache to avoid redownloading files.
If cached package file is corrupted (due to failed download, etc), poetry will not work until you delete this file from cache directory manually.

This PR adds hash check when installing packages from cache: if hash is incorrect, the file will be redownloaded.

Pull Request Check List

  • Added tests for changed code.
  • Updated documentation for changed code.

@ombschervister ombschervister force-pushed the master branch 2 times, most recently from 66c416b to 66c2267 Compare October 28, 2020 13:55
@finswimmer finswimmer requested a review from a team October 30, 2020 18:59
@ombschervister ombschervister force-pushed the master branch 4 times, most recently from d1ddcc6 to c23be60 Compare November 6, 2020 09:26
@finswimmer finswimmer added the kind/enhancement Not a bug or feature, but improves usability or performance label Nov 9, 2020
cache_dir = self.get_cache_directory_for_link(link)

archive_types = ["whl", "tar.gz", "tar.bz2", "bz2", "zip"]
links = []
for archive_type in archive_types:
for archive in cache_dir.glob("*.{}".format(archive_type)):
links.append(Link(archive.as_uri()))

Copy link

@earonesty earonesty Dec 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a test to https://github.com/python-poetry/poetry/blob/master/tests/installation/test_chef.py is needed to cement the contract that a bad link hash is not accepted

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@earonesty
Copy link

fixes: #3326

@joooeey
Copy link

joooeey commented Dec 28, 2021

I think it would be better to have Poetry remove the broken file from the cache in place of the user. The cache is the software's responsiblity, not the user's.

@ombschervister ombschervister force-pushed the master branch 4 times, most recently from 1ee3f31 to 3e439a4 Compare January 3, 2022 21:56
@ombschervister
Copy link
Author

@joooeey

I think it would be better to have Poetry remove the broken file from the cache in place of the user. The cache is the software's responsiblity, not the user's.

In this request, Poetry itself removes the broken files. When loading the library, Poettry warns about the corruption of the file in the cache, and then overwrites the file itself.

image

@ombschervister ombschervister removed the request for review from a team January 3, 2022 22:26
Copy link

@earonesty earonesty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs documentation change i think

earonesty
earonesty previously approved these changes May 5, 2022
Copy link

@earonesty earonesty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some documentation should be added, but otherwise, looks good.

@ombschervister
Copy link
Author

some documentation should be added, but otherwise, looks good.

Is this something I should do?
Do you mean that I should add comments to my code in the request?
Also I see the documentation in the docs folder but I don't see a place for such detailed implementation details

@Secrus Secrus requested a review from a team May 22, 2022 21:11
@Secrus
Copy link
Member

Secrus commented May 22, 2022

some documentation should be added, but otherwise, looks good.

Is this something I should do? Do you mean that I should add comments to my code in the request? Also I see the documentation in the docs folder but I don't see a place for such detailed implementation details

Hi. When it comes to documentation, it's expected to add necessary changes to docs if needed. .md files in the docs directory are named the same as sections in the documentation on the Poetry website. However, in this case, I don't think there is a need for change. We don't have information in the documentation about how the cache is working.

Also, please merge with master branch and resolve conflicts.

if link.hash:
real_hash = get_file_hash(archive, link.hash_name)
if real_hash != link.hash:
logger = logging.getLogger(__name__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loggers are usually declared on top of files as globals. Also, you need to add the logger to loggers in src.poetry.console.application in line 234.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, but after adding the logger to src.poetry.console.application, the messages began to be duplicated
image

real_hash = get_file_hash(archive, link.hash_name)
if real_hash != link.hash:
logger = logging.getLogger(__name__)
logger.warning(f"cache of {link.filename} is corrupted")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.warning(f"cache of {link.filename} is corrupted")
logger.warning(f"Cache of {link.filename} is corrupted. It won't be used.")

Leaving only a warning that the cache is corrupted, might confuse user.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "Cache of {} is corrupted. The file will be reloaded."

Comment on lines 143 to 172
with open(filepath, "rb") as f:

res_hash = hashlib.new(hash_name)

while True:
buffer = f.read(block_size)
if not buffer:
break

res_hash.update(buffer)

return res_hash.hexdigest()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need for that many empty lines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 85 to 122
def test_get_file_hash():

with tempfile.TemporaryDirectory() as tmp_dir:

test_data = [b"", b"12345", b"123" * 8000]

for i, data in enumerate(test_data):

tmp_file = Path(tmp_dir) / f"file{i}"
tmp_file.write_bytes(data)

real_hash = hashlib.md5(data).hexdigest()
assert get_file_hash(tmp_file, "md5") == real_hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above, I don't think you need that many empty lines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -42,10 +46,6 @@ def is_wheel(self, archive: Path) -> bool:
return archive.suffix == ".whl"

def get_cached_archive_for_link(self, link: Link) -> Link:
# If the archive is already a wheel, there is no need to cache it.
if link.is_wheel:
return link
Copy link
Author

@ombschervister ombschervister May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also whl was downloaded every poetry install, but at the same time saved to the cache folders.
So the cache exists but was useless.
So I removed this lines.

@ombschervister ombschervister force-pushed the master branch 2 times, most recently from 3ca2284 to 54bfc99 Compare May 24, 2022 05:12
@ombschervister
Copy link
Author

@Secrus should i do something else?

@earonesty
Copy link

@0xDEC0DE and @sdispater worked on a similar issue, maybe they can approve.

Copy link
Contributor

@0xDEC0DE 0xDEC0DE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. One small suggestion, inline.

Comment on lines +91 to +92
f"Cache of {link.filename} is corrupted. The file will be"
" reloaded."
Copy link
Contributor

@0xDEC0DE 0xDEC0DE Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps including the expected+actual hashes in the error messages might help people spot weird problems?

e.g., if a bunch of cached files were truncated to 0 bytes, they'd all log with the same hash, and users would know something wacky had happened.

Or it might be pointless noise. Who knows?

Antaxify pushed a commit to blackshark-ai/poetry that referenced this pull request Aug 1, 2022
Cherry picked from PR: python-poetry#3301

Without these changes, Poetry fails if a file in the cache is corrupt
and it needs to be cleaned up manually
@wagnerluis1982
Copy link
Contributor

I think this PR is not relevant anymore (and must be closed), since, as far as I understand, current Poetry version already check hashes for every file, cached or not.

But, perhaps the author can salvage something. Two things come to my mind: (1) I didn't find on current code a dedicated test to verify cached hash mismatch, and (2) maybe it can be introduced the nice log messages when there is a cache mismatch.

Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/enhancement Not a bug or feature, but improves usability or performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants