-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SHA256 hash of .whl as info output #5908
Conversation
bb8f03d
to
8be73da
Compare
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Currently I'm trying to debug some issues with what appear to be corrupt wheels. It would be very useful to see what pip thought the state of things was as it wrote the wheel output; if a final corrupt distributed file is then different to what pip has saved in its build logs, you know the problem is somewhere after pip but before distribution. Currently we get a log of the initial creation, then the stamp when it gets moved in the final output location, e.g.: creating '/tmp/pip-wheel-71CpBe/foo-1.2.3-py2.py3-none-any.whl ... Stored in directory: /opt/wheel/workspace A lot happens in between this, so my suggestion is we add the final output file and it's hash before the "Stored in directory:", e.g. you now see: Building wheels for collected packages: simple Running setup.py bdist_wheel for simple: started Running setup.py bdist_wheel for simple: finished with status 'done' Finished: simple-3.0-py3-none-any.whl sha256=39005a57a6327972575072af82e11d0817439fe6a069381f6f2a123a8c0bf1cf Stored in directory: /tmp/pytest-of-iwienand/pytest-18/test_pip_wheel_success0/workspace/scratch Successfully built simple Despite the hash being fairly important for things like --require-hashes, AFAICS the final hash is not put in the logs at all currently, so I think this is generically helpful.
Any chance I could get a review of this? Either way? We keep a local copy of pip going to have this in it, which I would certainly rather dispense with if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR, and thanks for your great patience. It would be great to get this into the next release (next month). I'll add it to the 19.2 milestone now.
news/5908.feature
Outdated
@@ -0,0 +1,2 @@ | |||
Wheel builds will output the final filename and SHA256 hash of .whl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the imperative tone (see https://pip.pypa.io/en/stable/development/contributing/#news-entries ). Maybe something like this?
Log the final filename and SHA256 hash of a
.whl
file when done building a wheel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this with 09e3184
src/pip/_internal/wheel.py
Outdated
@@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None): | |||
wheel_name = os.path.basename(wheel_path) | |||
dest_path = os.path.join(output_dir, wheel_name) | |||
try: | |||
wheel_hash, _ = hash_file(wheel_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be helpful for readers of the code to give a name to the second variable in the return value. You could add a code comment above the line saying, "We don't need to use the second value." if you would like to communicate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_length
should do, or also output the file size :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have put the length into the output string with a8e382b, so this value is now used
src/pip/_internal/wheel.py
Outdated
shutil.move(wheel_path, dest_path) | ||
logger.info('Finished: %s sha256=%s', | ||
wheel_name, wheel_hash.hexdigest()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other log messages in your post have the form "Running setup.py bdist_wheel for simple: finished ..." So maybe this could be "Finished wheel for <name>: ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this info()
message can be merged with the one right after. Maybe the directory path can go on a separate line like it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on the format and have changed that with a8e382b
I did leave the following info message (Stored in directory:
) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.
The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I don't feel strongly about merging the log messages. However, I'd rather not create any expectation among users (or future maintainers for that matter) that log messages won't or shouldn't change. One reason is that I think we should be open to improving these messages when we can as a matter of UX. Thus, my preference would be for the comment not to be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line will become quite long with the full path too;
By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message). Thus, the lengths of lines needn't increase, and in fact the same greps could even work if the text was kept the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus, my preference would be for the comment not to be there.
Removed with 45a6415
By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message).
Ahh ... yes I wasn't sure about multi-line log strings. This probably comes from some ancient history of mine where I was dealing with syslog()
and /dev/msg
a lot where multi-line strings are not handled well at all :) Again if you feel strongly happy to convert it to a suggested format
src/pip/_internal/wheel.py
Outdated
"""Return (hash, length) for path using hashlib.sha256()""" | ||
h = hashlib.sha256() | ||
length = 0 | ||
with open(path, 'rb') as f: | ||
for block in read_chunks(f, size=blocksize): | ||
length += len(block) | ||
h.update(block) | ||
return (h, str(length)) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could return the length as int
since rehash
uses str(length)
and _build_one_inside_env
ignores it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good observation. For that matter (but not for this PR), rehash()
should probably also be made to return an int! (And the caller can convert to a string as needed.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made hash_file
return a int with 53c38d8. I will leave rehash()
for now to avoid further confusion :)
src/pip/_internal/wheel.py
Outdated
@@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None): | |||
wheel_name = os.path.basename(wheel_path) | |||
dest_path = os.path.join(output_dir, wheel_name) | |||
try: | |||
wheel_hash, _ = hash_file(wheel_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_length
should do, or also output the file size :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review; I'm not sure if you prefer a squashed change or multiple commits, I'm happy to do either.
src/pip/_internal/wheel.py
Outdated
shutil.move(wheel_path, dest_path) | ||
logger.info('Finished: %s sha256=%s', | ||
wheel_name, wheel_hash.hexdigest()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on the format and have changed that with a8e382b
I did leave the following info message (Stored in directory:
) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.
The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.
Thanks! Either way is fine. We tend to squash when committing anyways using the button (unless the proposer has made deliberately clean and crafted commits, which usually requires rebasing after addressing review comments). |
Returning an int for the length makes for a better interface for this new function.
81762b3
to
45a6415
Compare
This rewords the output to be more like the form of the preceeding messages. Additionally the size is added, since we have calculated it anyway. The output will now look like: Collecting simple==3.0 Building wheels for collected packages: simple Building wheel for simple (setup.py): started Building wheel for simple (setup.py): finished with status 'done' Created wheel for simple: filename=simple-3.0-py3-none-any.whl size=1138 sha256=2a980a802c9d38a24d29aded2dc2df2b080e58370902e5fdf950090ff67aec10 Stored in directory: /tmp/pytest-of-iwienand/pytest-0/test_pip_wheel_success0/workspace/scratch Successfully built simple A match for the following line is added to the functional test too.
Thanks again, @ianw! 🎉 |
Currently I'm trying to debug some issues with what appear to be
corrupt wheels. It would be very useful to see what pip thought the
state of things was as it wrote the wheel output; if a final corrupt
distributed file is then different to what pip has saved in its build
logs, you know the problem is somewhere after pip but before
distribution.
Currently we get a log of the initial creation, then the stamp when it
gets moved in the final output location, e.g.:
A lot happens in between this, so my suggestion is we add the final
output file and it's hash before the "Stored in directory:", e.g. you
now see:
Despite the hash being fairly important for things like
--require-hashes
, AFAICS the final hash is not put in the logs at allcurrently, so I think this is generically helpful.