-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use better temporary files mechanism in install_unpacked_wheel #7703
Conversation
The documentation talks about three types of news:
The issue has type |
b57e097
to
e56f245
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rudolfovic thank you for picking this up.
Two remarks:
- there are changes to various NEWS file that are unrelated and should not be part of this PR
- there are other place in
wheel.py
where the same pattern is used that need to be updated too
with open_for_csv(record, 'r') as record_in: | ||
with open_for_csv(temp_record, 'w+') as record_out: | ||
with adjacent_tmp_file(record, mode='w') as record_out: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to change w+
to w
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are we sure that adjacent_tmp_file
is equivalent to open_for_csv
for our purpose here? (I've not investigated myself yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to change
w+
tow
?
We are not reading anything from this file so why request the extra permissions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, are we sure that
adjacent_tmp_file
is equivalent toopen_for_csv
for our purpose here? (I've not investigated myself yet)
All open_for_csv
does is append some parameters to open()
to account for differences between different versions of Python.
adjacent_tmp_file
calls NamedTemporaryFile
which in turn calls _sanitize_params
, calls close()
on exit (in addition to that of file
) and ensures file is written to disk before it's closed.
Tests would not pass if binary mode was used even with Python 2 but they do otherwise (on both versions). So I do not see any obstacles to replacing open_for_csv
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the open parameters correct for CSV files is fairly subtle, and having open_for_csv
is important to make sure it's easy for people to get it correct even if they aren't aware of platform differences. Having text mode and newline=''
is crucial for correct handling on Windows under Python 3 (and having binary mode is crucial on Python 2). So I'd be concerned that the switch to adjacent_tmp_file
is generating a CSV file with incorrect line endings. Do we know for sure that we have a test that checks this?
Note: I don't actually know (without doing some research) the precise "rules"1 for the format - but I do know that the docs in the CSV module (in particular the footnote here) were arrived at after quite a lot of frustrating investigation into weird bugs.
1 CSV is not a particularly well defined format, so they aren't actually so much rules as "hard-won knowledge about what works" 🙄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the link to the newline comment and this is addressed by the io
module internally. I do not see where it says Python 2 insists on binary mode. A simple test shows:
Python 2.7.10 (default, Feb 22 2019, 21:17:52)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> with open('/tmp/bla', 'w') as f:
... c = csv.writer(f)
... c.writerow(['hello'])
...
>>> with open('/tmp/bla', 'r') as f:
... c = csv.reader(f)
... print '\n'.join(','.join(l) for l in c)
...
hello
>>>
Binary also works:
>>> with open('/tmp/bla', 'rb') as f:
... c = csv.reader(f)
... print '\n'.join(','.join(l) for l in c)
...
hello
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per documentation for open()
:
Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.
Since we only ever deal with text data in CSVs there is no reason why it should matter. All Python documentation uses b
everywhere because in Python 2 it essentially had no adverse effect. It was required for some systems and harmless for others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.python.org/2/library/csv.html#csv.reader - "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference". You're running on MacOS, which is a platform where it doesn't make a difference. SO it's not surprising you can't demonstrate an issue (your specific test does work on Windows, but again, that just means I suspect your test is incomplete - I have had actual, real-life errors caused by failing to use binary mode on Windows).
Anyhow, regardless of how much we debate test cases that demonstrate one thing or another, that's irrelevant, We should follow the documented requirements of the CSV module, not waste time debating whether they are needed. If you feel sufficiently strongly, by all means raise a CPython bug that claims the requirements in the docs are obsolete and should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then I can extract the version -> mode
part from open_for_csv
and call it there in addition to adjacent_tmp_file(file, 'w' + get_mode())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth I did this in our recent wheel builder helper test code, which was the easiest way I could find that was compatible.
This would also avoid needing to make changes to adjacent_tmp_file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment about open_for_csv
doesn't appear to have registered in github as a review, so referencing it here.
My suggestion is to use record = os.path.join(dest_info_dir, 'RECORD')
temp_dir = TempDirectory(kind="record", globally_managed=True)
temp_record = os.path.join(temp_dir.path, 'RECORD') And let the rest of the code stay as-is for now. |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
This PR was superceded by #7929. Many thanks for filing this PR @rudolfovic! Let us know if you'd like a different issue to contribute on! ^>^ |
Fixes #7699