Use better temporary files mechanism in install_unpacked_wheel #7703

rudolfovic · 2020-02-05T21:04:14Z

Fixes #7699

rudolfovic · 2020-02-05T21:08:45Z

The documentation talks about three types of news:

removal, feature, bugfix

The issue has type refactor so I am not sure what the protocol is.

sbidoul

@rudolfovic thank you for picking this up.

Two remarks:

there are changes to various NEWS file that are unrelated and should not be part of this PR
there are other place in wheel.py where the same pattern is used that need to be updated too

sbidoul · 2020-02-06T08:38:30Z

src/pip/_internal/operations/install/wheel.py

    with open_for_csv(record, 'r') as record_in:
-        with open_for_csv(temp_record, 'w+') as record_out:
+        with adjacent_tmp_file(record, mode='w') as record_out:


Any particular reason to change w+ to w?

Also, are we sure that adjacent_tmp_file is equivalent to open_for_csv for our purpose here? (I've not investigated myself yet)

Any particular reason to change w+ to w?

We are not reading anything from this file so why request the extra permissions?

Also, are we sure that adjacent_tmp_file is equivalent to open_for_csv for our purpose here? (I've not investigated myself yet)

All open_for_csv does is append some parameters to open() to account for differences between different versions of Python.

adjacent_tmp_file calls NamedTemporaryFile which in turn calls _sanitize_params, calls close() on exit (in addition to that of file) and ensures file is written to disk before it's closed.

Tests would not pass if binary mode was used even with Python 2 but they do otherwise (on both versions). So I do not see any obstacles to replacing open_for_csv.

Getting the open parameters correct for CSV files is fairly subtle, and having open_for_csv is important to make sure it's easy for people to get it correct even if they aren't aware of platform differences. Having text mode and newline='' is crucial for correct handling on Windows under Python 3 (and having binary mode is crucial on Python 2). So I'd be concerned that the switch to adjacent_tmp_file is generating a CSV file with incorrect line endings. Do we know for sure that we have a test that checks this?

Note: I don't actually know (without doing some research) the precise "rules"¹ for the format - but I do know that the docs in the CSV module (in particular the footnote here) were arrived at after quite a lot of frustrating investigation into weird bugs.

¹ CSV is not a particularly well defined format, so they aren't actually so much rules as "hard-won knowledge about what works" 🙄

I see the link to the newline comment and this is addressed by the io module internally. I do not see where it says Python 2 insists on binary mode. A simple test shows:

Python 2.7.10 (default, Feb 22 2019, 21:17:52) [GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import csv >>> with open('/tmp/bla', 'w') as f: ... c = csv.writer(f) ... c.writerow(['hello']) ... >>> with open('/tmp/bla', 'r') as f: ... c = csv.reader(f) ... print '\n'.join(','.join(l) for l in c) ... hello >>>

Binary also works:

>>> with open('/tmp/bla', 'rb') as f: ... c = csv.reader(f) ... print '\n'.join(','.join(l) for l in c) ... hello

As per documentation for open():

Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.

Since we only ever deal with text data in CSVs there is no reason why it should matter. All Python documentation uses b everywhere because in Python 2 it essentially had no adverse effect. It was required for some systems and harmless for others.

https://docs.python.org/2/library/csv.html#csv.reader - "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference". You're running on MacOS, which is a platform where it doesn't make a difference. SO it's not surprising you can't demonstrate an issue (your specific test does work on Windows, but again, that just means I suspect your test is incomplete - I have had actual, real-life errors caused by failing to use binary mode on Windows).

Anyhow, regardless of how much we debate test cases that demonstrate one thing or another, that's irrelevant, We should follow the documented requirements of the CSV module, not waste time debating whether they are needed. If you feel sufficiently strongly, by all means raise a CPython bug that claims the requirements in the docs are obsolete and should be removed.

So then I can extract the version -> mode part from open_for_csv and call it there in addition to adjacent_tmp_file(file, 'w' + get_mode()).

For what it's worth I did this in our recent wheel builder helper test code, which was the easiest way I could find that was compatible.

This would also avoid needing to make changes to adjacent_tmp_file.

pfmoore

My comment about open_for_csv doesn't appear to have registered in github as a review, so referencing it here.

pradyunsg · 2020-03-19T17:33:24Z

My suggestion is to use pip._internal.utils.temp_dir.TempDirectory for this, like so:

record = os.path.join(dest_info_dir, 'RECORD')
temp_dir = TempDirectory(kind="record", globally_managed=True)
temp_record = os.path.join(temp_dir.path, 'RECORD')

And let the rest of the code stay as-is for now.

BrownTruck · 2020-04-02T10:00:03Z

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

pradyunsg · 2020-04-02T17:19:43Z

This PR was superceded by #7929. Many thanks for filing this PR @rudolfovic!

Let us know if you'd like a different issue to contribute on! ^>^

xavfernandez added the skip news Does not need a NEWS file entry (eg: trivial changes) label Feb 5, 2020

rudolfovic force-pushed the master branch 3 times, most recently from b57e097 to e56f245 Compare February 5, 2020 23:14

Use better temporary files mechanism in install_unpacked_wheel

b79997c

rudolfovic force-pushed the master branch from e56f245 to b79997c Compare February 5, 2020 23:18

rudolfovic requested a review from sbidoul February 6, 2020 06:29

sbidoul requested changes Feb 6, 2020

View reviewed changes

sbidoul reviewed Feb 6, 2020

View reviewed changes

pfmoore requested changes Feb 6, 2020

View reviewed changes

BrownTruck added the needs rebase or merge PR has conflicts with current master label Apr 2, 2020

pradyunsg closed this Apr 2, 2020

lock bot added the auto-locked Outdated issues that have been locked by automation label May 5, 2020

lock bot locked as resolved and limited conversation to collaborators May 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use better temporary files mechanism in install_unpacked_wheel #7703

Use better temporary files mechanism in install_unpacked_wheel #7703

rudolfovic commented Feb 5, 2020

rudolfovic commented Feb 5, 2020

sbidoul left a comment

sbidoul Feb 6, 2020

sbidoul Feb 6, 2020

rudolfovic Feb 6, 2020

rudolfovic Feb 6, 2020

pfmoore Feb 6, 2020

rudolfovic Feb 6, 2020

rudolfovic Feb 6, 2020

pfmoore Feb 6, 2020

rudolfovic Feb 6, 2020

chrahunt Feb 8, 2020 •

edited

Loading

pfmoore left a comment

pradyunsg commented Mar 19, 2020 •

edited

Loading

BrownTruck commented Apr 2, 2020

pradyunsg commented Apr 2, 2020

Use better temporary files mechanism in install_unpacked_wheel #7703

Use better temporary files mechanism in install_unpacked_wheel #7703

Conversation

rudolfovic commented Feb 5, 2020

rudolfovic commented Feb 5, 2020

sbidoul left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

chrahunt Feb 8, 2020 • edited Loading

Choose a reason for hiding this comment

pfmoore left a comment

Choose a reason for hiding this comment

pradyunsg commented Mar 19, 2020 • edited Loading

BrownTruck commented Apr 2, 2020

pradyunsg commented Apr 2, 2020

chrahunt Feb 8, 2020 •

edited

Loading

pradyunsg commented Mar 19, 2020 •

edited

Loading