Simplify version parsing example #571

techtonik · 2018-11-11T16:24:53Z

This removes regexp and adds import os.

File is read in cp437 because this encoding has no missing code points while converting to utf-8 string. Only ASCII set is used anyways.

This removes regexp and adds `import os`. File is read in `cp437` because this encoding has no missing code points while converting to `utf-8` string. Only ASCII set is used anyways.

theacodes · 2018-11-12T18:56:48Z

Thanks for doing this, @techtonik, but I find the new one much more difficult to read. Would it be possible to reduce the amount of nesting?

techtonik · 2018-11-13T01:05:30Z

@theacodes nothing I can think of. It is either flatness or simplicity. For people who haven't learn what a regular expression is, this is already an advantage.

techtonik · 2018-11-13T01:12:04Z

On the second thought my similar improvement for pip looks lighter - pypa/pip#6004 - but I still don't understand the logic behind codecs.open - documentation doesn't say anything about which encoding is used, so I guess iteration will fail if I have Russian comments in cp1251 in version file.

theacodes · 2018-11-13T17:41:13Z

Yeah, let's go with pip's version.

pradyunsg · 2018-11-13T19:18:08Z

There's a few changes we should make there IMO like using os.path.join and I think that currently (untested) that'll gives us the the string with quotes, so those have to be trimmed.

I'm super short on time, so can't write proper review comments on the PR, sorry!

techtonik · 2018-11-21T04:59:00Z

@pradyunsg why use os.path.join in this specific case? It doesn't fail on Windows.

techtonik · 2018-11-21T05:08:35Z

@theacodes can you explain what encoding is used by codecs.open to make sure it won't fail on files that contain Russian comments?

theacodes · 2018-11-21T18:38:50Z

I honestly have no idea. I don't think it's possible for us to specify a general solution that'll work in every single case, so we should probably just use utf-8 but note that you might need to change it.

pfmoore · 2018-11-21T19:34:58Z

As it's only intended as example code, and it's reading a file that you create yourself, I think it's reasonable to not worry too much about encodings. If you're only supporting Python 3.2+, tokenize.open() respects an encoding declaration in the source file, which this code doesn't. But is it that important?

If you're not comfortable writing your own code based on the supplied example, I'd argue that you should probably be sticking to the simpler technique of just maintaining the version number in setup.py, as suggested in the tutorial anyway.

Basically, I agree, this change isn't much of an improvement, so I'd prefer to stick with what we have.

techtonik · 2018-11-21T20:57:16Z

Ok. Let me tell you what happens with this code if it is run on Greek Windows when my version file contains Russian comment. The code from this guide will fail.

https://bitbucket.org/techtonik/hexdump/pull-requests/1/since-the-input-file-is-a-specific/diff

Needless to say that having a guide with magic that writers themselves don't understand says something about the language. Explicit cp437 is the best choice. I can explain.

pfmoore · 2018-11-21T21:32:56Z

So if you want to use a Russian comment, modify the code to handle it. Just add an explicit encoding to codecs.open. What's the problem?

Regarding "writers themselves don't understand", frankly few of us still recall all the bizarre contortions needed for Python 2/3 compatibility. I'd happily say replace codecs.open() with open() and leave people wanting Python 2 compatibility to work out how to do it. But you'd then have to change it back to use codecs.open, because Python 2's open doesn't support an encoding. So how's that an improvement?

techtonik · 2018-11-22T03:24:29Z

So if you want to use a Russian comment, modify the code to handle it. Just add an explicit encoding to codecs.open. What's the problem?

The problem is that it is very hard to debug, non-intuitive and requires a sacred knowledge that doesn't hold in ones head as this thread shows. Python doesn't give any hints to make the inference process easier.

Regarding "writers themselves don't understand", frankly few of us still recall all the bizarre contortions needed for Python 2/3 compatibility.

It is only about Python 3. Python 2 is irrelevant, because it does not autoconvert files with occassional system encoding on reading. Replacing open() with codecs.open() doesn't fix the bug I linked above. Knowing encoding beforehand doesn't make the snippet universal. Either reading in binary or in universal encoding without missing code points to convert ASCII version to string is a universal solution.

ncoghlan · 2018-11-22T11:14:33Z

The simplest and most appropriate clarification here is to just pass an explicit encoding='utf-8' to codecs.open in the existing example (since the Python docs are indeed unclear on what that defaults to in Python 3.x, in 2.7 it defaults to ASCII which isn't helpful for files that aren't encoded that way, and explicitly including the argument makes it clear where to specify the encoding if you use a non-utf-8 encoding for the file containing the version metadata in your particular project).

A comment could also be included saying that builtin open() can be used directly in projects that don't support Python 2.x at all (since 2.x support is the only reason the example uses codecs.open instead).

To avoid problems with unknown encoding, file with __version__ is processed as binary and the final number is converted to string just before the function returns.

techtonik · 2018-11-26T07:09:44Z

Until codecs.open documentation is clear I decided to parse everything as binary. This works Python 2.7 and Python 3.x the same way. As long as version itself is an ASCII string, everything else in a file doesn't matter.

theacodes · 2019-01-22T03:35:46Z

I'm so sorry for letting this PR languish for so long.

Until codecs.open documentation is clear I decided to parse everything as binary. This works Python 2.7 and Python 3.x the same way. As long as version itself is an ASCII string, everything else in a file doesn't matter.

I'm not completely comfortable with this. I do like @ncoghlan's suggestion of using codecs.open and explicitly passing utf-8 with a comment noting that differing encodings may cause trouble.

techtonik · 2019-01-22T18:47:49Z

@theacodes I am not comfortable with undocumented codecs.open behavior either.

theacodes · 2019-01-22T18:52:06Z

Help me understand what undocumented behavior codecs.open presents? @ncoghlan and I are suggesting the following:

# Open the version file using utf-8 encoding. If you are using different encoding for your source
# file, change the encoding to match. See https://www.python.org/dev/peps/pep-0263/ for more
# details.
codecs.open(version_file, "r", encoding="utf-8"):

techtonik · 2019-02-10T04:27:32Z

@theacodes the undocumented behavior is what happens if a person left encoding parameter for codecs.open unspecified. 45 minutes is a price that is too high for a missing param.

Behind the scenes codecs.open is opening a file in binary mode, so I don't see why not to just use binary matching without additional wrapper. Do you have any arguments?

This allows people who use setuptools to maintain project versions in a single place, for example as attributes in their Python source. Closes pypa#1316 Closes pypa/pip#6004 Closes pypa/packaging.python.org#571

techtonik · 2019-02-10T08:51:16Z

I submitted the function to setuptools pypa/setuptools#1679

Spent 2 more hours polishing the code that I thought is final. If merged, it would be the most simple way to fetch version string, because I believe pip ships with setuptools by defaut.

Simplify version parsing example

656e185

This removes regexp and adds `import os`. File is read in `cp437` because this encoding has no missing code points while converting to `utf-8` string. Only ASCII set is used anyways.

Parse __version__ file as binary

7d7a171

To avoid problems with unknown encoding, file with __version__ is processed as binary and the final number is converted to string just before the function returns.

techtonik mentioned this pull request Feb 10, 2019

Add setuptools.get_version(path, field='__version__') pypa/setuptools#1679

Closed

2 tasks

theacodes closed this Jul 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify version parsing example #571

Simplify version parsing example #571

techtonik commented Nov 11, 2018

theacodes commented Nov 12, 2018

techtonik commented Nov 13, 2018 •

edited

Loading

techtonik commented Nov 13, 2018 •

edited

Loading

theacodes commented Nov 13, 2018

pradyunsg commented Nov 13, 2018

techtonik commented Nov 21, 2018

techtonik commented Nov 21, 2018

theacodes commented Nov 21, 2018

pfmoore commented Nov 21, 2018

techtonik commented Nov 21, 2018

pfmoore commented Nov 21, 2018

techtonik commented Nov 22, 2018

ncoghlan commented Nov 22, 2018

techtonik commented Nov 26, 2018

theacodes commented Jan 22, 2019

techtonik commented Jan 22, 2019

theacodes commented Jan 22, 2019

techtonik commented Feb 10, 2019 •

edited

Loading

techtonik commented Feb 10, 2019

Simplify version parsing example #571

Simplify version parsing example #571

Conversation

techtonik commented Nov 11, 2018

theacodes commented Nov 12, 2018

techtonik commented Nov 13, 2018 • edited Loading

techtonik commented Nov 13, 2018 • edited Loading

theacodes commented Nov 13, 2018

pradyunsg commented Nov 13, 2018

techtonik commented Nov 21, 2018

techtonik commented Nov 21, 2018

theacodes commented Nov 21, 2018

pfmoore commented Nov 21, 2018

techtonik commented Nov 21, 2018

pfmoore commented Nov 21, 2018

techtonik commented Nov 22, 2018

ncoghlan commented Nov 22, 2018

techtonik commented Nov 26, 2018

theacodes commented Jan 22, 2019

techtonik commented Jan 22, 2019

theacodes commented Jan 22, 2019

techtonik commented Feb 10, 2019 • edited Loading

techtonik commented Feb 10, 2019

techtonik commented Nov 13, 2018 •

edited

Loading

techtonik commented Nov 13, 2018 •

edited

Loading

techtonik commented Feb 10, 2019 •

edited

Loading