Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port util metaprogramming files to Python3 #6072

Merged
merged 3 commits into from
Jul 6, 2018

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Jul 6, 2018

Part of #6062

Problem

ABCMeta expects unicode on Python3, but bytes on Python2.

Solution

We use feature detection to try with idiomatic Py3 and fall back on Py2. This is preferred over version detection (i.e. checking which Python interpreter is used).

Note that the source of this file, twitter.commons.lang, did not have to do this because they don't use from __future__ import unicode_literals.

@Eric-Arellano Eric-Arellano changed the title Port util/meta.py to Python3 Port util metaprogramming files to Python3 Jul 6, 2018
@@ -50,7 +51,6 @@ def datatype(field_decls, superclass_name=None, **kwargs):

if not superclass_name:
superclass_name = '_anonymous_namedtuple_subclass'
superclass_name = str(superclass_name)

namedtuple_cls = namedtuple(superclass_name, field_names, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging this line, because it will always be passed unicode no matter the Python version. This seems to be okay per the tests, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this -- when I wrote that I didn't understand the implications of unicode_literals and was having a lot of issues. There are multiple places I've used datatype with a field that should have been string-typed, and failed to add a type check because I wasn't sure how to handle it -- should have bubbled up the concern then. Will try to keep an eye on this in the future, but any further strangeness you may find with datatypes especially around string handling could be accidental, so please feel free to make these sorts of changes in the future. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s tricky code! I was thrown off by ABCMeta requiring bytes in python 2.

I’ll continue to break out any changes we need to make due to meta in separate PRs.

"Type names and field names can only contain alphanumeric characters "
"and underscores: \"<type 'str'>\"")
self.assertEqual(str(cm.exception), expected_msg)
def compare_string_rep(string_type):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including from builtins import str at the top of this file changes the behavior of these tests under TypedDatatypeTest. Instead of returning str, datatype returns newstr, which is future's backport of Python3's str class.

As I understand, this change is introduced only because changes made to this test file, not to the underlying objects.datatype() function. When you remove from builtins import str from this file, the old version of the tests pass. That's a good thing - we aren't breaking the interface for a bunch of things using objects.datatype().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your understanding here is correct, in case you were wondering.

@Eric-Arellano
Copy link
Contributor Author

cc @iantabolt @mateor

"""error: in constructor of type AnotherTypedDatatype: type check error:
field 'elements' was invalid: value 'should be list' (with type 'str') must satisfy this type constraint: Exactly(list).""")
self.assertEqual(str(cm.exception), expected_msg)
def compare_string_rep(string_type):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps rather than making all of these conditional, you could match against regexes instead? Alternatively, maybe have your helper function do something like:

self.assertIn(str(cm).exception, [expected_message_with_str, expected_message_with_newstr])

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally started that way but changed to these conditionals to align code more with the feature checking idiom. It makes it more explicit this is for the sake of compatibility, and allows us to remove python 2 more easily once we stop supporting it.

Alas it’s not the prettiest code..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, yea... that's reasonable I guess.

@stuhood stuhood merged commit 2a14a72 into pantsbuild:master Jul 6, 2018
@Eric-Arellano Eric-Arellano deleted the port-util-meta branch July 6, 2018 23:09
stuhood pushed a commit that referenced this pull request Jul 9, 2018
Ports everything `util/` except for `objects.py` and `meta.py` (see #6072), `process_handler.py`, and `tarutil.py`.

Part of #6062
stuhood pushed a commit that referenced this pull request Jul 15, 2018
Followup from #6072, which was an incorrect representation of the changes made to metaprogramming with Python 3 port. Part of #6062.

## Problem
`object.py`'s `datatype()` function expects exactly one type. This poses an issue when we add `from builtins import str` to files - in Py2, the type will become `newstr`, whereas in Py3 the type will become `str`. `type(newstr) != type(str)`, even though the behavior is similar, so `datatype()` will raise an exception.

We ran into this issue originally with `test_objects.py`. The original solution only fixed the tests, it didn't actually fix the underlying problem, which I realized when `engine/fs.py` starting breaking with PRs that touched it like the [`backend/jvm` port](#6092).

## Solution
The solution has two parts:
1) Wherever we define a class using the metaclass of `datatype`, set the parameter to `future.utils.text_type`, rather than `str`. This means in Py2 the type will be `unicode` and in Py3 the type will be `str`. Both have the same unicode semantics, which is a good thing.

```python
from future.utils import text_type

class Foo(datatype([('val', text_type)])): pass
```

2) When instantiating a class that uses `dataclass` as its metaclass, wrap the parameter calls with `text_type(x)`. This will call `unicode(x)` in Py2 and `str()` in Py3, resulting in the correct type for the class definition from the first step. 

```python
from future.utils import text_type

example = Foo(text_type('🐍'))
```

## What this impacts
Anywhere we create a class with `datatype()` that sets a parameter to `str`, we'll have to make this change to both the class definition and all of its instantiations. 

The biggest example of this is in `engine/fs.py`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants