Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix test_objects handling of dataclass() py2-py3 compatibility #6098

Merged
merged 2 commits into from
Jul 15, 2018

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Jul 11, 2018

Followup from #6072, which was an incorrect representation of the changes made to metaprogramming with Python 3 port. Part of #6062.

Problem

object.py's datatype() function expects exactly one type. This poses an issue when we add from builtins import str to files - in Py2, the type will become newstr, whereas in Py3 the type will become str. type(newstr) != type(str), even though the behavior is similar, so datatype() will raise an exception.

We ran into this issue originally with test_objects.py. The original solution only fixed the tests, it didn't actually fix the underlying problem, which I realized when engine/fs.py starting breaking with PRs that touched it like the backend/jvm port.

Solution

The solution has two parts:

  1. Wherever we define a class using the metaclass of datatype, set the parameter to future.utils.text_type, rather than str. This means in Py2 the type will be unicode and in Py3 the type will be str. Both have the same unicode semantics, which is a good thing.
from future.utils import text_type

class Foo(datatype([('val', text_type)])): pass
  1. When instantiating a class that uses dataclass as its metaclass, wrap the parameter calls with text_type(x). This will call unicode(x) in Py2 and str() in Py3, resulting in the correct type for the class definition from the first step.
from future.utils import text_type

example = Foo(text_type('🐍'))

What this impacts

Anywhere we create a class with datatype() that sets a parameter to str, we'll have to make this change to both the class definition and all of its instantiations.

The biggest example of this is in engine/fs.py

@Eric-Arellano
Copy link
Contributor Author

cc @cosmicexplorer @mateor

stuhood pushed a commit that referenced this pull request Jul 14, 2018
This applies the principles learned in #6098 to the `engine/fs.py` classes like `DirectoryToMaterialize`. Refer to that PR for an explanation of why this is necessary.

Will fix the broken test in #6092. Part of #6062
@stuhood stuhood requested review from benjyw and cosmicexplorer July 14, 2018 04:55
Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for figuring this out and fixing this! I distinctly remember using str in this test file, seeing errors because I didn't realize what unicode literals meant at the time, and being pretty frustrated and having a great deal of trouble understanding what was wrong. It really irked me when not solving this led to annoyances in other places like Snapshot, which had to do an explicit conversion to str (which has since been removed). Thanks again for taking another look and fixing the problem.

The only thing I would like to see is a comment in the datatype docstring which says to use text_type for string-typed datatype arguments, because it seems like without looking at this file people who didn't see this PR might just use str or unicode and fall into the same issue.

other thoughts:

Now that I'm thinking about it, would it make sense to potentially check for declarations of str or unicode-typed fields in datatype() invocations and raise an error? I don't think that needs to be in the scope of this PR because this is focused on 2/3 compat, but it seems like it would be a waste to solve this problem and then allow users to accidentally use the wrong type. It would also immediately make it easier to spot uses of str that we may have missed, because those will raise. I think that probably sounds like a job for a followup PR, along with updating the engine README -- but in this PR just noting that text_type should be used in the docstring (so that we maintain 2/3 compat for future datatypes people write) would be sufficient and useful.

I also just realized the datatype docstring needs to be updated anyway to note that we support arbitrary TypeConstraints, not just specific types. That doesn't need to be done here, I think.

I was only able to find one instance of this, but ExecuteProcessRequest should be converted to using this too at some point -- that doesn't need to be done here.

@Eric-Arellano
Copy link
Contributor Author

Happy to make both of those changes when I get back on Monday!

I think the ExecuteProcessRequest should be a separate PR since this only fixes the tests.

I can attach the comment and/or assertion in this PR or a separate one.

Regarding accepting arbitrary TypeConstraints, we should probably add testing around that.

@Eric-Arellano
Copy link
Contributor Author

Actually I don’t it will be possible to add an assertion for the datatype constructor.

text_type is simply an alias, not an actual type. It’s defined as:

if PY2:
  text_type = unicode 
else:
  text_type = str

So datatype() will never know it’s actually being passed text_type, it will only see the corresponding primitive type.

I think the documentation will help at least.

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Jul 14, 2018

I didn't realize that about text_type (and should have checked) -- thanks for elaborating! To clarify, all I was looking for in this PR is just a line in the docstring saying to use text_type for string fields -- definitely with you on keeping these PRs focused, wasn't trying to slow you down too much. The rest was just idle thoughts -- I will try to be more clear in the future.

There is a little testing around accepting TypeConstraints in datatypes in this file -- see WithSubclassTypeConstraint and WithExplicitTypeConstraint. There could potentially be more, although the TypeConstraints themselves are tested as well, so not sure where the testing could be improved specifically -- would love to expand the testing if there's more to be done though.

It bothers me a little that we can't have the assertion that we're using text_type, so I might try to think of a workaround for that as well (thinking of something along the lines of disallowing str or unicode fields at all and requiring the use of e.g. Exactly(text_type), which we could provide in objects.py). Would only do that if it seems worth it though.

@stuhood
Copy link
Member

stuhood commented Jul 15, 2018

@cosmicexplorer : That seems like a bit much to add to this PR. I think that the documentation you're suggesting is not datatype specific at all... It's more along the lines of "when writing Python3 compatible code, use text_type", which isn't something that should go in the datatype docs, per-se.

If Eric and co are able to make enough progress here and execute the swapover, then those kinds of docs won't even be necessary... so I'd love to see that progress made.

Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes a lot of sense. @Eric-Arellano thanks for responding in such depth -- I will work on focusing comments specifically towards the goal of 2/3 compat for these PRs in the future.

@Eric-Arellano
Copy link
Contributor Author

@cosmicexplorer thanks likewise!

Most of these porting changes are generated automatically and simple, but there have been a couple complex ones throughout the process like this datatype() question. In fact, that’s the whole point of this PR is that my understanding was wrong originally in its precursor port (#6072). I appreciate you all helping out in the process!

@stuhood stuhood merged commit e3279b4 into pantsbuild:master Jul 15, 2018
@Eric-Arellano Eric-Arellano deleted the fix-test-objects branch July 15, 2018 16:39
Eric-Arellano added a commit that referenced this pull request Jan 25, 2019
### Problem
`./pants3 --no-v1 --v2 test testprojects/tests/python/pants/dummies:passing_target` would fail to execute because we hardcoded the PEX we grab to always use Python 2, rather than matching the current interpreter.

Even after fixing this, the command would succeed but output the results as a byte string without `\n` rendered properly, due to improper unicode handling.

### Solution
Grab the PEX that corresponds to the current interpreter.

Also change the `datatype` for some V2 abstractions to `text_type`, rather than `str`. This is because `datatype()` requires the type to _exactly_ match, and the future backport `str` actually is of type `newstr` in Py2 rather than `unicode`. See #6098 for further context on how we support Py2 and Py3 with `datatype()`.

### Result
`./pants3 --no-v1 --v2 test testprojects/tests/python/pants/dummies:passing_target` works for both Python 2 and Python 3, including when switching between the two repeatedly.

@illicitonion and I discussed over Slack potential concerns with caching and the interpreter not changing properly, but this does not seem to be an issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants