Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PEP 646 in typing_extensions #28

Closed
JelleZijlstra opened this issue Nov 18, 2021 · 35 comments · Fixed by python/typing#963
Closed

Implement PEP 646 in typing_extensions #28

JelleZijlstra opened this issue Nov 18, 2021 · 35 comments · Fixed by python/typing#963

Comments

@JelleZijlstra
Copy link
Member

PEP 646 has been accepted (subject to some confusion about an edge case that's still being discussed), and we should have an implementation of it in typing-extensions.

@mrahtz @pradeep90 is there already a runtime implementation of PEP 646 somewhere? If not, I'd be happy to work on getting the right bits into typing-extensions and typing.

@pradeep90
Copy link

pradeep90 commented Nov 19, 2021 via email

@mrahtz
Copy link

mrahtz commented Nov 19, 2021

(+1 to what Pradeep said.)

@JelleZijlstra
Copy link
Member Author

Thanks! I agree with many of Pradeep's comments on that PR; it seems to add too many runtime checks. I'm happy to pick it back up and submit a new version to CPython and typing-extensions.

@pradeep90
Copy link

pradeep90 commented Nov 21, 2021 via email

@mrahtz
Copy link

mrahtz commented Nov 23, 2021

That's very kind of you Jelle - thank you!

How much of the CPython work were you intending to do - just the work in typing.py, or also e.g. the non-grammar changes necessary to make list[*Ts] (rather than List[*Ts]) work?

@JelleZijlstra
Copy link
Member Author

Full disclosure is that my main motivation is to be able to (ab)use typing_extensions.Unpack for precise *args/**kwargs typing, so I want it to exist :). Also, it seems like acceptance of the PEP has been put on hold, so we should probably wait with CPython changes. But I'm happy to help get other parts of the implementation finished.

So my plan would be:

  • Get TypeVarTuple and Unpack into typing_extensions
  • Get them into CPython too, together with docs
  • Incorporate any changes from the CPython review process into typing_extensions
  • Implement anything else that needs doing. I'm not sure exactly what that may include, but we should at least test cases like list[*Ts] that you mention.

You already have an implementation for the grammar changes, right?

@stroxler
Copy link
Contributor

@JelleZijlstra looks like you're aiming to get a typing_extensions implementation before a typing one. Is that much easier that just doing typing right off the bat? Otherwise I wonder if we should just aim for typing right off the bat

Context on why I care: I'm ready to start trying to get a proof-of-concept for callable syntax, but I realized a blocker is that I'm going to need a way to represent TypeVarTuple and Unpack in cpython itself. So I need the typing code in a branch for my work anyway.

It seems like there may be no harm in putting up a PR on typing and only changing typing_extensions if it bogs down in code review

@JelleZijlstra
Copy link
Member Author

typing-extensions is harder, not easier, because it needs to deal with typing.py in older Python versions. I want to put it in typing-extensions first for two reasons: the PEP isn't accepted yet, so it doesn't make sense to put it in typing.py; and I'd like to be able to use the new features before Python 3.11 comes out.

@stroxler
Copy link
Contributor

I see - I was just chatting with @pradeep90 and he said the PEP was just accepted (looks like not on the PEP page yet), which is why I was wondering about starting on typing right away, since it partially blocks my work.

If it helps I could work on a PR based on @mrahtz's old code. I care more about typing than the syntax part of the implementation for my proof-of-concept, although it might be useful to work off of a branch with both since I need a similar unpack syntax

@mrahtz
Copy link

mrahtz commented Dec 2, 2021

Sorry for the slow reply - this thread dropped off my radar.

Re status of the PEP - it was accepted, but then we realised there's one edge case yet to fix, and the Steering Council have said they want to review those changes before giving it the final stamp of approval. So it seems pretty likely that the final version will be accepted, but we're not quite there yet.

You already have an implementation for the grammar changes, right?

Yup, in https://github.com/mrahtz/cpython/tree/pep646-grammar.

@stroxler, if you're wanting something in typing itself soon, I would propose the following breakdown of the work:

Not sure whether that makes sense though given the work that @JelleZijlstra has already done - @JelleZijlstra, what do you think?

@stroxler
Copy link
Contributor

stroxler commented Dec 2, 2021

@mrahtz I realized I'm probably not hard-blocked on typing.py per se

Instead, what I'm going to need is to mimic the grammar changes inside a case of my callable type. But I also would like to check in about whether I could try an alternative implementation of the same grammar.

After reading all of the docs on the parser and compiler, plus the implementation of PEP 604 using types.Union I am pretty convinced that we should add a types.UnpackType and make star-as-unpack in *args: *Ts resolve to this type.

This will allow us to get a Starred expression in the Ast directly, rather than a Tuple singleton containing a Starred expression, when we have a *args: *Ts annotation.

A code snippet to illustrate what I mean:

>>> import ast
>>> print(ast.dump(ast.parse("def f(*args: *Ts): ..."), indent=2))
Module(
  body=[
    FunctionDef(
      name='f',
      args=arguments(
        posonlyargs=[], args=[],
        vararg=arg(
          arg='args',
          annotation=Tuple(   # this tuple is extraneous
            elts=[
              Starred(      # the AST would make more sense if we get a Starred directly here
                value=Name(id='Ts', ctx=Load()),
                ctx=Load())],
            ctx=Load())),
        kwonlyargs=[],  kw_defaults=[],  defaults=[]),
      body=[Expr(value=Constant(value=Ellipsis))], decorator_list=[])],
  type_ignores=[])

I'm most concerned about the AST having an extra layer since that's what most tools will look at, but the enclosing Tuple also affects runtime behavior - we wind up getting not an _UnpackedTypeVarTuple but rather a singleton tuple containing one at runtime. This feels wrong.

Both these behaviors can be changed if we make an UnpackType builtin, analogous to UnionType. I'd kind of like to take a stab at it, because I'm going to need to do this quite a lot for a callable syntax reference implementation anyway.

@JelleZijlstra
Copy link
Member Author

@stroxler this sounds like you're conflating two different things: what the AST looks like, and what it gets compiled to. The PEP says *args would be equivalent to tuple(args), but your UnpackType builtin might imply adding a new dunder method.

@gvanrossum
Copy link
Member

Hopefully it doesn’t have to be IDENTICAL? At runtime I want to be able to tell which one they typed.

@JelleZijlstra
Copy link
Member Author

@gvanrossum that's what the PEP specifies: https://www.python.org/dev/peps/pep-0646/#change-2-args-as-a-typevartuple. *args in an annotation is equivalent to tuple(args).

I guess we could still change that (at the risk of further annoying the SC). As the author of a tool that would consume runtime annotations, I'm not particularly concerned about being able to tell the difference, though. Why do you think that's important?

@gvanrossum
Copy link
Member

Because they mean different things to static checkers.

@stroxler
Copy link
Contributor

stroxler commented Dec 2, 2021

Ah, yes @JelleZijlstra you're right it's explicitly called out. The pep is pretty long so I missed that sentence.

I think this is weird and might be a mistake - having a nonexistent tuple in the AST is pretty confusing, and I wonder if it might cause major issues for concrete syntax trees like LibCST to have an "invisible" tuple node with no printable characters (I'd have to ask @zsol).

But from a static type checker point of view we can live with it, the match statements will just be slightly ugly because of the nesting.

It doesn't actually block any of my work, it just feels awkard - it seems like we didn't want to implement a compiler change therefore we're doing strange things instead (which is very understandable - adding a new type and opcode is much harder - but I think it's the right thing to do here).

@stroxler
Copy link
Contributor

stroxler commented Dec 2, 2021

@gvanrossum's point kind of highlights why I find this a bit weird - we can't tell the difference in the Python AST between an explicit tuple with the star-expression (which I'm pretty sure we would naively expect type checkers to reject) and a bare star-for-unpack.

It doesn't really matter as of this PEP because a TypeVarTuple or Tuple are the only legal things in that position and we can live with the ambiguity in this one form, but I'm afraid that there might be edge cases that box us in if we want to use starred expressions in other ways in the future (and I'm not talking about just typing changes, or just changes in the next 5 years - we'll have to live with invisible tuples forever).

@JelleZijlstra
Copy link
Member Author

Reopening to track implementation of TypeVarTuple in typing-extensions. (The discussion of the AST is technically off topic here.)

@JelleZijlstra JelleZijlstra reopened this Dec 2, 2021
@gvanrossum
Copy link
Member

Sorry, I must have closed this by mistake.

@mrahtz
Copy link

mrahtz commented Dec 3, 2021

Alright, to check whether I understand so far, we're talking about two potential problems with the current behaviour of *args: *Ts:

  1. There's no way to tell the difference at runtime between *args: *Ts, *args: tuple(Ts), and *args: (*Ts,). (I've verified this is definitely true for the current implementation in https://github.com/mrahtz/cpython/tree/pep646-grammar.) This is less than ideal because it means that runtime checkers can't match the behaviour of static checkers: static checkers can flag *args: tuple(*Ts and *args: (*Ts,) as incorrect, but runtime checkers can't.

  2. *args: *Ts introduces an extra Tuple around the Starred in the AST, which a) is a little offputting aesthetically, but more specifically b) might make life harder for utilities which rely on the AST, e.g. auto-formatters, which might try and autoformat *args: *Ts as *args: (*Ts,).


Assuming my understand is correct, here's my take:

It was indeed a deliberate decision on my part to have things work this way (including the extra Tuple in the AST). The driving force behind that decision was consistency with how the star operator works in other contexts. I remember I did experiment with a solution which didn't introduce the extra Tuple in the AST, but iirc it required more significant changes because it required different opcode generation.

Having said that, I'm sympathetic to argument 1; I'll admit that I didn't foresee the problem of runtime checkers not being able to tell the difference. But on that front, *args: tuple(Ts) and *args: (*Ts,) are both just clearly whacky things to do. Should we really be performing grammar and opcode gymnastics for the sake of being able to detect this very unlikely form of shooting yourself in the foot? This seems especially unlikely to happen in practice because *args: *Ts is a power-user feature in the first place. I'm leaning towards thinking the tradeoff isn't worth it: complicating the grammar (and delaying the PEP even further) isn't worth being able to detect these cases at runtime. (But @gvanrossum, I'd like to hear your thoughts here too - obviously I'm biased on this as the author of the PEP, rather than the person who wants the language to stay clean 😅.)

For argument 2, I have more mixed feelings. For the case of auto-formatters in particular, I'm leaning towards saying "It doesn't seem that bad for auto-formatters to just have to special-case this". For concerns along the lines of what @stroxler says:

...I'm afraid that there might be edge cases that box us in if we want to use starred expressions in other ways in the future (and I'm not talking about just typing changes, or just changes in the next 5 years - we'll have to live with invisible tuples forever)

I'm sympathetic to these concerns too, but I don't want us to be paralysed by fear of the unknown; there's always a risk that we'll accidentally commit to something that boxes us into something unfortunate, and I don't think there's any reason to think the risk here is higher than that baseline level of risk. I think the best we can do is try to mitigate the risk by asking "What specifically might it cause problems with?" - but if we can't come up with any answers to that, we should accept the risk and continue onwards for the sake of being able to make progress. (And again, there are costs in the other direction too in terms of delaying a PEP that's already taken > 1 year to build consensus over.)

In the vein of asking "What specifically might it cause problems with if we want to use starred expressions in other ways in the future?" - to the extent that this is a problem with the way starred expressions are handled in *args: *Ts in particular, I can't imagine anything else we could want to do with starred expressions in the context of *args. To the extent this is a discussion about starred expressions more broadly, I would claim there's nothing in this PEP which yet justifies more radical changes to the behaviour of the star operator. And if something does come up in the future that demands more flexibility from the star operator in other contexts, the cost will be the same then as it is now: introducing inconsistency with how the star operator works in existing contexts such as (1, 2, *foo).

Tl;dr: I agree it's non-ideal, but I think it's worth the tradeoff of not having to make more significant grammar changes and not delaying the PEP.

@JelleZijlstra
Copy link
Member Author

Thanks for the summary!

I don't see (1) as much of a problem. There are already many ways to write annotations that look the same to a runtime checker but would be rejected by a static checker: -> list.__class_getitem__(int), -> type(1), etc. To me that seems harmless.

For (2), do I understand correctly that the AST for *args: *T and *args: (*T,) would be the same? That seems unfortunate because it would mean that tools like mypy, which use the Python AST, can't tell the difference either.

@mrahtz
Copy link

mrahtz commented Dec 3, 2021

For (2), do I understand correctly that the AST for *args: *T and *args: (*T,) would be the same? That seems unfortunate because it would mean that tools like mypy, which use the Python AST, can't tell the difference either.

Oh, yikes, I didn't realise that was the case. Indeed, checking using the implementation at https://github.com/mrahtz/cpython/tree/pep646-grammar, the following:

print(ast.dump(ast.parse('def foo(*args: *Ts): pass')))
print(ast.dump(ast.parse('def foo(*args: (*Ts,)): pass')))

do unfortunately both produce:

Module(
  body=[
    FunctionDef(
      name='foo',
      args=arguments(
        posonlyargs=[],
        args=[],
        vararg=arg(
          arg='args',
          annotation=Tuple(
            elts=[
              Starred(
                value=Name(id='Ts', ctx=Load()),
                ctx=Load())],
            ctx=Load())),
        kwonlyargs=[],
        kw_defaults=[],
        defaults=[]),
      body=[
        Pass()],
      decorator_list=[])],
  type_ignores=[])

This admittedly does carry more weight for me; it's a shame we didn't catch this earlier.

I lean towards still saying "This seems a sufficiently power-user feature that it's unlikely anyone would shoot themselves in the foot over this, so having mypy detect *args: *Ts by checking for Tuple[Starred[...]] in the AST doesn't seem too bad" - but keen to hear @gvanrossum's thoughts on this too.

@gvanrossum
Copy link
Member

I think it makes sense to introduce a new AST node type to represent the *Ts node. This is what we do in general.

@stroxler
Copy link
Contributor

stroxler commented Dec 3, 2021

Something else to consider here: I just realized that in the current implementation, the CPython unparser phase is guaranteed to create an explicit (*Ts,). That means that when using from __future__ import annotations with this feature, the string annotation we wind up with would have the explicit tuple.

It's not clear whether that actually matters (most libraries interpreting types at runtime would presumably wind up evaluating the string anyway) but it could definitely surprise people.

@zsol
Copy link

zsol commented Dec 3, 2021

I wonder if it might cause major issues for concrete syntax trees like LibCST to have an "invisible" tuple node with no printable characters (I'd have to ask @zsol)

I don't think this is very relevant to the typing discussion, but since I've been summoned: If the currently proposed implementation gets merged, LibCST's CST will diverge from the AST in this case and will probably introduce a new node type, like @gvanrossum suggests above. There's already precedent for divergence like this, but they tend to cause friction for users of the library.

@JelleZijlstra
Copy link
Member Author

Would we even need a new AST node? ast.Starred already exists and is an expr. We might just need a compiler change to handle ast.Starred and turn it into a tuple in the bytecode.

@stroxler
Copy link
Contributor

stroxler commented Dec 3, 2021

I agree with @JelleZijlstra, I'm pretty certain we don't need a new AST node unless we really want one. And I don't think we should add one if we can help it, using a bare Starred is more with the other unpack contexts (splatting into Callable params lists / splatting into slices) where we really do want the existing Starred type and its runtime semantics.

We could probably compile it into either the equivalent of (*foo,)[0] or the equivalent of next(iter(foo)), both of those options would avoid the need for a new builtin type. I don't think we'd want to leave the value wrapped in a tuple at runtime unless we have to - it would be much nicer if it evaluates to the actual unpacked type.

We might run into objections in code review if we do it that way, because we would have to make Starred compile universally into this, and then rely on the fact that preexisting unpack never directly compiles a Starred expression (which is already the case - the compiler crashes if it sees a Starred top-level expression, which is ruled out by the grammar but not by the Ast)

@gvanrossum
Copy link
Member

gvanrossum commented Dec 3, 2021 via email

@mrahtz
Copy link

mrahtz commented Dec 4, 2021

Alright, I'll start working on the grammar/AST/compiler changes to make this happen.

@mrahtz
Copy link

mrahtz commented Dec 4, 2021

Ok, so this turned out to be easier than I expected; I've got a first attempt done in mrahtz/cpython@d76039a.

Grammar: instead of

star_annotation[expr_ty]: ':' a=star_expression { _PyAST_Tuple(_PyPegen_singleton_seq(p, a), Load, EXTRA) }`

we just do

star_annotation[expr_ty]: ':' a=star_expression { a }

This is the only change needed compared to what's currently in the PEP.

Compiler: instead of changing the implementation for Starred_kind in the main visit table, we add a special case to the visit function for argument annotations emitting (what I hope should be) the bytecode for [*Ts][0].

The results are:

AST: print(ast.dump(ast.parse('def foo(*args: *Ts): pass'), indent=2)) gives:

Module(
  body=[
    FunctionDef(
      name='foo',
      args=arguments(
        posonlyargs=[],
        args=[],
        vararg=arg(
          arg='args',
          annotation=Starred(
            value=Name(id='Ts', ctx=Load()),
            ctx=Load())),
        kwonlyargs=[],
        kw_defaults=[],
        defaults=[]),
      body=[
        Pass()],
      decorator_list=[])],
  type_ignores=[])

As for __annotations__, the following code:

from typing import TypeVarTuple

Ts = TypeVarTuple('Ts')
def foo(*args: *Ts):
    pass

print(foo.__annotations__)

gives:

{'args': *Ts}

Does that fix everything we needed it to?

@JelleZijlstra
Copy link
Member Author

@mrahtz I'm fine with that, but it's different from the behavior specified in the PEP (https://www.python.org/dev/peps/pep-0646/#change-2-args-as-a-typevartuple), which says that *args: *Ts is equivalent to args: tuple(Ts), not *args: tuple(Ts)[0].

@stroxler
Copy link
Contributor

stroxler commented Dec 5, 2021

I think that fixes all the concerns. Agreed it requires updating the PEP language slightly.

The precise bytecode we generate is an implementation detail, it might be worth asking advice on python-dev. I would guess Ts.__iter__().__next__() is likely both fastest and the most direct indication of how the python-side code for an unpack annotation works: we're piggybacking on iterable semantics to make unpack work, tuples and lists are only incidentally related.

@mrahtz
Copy link

mrahtz commented Dec 8, 2021

@JelleZijlstra Indeed, this would require another change to the PEP.

@stroxler Alright, I've also been able to code up a next(iter(Ts)) version in mrahtz/cpython@a0fcdf8.

@pradeep90 @gvanrossum Any opinions on this - [*Ts][0] vs iter(next(Ts))? Guido, who'd be a good person to ask for a preliminary review on the compiler changes?

By the way @stroxler @JelleZijlstra - Pradeep and I were thinking that implementation development could actually get a bit messy with the work being spread throughout three separate places. I've started https://github.com/mrahtz/cpython/tree/pep646 as the canonical source for the implementation - grammar changes, compiler changes, typing.py changes, and other C changes to support list[*Ts] etc. Is there anything from the work you two have done so far that I can cherry-pick into there?

@JelleZijlstra
Copy link
Member Author

I haven't done any C work so far, only the draft typing-extensions implementation in python/typing#963 (which doesn't work yet in 3.6).

@stroxler
Copy link
Contributor

stroxler commented Dec 9, 2021

I haven't done any work either. I'm happy to help out if there's anything you'd like to delegate but I didn't want us to have clashing changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants