Document when to use Union[str, unicode] vs AnyStr #871

Closed
JukkaL opened this issue Jan 27, 2017 · 4 comments
@JukkaL
Contributor

JukkaL commented Jan 27, 2017

This comes up pretty often, and it would be useful to have some documentation in the CONTRIBUTING.md file (or at least a link to this documentation).

Here's a starting point for what to document (there probably are other things that we should mention):

  • If a function accepts both str and ASCII-only unicode arguments, usually the best type to use is Union[str, unicode] (or Union[str, Text] in a 2and3 stub).
  • Use AnyStr if you have two or more types in a signature that must agree on whether they are str or unicode. (It would also be nice to give an example where this is important.)
  • You can also use AnyStr in invariant positions in generic type arguments. For example, List[AnyStr] is generally better than List[Union[str, unicode]] (also explain why). However, often it's even better to use a covariant type such as Iterable or Sequence. In that case the union variant is preferable if the container may contain a mix of str and unicode. For example, Iterable[Union[str, unicode]] is fine if the iterable may contain a mix of str and unicode values.
  • Try to avoid using Union[str, unicode] in a return type, since it means that every call site will have to deal with both str and unicode values. It may be fine to use this if the return type is sufficiently unpredictable.
  • Similarly, try to avoid using Union[str, unicode] as an attribute type -- again code using this attribute would have to deal with both str and unicode values.
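
A minimal sketch of the first two points, written in Python 3 terms as an assumption: there AnyStr ranges over str and bytes, which stand in for the Python 2 unicode and str discussed above. The function names `concat` and `to_text` are made up for illustration.

```python
from typing import AnyStr, Union


def concat(first: AnyStr, second: AnyStr) -> AnyStr:
    # AnyStr ties both arguments and the result to the same string type,
    # so a type checker rejects a call that mixes str and bytes.
    return first + second


def to_text(value: Union[str, bytes]) -> str:
    # A single argument that may be either type needs no agreement with
    # anything else in the signature, so a plain Union is enough.
    if isinstance(value, bytes):
        return value.decode("ascii")
    return value


joined_text = concat("foo", "bar")
joined_bytes = concat(b"foo", b"bar")
```

With Union[str, bytes] on both parameters of `concat`, a checker would have to allow a mixed call, and the `+` in the body would no longer type-check.
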
@lincolnq
Contributor

lincolnq commented Feb 10, 2017

(I'm new to this project so please correct me if I've asked this question in the wrong place.)

The xml.etree module documents the return type of tostring() in Python 3 as str if the value of the encoding parameter is "unicode", and bytes otherwise. (https://docs.python.org/3/library/xml.etree.elementtree.html)

However, the tostring stub declares it as simply returning str.

    def tostring(element: Element, encoding: str = ..., method: str = ..., *,
                 short_empty_elements: bool = ...) -> str: ...

This is a blocker to usage in Python 3, since the return type is simply wrong for many use cases. Should this stub file be changed to return AnyStr, or a Union, or something else entirely? (I'm happy to submit a PR to do this, but I'm not sure which you would prefer.)

@JukkaL
Contributor Author

JukkaL commented Feb 10, 2017

@lincolnq In cases like these typeshed typically uses an Any return type. The return type depends on the value of an argument, and PEP 484 isn't expressive enough to represent this.
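
A small runtime check of the behavior under discussion, as a sketch only: `serialize` is a hypothetical wrapper (not part of typeshed or the standard library) annotated with Any in the way described above, because the result type depends on the value of `encoding`.

```python
from typing import Any
import xml.etree.ElementTree as ET


def serialize(element: ET.Element, encoding: str = "us-ascii") -> Any:
    # Hypothetical wrapper: the return type depends on the *value* of
    # `encoding`, so it is annotated as Any rather than str or bytes.
    return ET.tostring(element, encoding=encoding)


root = ET.Element("root")
as_text = serialize(root, encoding="unicode")  # str at runtime
as_bytes = serialize(root)                     # bytes at runtime
```

The cost of Any is that callers get no checking on the result, so they would need an isinstance check or a cast if they want precise types downstream.
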

@JelleZijlstra
Member

I started writing documentation to address this issue; what I have so far is at master...JelleZijlstra:patch-10 (heavily based on Jukka's comments here).

However, I don't understand what Jukka is referring to in his paragraph about type arguments: "For example, List[AnyStr] is generally better than List[Union[str, unicode]]". I would think that you use the former if you want a list that contains either str or unicode but not both, and the latter if you can accept a mixed list. But I don't see how that relates to invariance versus covariance. @JukkaL can you clarify?
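
One possible way to see where variance could enter the picture, sketched in Python 3 terms as an assumption (bytes stands in for the second string type, and all function names are made up for illustration):

```python
from typing import AnyStr, List, Sequence, Union


def first_of_list(items: List[AnyStr]) -> AnyStr:
    # Accepts List[str] or List[bytes]; the result type follows the argument.
    return items[0]


def total_len_list(items: List[Union[str, bytes]]) -> int:
    # List is invariant: a checker such as mypy rejects a List[str]
    # argument here, even though the call works at runtime.
    return sum(len(item) for item in items)


def total_len_seq(items: Sequence[Union[str, bytes]]) -> int:
    # Sequence is covariant: List[str], List[bytes], and mixed lists
    # are all accepted by a type checker.
    return sum(len(item) for item in items)


names: List[str] = ["a", "bc"]
first = first_of_list(names)
total = total_len_seq(names)
```

Under this reading, List[Union[...]] is awkward mainly because callers holding an ordinary List[str] cannot pass it, which is a separate concern from whether the contents may be mixed.
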

@srittau
Collaborator

srittau commented Sep 17, 2020

I think we now have sufficient documentation on when to use str, bytes, and Text.

@srittau srittau closed this as completed Sep 17, 2020