_TemporaryFileWrapper[bytes] somehow isn't a BinaryIO #7843

Closed
adamnovak opened this issue May 16, 2022 · 4 comments

Comments

@adamnovak

adamnovak commented May 16, 2022

With this test file repro.py:

import tempfile
from typing import BinaryIO

my_var: BinaryIO = tempfile.NamedTemporaryFile()

I get this error from mypy repro.py:

repro.py:4:20: error: Incompatible types in assignment (expression has type "_TemporaryFileWrapper[bytes]", variable has type "BinaryIO")  [assignment]
    my_var: BinaryIO = tempfile.NamedTemporaryFile()
                       ^
Found 1 error in 1 file (checked 1 source file)

This is with mypy version 0.941.

The NamedTemporaryFile() docs say that it "operates exactly as TemporaryFile() does", save for always having a name, and TemporaryFile() is documented to return a file-like object and to default to binary mode.

If BinaryIO is meant to represent a binary-mode file-like object, then there's something wrong with the type annotations if it's not able to match the return value of NamedTemporaryFile().

@srittau
Collaborator

srittau commented May 17, 2022

IO and its subclasses BinaryIO and TextIO predate the introduction of protocols and are considered legacy by the typeshed team. Not every I/O class is going to subclass them, and they shouldn't be used in argument types. In this particular case, it should be possible to annotate the variable as IO[bytes], though.
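
A minimal sketch of that suggestion, assuming a type checker such as mypy is run over the file. The helper name write_header is hypothetical and does not come from the thread. Annotations have no effect at runtime, so running this only confirms that the temporary file behaves as a binary stream; the type-level claim is the one reported in this thread.

```python
import tempfile
from typing import IO


def write_header(stream: IO[bytes]) -> int:
    """Hypothetical helper: write a fixed header and return the byte count."""
    return stream.write(b"HDR\n")


with tempfile.NamedTemporaryFile() as tmp:
    # Annotated as IO[bytes] rather than BinaryIO, as suggested above.
    my_var: IO[bytes] = tmp
    written = write_header(my_var)
    my_var.seek(0)
    content = my_var.read()
```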

srittau closed this as completed May 17, 2022
@adamnovak
Author

adamnovak commented May 17, 2022

IO[bytes] does indeed work, which is quite surprising given that this is how it and BinaryIO are documented in the manual for typing:

class typing.IO
class typing.TextIO
class typing.BinaryIO

Generic type IO[AnyStr] and its subclasses TextIO(IO[str]) and BinaryIO(IO[bytes]) represent the types of I/O streams such as returned by open().

Deprecated since version 3.8, will be removed in version 3.12: The typing.io namespace is deprecated and will be removed. These types should be directly imported from typing instead.

The typing documentation suggests that these ought to be more or less interchangeable and that if you want a stream these are the types to use. Is it maintained by different people who disagree with the typeshed/mypy developers? Or should it perhaps be updated to say that these are base classes for implementations and inappropriate for use in type signatures?

@srittau If neither IO[bytes] nor BinaryIO is actually the Right Way to annotate an argument or variable as holding a file-like object, what is the currently recommended way to say "file-like object"?

Some people think the answer is to use a custom Protocol that lists all the methods my particular idea of "file-like object" needs to have (e.g. it might or might not need seek(), it might be read-only or write-only, etc.).

But, I set out to add type annotations to my application, not design an application- or library-specific notion of what a stream should be. So presumably I ought to use some standardized pre-defined Protocols unless I have a good reason not to, or else use the Protocols from one or more of the libraries I am using that deal in streams.

I saw some references to pre-defined protocols in _typeshed for streams, but:

  • These I think are only available when TYPE_CHECKING, which makes them a little awkward to use as argument and return types since I need to quote them everywhere.
  • The _typeshed module is meant to be internal to Typeshed and shouldn't be used by user code, right?
  • There aren't any full stream types. There's a SupportsRead and a SupportsNoArgReadline. If I wanted to call both read() and readline() I'd need to combine these into something like:
class SupportsReadAndNoArgReadline(SupportsRead[bytes], SupportsNoArgReadline[bytes], Protocol):
    pass
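
One way to spell that combination so the module still imports at runtime, sketched under the assumption that _typeshed is a stub-only module that exists only for type checkers. Both the import and the combined class therefore sit under TYPE_CHECKING, and the annotation is quoted. The function name first_line_and_rest is hypothetical.

```python
import io
from typing import TYPE_CHECKING, Protocol, Tuple

if TYPE_CHECKING:
    # _typeshed is stub-only, so this import only happens during type checking.
    from _typeshed import SupportsNoArgReadline, SupportsRead

    class SupportsReadAndNoArgReadline(
        SupportsRead[bytes], SupportsNoArgReadline[bytes], Protocol
    ):
        pass


def first_line_and_rest(
    stream: "SupportsReadAndNoArgReadline",
) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return the first line and everything after it."""
    first = stream.readline()
    rest = stream.read()
    return first, rest


first, rest = first_line_and_rest(io.BytesIO(b"a\nbc"))
```

The quotes around the annotation can be dropped by putting `from __future__ import annotations` at the top of the module.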

And then I'm back to inventing an application-specific notion of what a good stream interface should be, although at least I don't have to

If I instead want to get the Protocols used by a library I use, it looks like to do that I'd have to do something like:

from yaml.emitter import Emitter
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from yaml.emitter import _WriteStream

def make_emitter(stream: "_WriteStream[bytes]") -> Emitter:
    return Emitter(stream)

That Protocol is used here:

def __init__(
    self, stream: _WriteStream[Any], canonical=..., indent=..., width=..., allow_unicode=..., line_break=...
) -> None: ...

Again I find myself working with underscore-prefixed stuff that it seems like I shouldn't be touching.

Are either of these what I'm supposed to be doing?

@Akuli
Collaborator

Akuli commented May 17, 2022

I just use IO[bytes] in application code. It's good enough for applications while also being simple to use: no boilerplate of defining custom protocols, no underscore-prefixed things, no confusion for people not familiar with typing details, just a file-like object.

Protocols are mostly useful in library code where you want to be specific about what works and what doesn't work. For example, if your library needs .seek(), you don't want users to pass in an http.client.HTTPResponse.
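
A minimal sketch of the library-side approach, assuming the library only needs read() and seek(). The names ReadableSeekable and read_last_bytes are hypothetical; runtime_checkable is added only so the match can be demonstrated with isinstance(), which checks that the methods exist but not their signatures.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReadableSeekable(Protocol):
    """Hypothetical protocol: a binary stream that can be read and repositioned."""

    def read(self, size: int = -1, /) -> bytes: ...

    def seek(self, offset: int, whence: int = 0, /) -> int: ...


def read_last_bytes(stream: ReadableSeekable, count: int) -> bytes:
    """Return the final `count` bytes of a seekable stream."""
    stream.seek(-count, io.SEEK_END)
    return stream.read(count)


buffer = io.BytesIO(b"hello world")
tail = read_last_bytes(buffer, 5)
```

A type checker would reject an argument that lacks seek(), which is the kind of precision IO[bytes] cannot express.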

@JelleZijlstra
Member

> IO[bytes] does indeed work, which is quite surprising given that this is how it and BinaryIO are documented in the manual for typing:
>
> > class typing.IO
> > class typing.TextIO
> > class typing.BinaryIO
> >
> > Generic type IO[AnyStr] and its subclasses TextIO(IO[str]) and BinaryIO(IO[bytes]) represent the types of I/O streams such as returned by open().
> >
> > Deprecated since version 3.8, will be removed in version 3.12: The typing.io namespace is deprecated and will be removed. These types should be directly imported from typing instead.
>
> The typing documentation suggests that these ought to be more or less interchangeable and that if you want a stream these are the types to use. Is it maintained by different people who disagree with the typeshed/mypy developers? Or should it perhaps be updated to say that these are base classes for implementations and inappropriate for use in type signatures?

We're working on that: python/cpython#92878.

python/typing#829 should also be a useful read.

> @srittau If neither IO[bytes] nor BinaryIO is actually the Right Way to annotate an argument or variable as holding a file-like object, what is the currently recommended way to say "file-like object"?

typing.IO and friends are the recommended way to say "file-like object". The trouble is that "file-like object" isn't a well-defined term. IO defines 20 or so methods, but most applications won't need all of these, and some objects that you might think of as file-like won't have all of them.
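
To see which members that covers on a given Python version, here is a small sketch that inspects typing.IO at runtime. It assumes the members are marked with abstractmethod in the standard library's typing module, which is how CPython defines them; the variable name io_members is illustrative.

```python
from typing import IO

# Collect every attribute defined directly on typing.IO that is marked abstract.
io_members = sorted(
    name
    for name, member in vars(IO).items()
    if getattr(member, "__isabstractmethod__", False)
)

for name in io_members:
    print(name)
```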

> Some people think the answer is to use a custom Protocol that lists all the methods my particular idea of "file-like object" needs to have (e.g. it might or might not need seek(), it might be read-only or write-only, etc.).
>
> But, I set out to add type annotations to my application, not design an application- or library-specific notion of what a stream should be. So presumably I ought to use some standardized pre-defined Protocols unless I have a good reason not to, or else use the Protocols from one or more of the libraries I am using that deal in streams.
>
> I saw some references to pre-defined protocols in _typeshed for streams, but:
>
>   • These I think are only available when TYPE_CHECKING, which makes them a little awkward to use as argument and return types since I need to quote them everywhere.
>   • The _typeshed module is meant to be internal to Typeshed and shouldn't be used by user code, right?
>   • There aren't any full stream types. There's a SupportsRead and a SupportsNoArgReadline. If I wanted to call both read() and readline() I'd need to combine these into something like:
> class SupportsReadAndNoArgReadline(SupportsRead[bytes], SupportsNoArgReadline[bytes], Protocol):
>     pass

We'll probably expose more of these protocols in typing-extensions in the future so that they're easier to use outside of typeshed.

> And then I'm back to inventing an application-specific notion of what a good stream interface should be, although at least I don't have to
>
> If I instead want to get the Protocols used by a library I use, it looks like to do that I'd have to do something like:
>
> from yaml.emitter import Emitter
> from typing import TYPE_CHECKING
> if TYPE_CHECKING:
>     from yaml.emitter import _WriteStream
>
> def make_emitter(stream: "_WriteStream[bytes]") -> Emitter:
>     return Emitter(stream)
>
> That Protocol is used here:
>
> def __init__(
>     self, stream: _WriteStream[Any], canonical=..., indent=..., width=..., allow_unicode=..., line_break=...
> ) -> None: ...
>
> Again I find myself working with underscore-prefixed stuff that it seems like I shouldn't be touching.
>
> Are either of these what I'm supposed to be doing?
