Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_py_abc breaks pattern matching #89646

Open
GalaxySnail mannequin opened this issue Oct 15, 2021 · 10 comments
Open

_py_abc breaks pattern matching #89646

GalaxySnail mannequin opened this issue Oct 15, 2021 · 10 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@GalaxySnail
Copy link
Mannequin

GalaxySnail mannequin commented Oct 15, 2021

BPO 45483
Nosy @rhettinger, @GalaxySnail

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-10-15.12:59:43.754>
labels = ['type-bug', '3.10', '3.11']
title = "pure Python class that has been registered as a `collections.abc.Sequence` can't be recgnized by the match statement without the `_abc` module"
updated_at = <Date 2021-10-15.15:30:45.212>
user = 'https://github.com/GalaxySnail'

bugs.python.org fields:

activity = <Date 2021-10-15.15:30:45.212>
actor = 'rhettinger'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2021-10-15.12:59:43.754>
creator = 'GalaxySnail'
dependencies = []
files = []
hgrepos = []
issue_num = 45483
keywords = []
message_count = 1.0
messages = ['404011']
nosy_count = 2.0
nosy_names = ['rhettinger', 'GalaxySnail']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue45483'
versions = ['Python 3.10', 'Python 3.11']

@GalaxySnail
Copy link
Mannequin Author

GalaxySnail mannequin commented Oct 15, 2021

Pure Python class that has been registered as a collections.abc.Sequence can't be recgnized by the match statement without the _abc module.

For example:

>>> from test.support.import_helper import import_fresh_module

>>> collections_abc_with_c_abc = import_fresh_module(
...     "collections.abc", fresh=["_collections_abc", "abc", "_abc"])

>>> class MyList:
...     def __init__(self, iterable):
...         self.list = list(iterable)
...     def __len__(self):
...         return len(self.list)
...     def __getitem__(self, item):
...         return self.list[item]
...
>>> collections_abc_with_c_abc.Sequence.register(MyList)
<class '__main__.MyList'>
>>> match MyList(range(3, 10)):
...     case [x, *_]:
...         print(x)
...     case _:
...         print("not a sequence")
...
3

>>> collections_abc_with_py_abc = import_fresh_module(
...     "collections.abc", fresh=["_collections_abc", "abc"], blocked=["_abc"])
>>> class MyList:
...     def __init__(self, iterable):
...         self.list = list(iterable)
...     def __len__(self):
...         return len(self.list)
...     def __getitem__(self, item):
...         return self.list[item]
...
>>> collections_abc_with_py_abc.Sequence.register(MyList)
<class '__main__.MyList'>
>>> match MyList(range(3, 10)):
...     case [x, *_]:
...         print(x)
...     case _:
...         print("not a sequence")
...
not a sequence

It seems to be caused by 069e81a , only tp_flags are checked in the MATCH_SEQUENCE opcode. Mapping has the same problem.

@GalaxySnail GalaxySnail mannequin added 3.10 only security fixes 3.11 only security fixes type-bug An unexpected behavior, bug, or error labels Oct 15, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@aganders3
Copy link

aganders3 commented Jul 5, 2022

This is also an issue for collections.abc.Mapping for the same reasons. I ran into this as part of some work on the PyO3 project where we're trying to figure out how best to check compatibility before safely casting to a PyMapping in PyO3/pyo3#2477.

I'm happy to help with a PR if there's a good idea on how to improve this.

@brandtbucher
Copy link
Member

I'm happy to help with a PR if there's a good idea on how to improve this.

The root of the issue is that the MATCH_MAPPING and MATCH_SEQUENCE opcodes are using tp_flags to detect Mapping/Sequence registration. This is significantly faster than using subclass checks against these ABCs, and doesn't require implicitly importing _collections_abc behind-the-scenes (and all of the nastiness that can result in). There's also additional thorniness (a handful of Sequences aren't treated that way by pattern matching, registering types in C extensions becomes faster/simpler, etc...) that makes this a cleaner solution overall.

While tp_flags is exposed in Pythonland as the __flags__ attribute on types, it's read-only. That's why we need to handle actually setting these flags in C-land, in _abc.c.

So how do we do this without _abc.c? I'm not entirely sure. I think making __flags__ writeable is overkill, and will likely create more problems than it solves. At the same time, I don't want to have the pattern matching implementation fall back on importing _collections_abc at runtime, since that's the whole problem we were trying to avoid in the first place.

I guess the problem boils down to: how can a pure-Python class set these flags correctly without some sort of C helper? Until we can answer that question, pattern matching can't work correctly with _py_abc.py.

Half-baked idea: maybe __flags__ could be writable for only these values? That could possibly replace the current __abc_tpflags__ hack too.

@brandtbucher brandtbucher changed the title pure Python class that has been registered as a collections.abc.Sequence can't be recgnized by the match statement without the _abc module _py_abc breaks pattern matching Jul 7, 2022
@brandtbucher brandtbucher added the 3.12 bugs and security fixes label Jul 7, 2022
@GalaxySnail
Copy link
Contributor

GalaxySnail commented Jul 7, 2022

Writable __flags__ is another way to register Sequences and Mappings, which doesn't follow the zen of Python: There should be one-- and preferably only one --obvious way to do it. It may break consistance.

Can we introduce 4 private functions in sys module?

sys._register_tpflags_mapping(tp: type) -> None
sys._register_tpflags_sequence(tp: type) -> None
sys._check_tpflags_mapping(tp: type) -> bool
sys._check_tpflags_sequence(tp: type) -> bool

We can explitly document them as private and not recommanded to use, or just not document them. They will be CPython implementation detail. These 4 functions will be called in _collections_abc.Sequence and _collections_abc.Mapping:

@classmethod
def register(cls, subclass):
    type(cls).register(cls, subclass)
    sys._register_tpflags_mapping(subclass)  # or sequence

@classmethod
def __subclasshook__(cls, subclass):
    if sys._check_tpflags_mapping(subclass):  # or sequence
        return True
    return NotImplemented

And the current __abc_tpflags__ hack can be removed.

@aganders3
Copy link

Apologies if this is a dumb question but under which circumstances is _abc.c not available? When/where is _py_abc.py actually used?

It seems to me just removing the fallback to the Python version would have the same effect, but I can only assume the Python implementation was left in for a reason (perhaps some platform I'm not familiar with).

@brandtbucher
Copy link
Member

Writable __flags__ is another way to register Sequences and Mappings, which doesn't follow the zen of Python: There should be one-- and preferably only one --obvious way to do it. It may break consistance.

The one and only one obvious way to do it would still be to register/inherit Sequence or Mapping. Writing to __flags__ would be the one and only one non-obvious way to do it, and what Sequence and Mapping do for you under-the-hood.

I'm usually not a big fan of using sys as a dumping ground for escape-hatches like this. If something isn't possible in Python, I don't see why sys would be a better home for it than _abc, right?

@brandtbucher
Copy link
Member

Apologies if this is a dumb question but under which circumstances is _abc.c not available? When/where is _py_abc.py actually used?

It seems to me just removing the fallback to the Python version would have the same effect, but I can only assume the Python implementation was left in for a reason (perhaps some platform I'm not familiar with).

I'm sort of wondering that too. I know that lots of these pure-Python copies are kept around for platforms or implementations that can't build or use the C accelerators for some reason. I'm curious if anybody out there is actually using _py_abc (rather than _abc) in practice...

@JelleZijlstra
Copy link
Member

Apologies if this is a dumb question but under which circumstances is _abc.c not available? When/where is _py_abc.py actually used?
It seems to me just removing the fallback to the Python version would have the same effect, but I can only assume the Python implementation was left in for a reason (perhaps some platform I'm not familiar with).

I'm sort of wondering that too. I know that lots of these pure-Python copies are kept around for platforms or implementations that can't build or use the C accelerators for some reason. I'm curious if anybody out there is actually using _py_abc (rather than _abc) in practice...

Maybe PyPy would use it.

@GalaxySnail
Copy link
Contributor

I'm usually not a big fan of using sys as a dumping ground for escape-hatches like this. If something isn't possible in Python, I don't see why sys would be a better home for it than _abc, right?

You are right. Polluting sys namespace is not a good idea here, it is abusing. I found that __flags__ for classes is undocumented (“private”) either, writable __flags__ is good enough for me if it's checked carefully that only Py_TPFLAGS_MAPPING and Py_TPFLAGS_SEQUENCE are allowed to be modified.

Apologies if this is a dumb question but under which circumstances is _abc.c not available? When/where is _py_abc.py actually used?

It seems to me just removing the fallback to the Python version would have the same effect, but I can only assume the Python implementation was left in for a reason (perhaps some platform I'm not familiar with).

I'm not sure either. IMO the reason may be PEP 399, which specifies that:

Starting in Python 3.3, any modules added to the standard library must have a pure Python implementation. This rule can only be ignored if the Python development team grants a special exemption for the module.

But it also says:

Re-implementing parts (or all) of a module in C (in the case of CPython) is still allowed for performance reasons, but any such accelerated code must pass the same test suite (sans VM- or C-specific tests) to verify semantics and prevent divergence. To accomplish this, the test suite for the module must have comprehensive coverage of the pure Python implementation before the acceleration code may be added.

Obviously _py_abc breaks pattern matching now, so it has already violated PEP 399.

@tiran
Copy link
Member

tiran commented Jul 8, 2022

The _abc module is a statically compiled core module and always available as a built-in module. I can only think of two circumstances in which the _abc module is not available: user builds a custom Python interpreter with _abc patched out or user code manipulates sys.modules to block the import of _abc. That's what import_fresh_module() is doing under the hood.

PyPy also provides a built-in _abc module.

$ pypy3.9 -c "import _abc; print(dir(_abc))"
['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_abc_init', '_abc_instancecheck', '_abc_register', '_abc_subclasscheck', '_get_dump', '_reset_caches', '_reset_registry', 'get_cache_token']
$ python3 -c "import _abc; print(dir(_abc))"
['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_abc_init', '_abc_instancecheck', '_abc_register', '_abc_subclasscheck', '_get_dump', '_reset_caches', '_reset_registry', 'get_cache_token']

I suggest that we do not add a workaround for this edge case and rather document that _abc C extension is mandatory.

@iritkatriel iritkatriel added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.13 bugs and security fixes and removed 3.10 only security fixes labels Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

6 participants