Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dict and dict.update treat the first argument as a mapping when it has attribute keys without attribute __getitem__ #116938

Closed
Prometheus3375 opened this issue Mar 17, 2024 · 8 comments
Labels
docs Documentation in the Doc dir interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@Prometheus3375
Copy link
Contributor

Prometheus3375 commented Mar 17, 2024

Bug report

How to reproduce

  1. Make such a class:
    class Object:
        def __iter__(self, /):
            return iter(())
    
        def keys(self, /):
            return ['1', '2']
  2. Call dict(Object()).
  3. Call d = {} and then d.update(Object()).

Expected result

At the step 2 an empty dictionary is returned.
At the step 3 d stays empty.

Actual result

Both steps 2 and 3 raise a TypeError 'Object' object is not subscriptable.

CPython versions tested on:

3.10
3.11

Operating systems tested on:

Windows 21H2 (19044.1645)
Ubuntu 20.04.6 LTS


Docs of dict state:

If a positional argument is given and it is a mapping object, a dictionary is created with the same key-value pairs as the mapping object. Otherwise, the positional argument must be an iterable object.

Unfortunately, there is no link to what is considered as a mapping object.
In typeshed both dict and dict.update accept SupportsKeysAndGetItem, i.e., any object with attributes keys and __getitem__.

But the experiment above shows that only keys is enough. While typeshed is a bit too restrictive in the case for iterable (only iterables of 2-sized tuples are allowed, but dict accepts any iterable of 2-sized iterables), I think just checking for keys is not enough.

In the actual C code there is such comment:

/* PyDict_Merge updates/merges from a mapping object (an object that
supports PyMapping_Keys() and PyObject_GetItem()). If override is true,
the last occurrence of a key wins, else the first. The Python
dict.update(other) is equivalent to PyDict_Merge(dict, other, 1).
*/

Thus, it is intended to check the presence of two attributes.


The error is here:

cpython/Objects/dictobject.c

Lines 3426 to 3441 in 2982bdb

/* Single-arg dict update; used by dict_update_common and operators. */
static int
dict_update_arg(PyObject *self, PyObject *arg)
{
if (PyDict_CheckExact(arg)) {
return PyDict_Merge(self, arg, 1);
}
int has_keys = PyObject_HasAttrWithError(arg, &_Py_ID(keys));
if (has_keys < 0) {
return -1;
}
if (has_keys) {
return PyDict_Merge(self, arg, 1);
}
return PyDict_MergeFromSeq2(self, arg, 1);
}

This code evaluates whether attribute keys is present. If the answer is true, calls PyDict_Merge, and calls PyDict_MergeFromSeq2 otherwise.

Linked PRs

@Prometheus3375 Prometheus3375 added the type-bug An unexpected behavior, bug, or error label Mar 17, 2024
@Prometheus3375 Prometheus3375 changed the title dict(object) and dict.update({}, object) treat object as mapping when it has at least keys attribute dict(object) and dict.update({}, object) treat object as mapping when it has at least attribute keys Mar 17, 2024
@Prometheus3375 Prometheus3375 changed the title dict(object) and dict.update({}, object) treat object as mapping when it has at least attribute keys dict and dict.update treat the first argument as a mapping when it has attribute keys without attribute __getitem__ Mar 17, 2024
@terryjreedy
Copy link
Member

terryjreedy commented Mar 18, 2024

Object is not a mapping, so the exception looks correct, not a bug.

But the experiment above shows that only keys is enough.
It shows that keys is not enough. For each key in keys, mapping[key] is accessed to get the value, and that failed.

there is no link to what is considered as a mapping object
This glossary entry points one here

I guess one is expected to know to look and find if one does not know what a mapping is. The first occurrence of mapping in the text could/should be linked to the glossary entry just as the iterable entry is. I'll make a PR tomorrow if no one beats me or persuades me to not do so.

@terryjreedy terryjreedy added the docs Documentation in the Doc dir label Mar 18, 2024
@Prometheus3375
Copy link
Contributor Author

Prometheus3375 commented Mar 18, 2024

Object is not a mapping, so the exception looks correct, not a bug.

Object is not a mapping, but it is still an iterable, so I expect dict and dict.update to use other method of initialization/update.

It shows that keys is not enough. For each key in keys, mapping[key] is accessed to get the value, and that failed.

It shows that the presence of keys is enough to consider an object being mapping and then use __getitem__ to get key-value pairs even if this attribute is not present. But the comment in C states that both attributes must be present. The same typeshed does with SupportsKeysAndGetItem protocol. The glossary states that mapping is an object which implements collections.abc.Mapping. It is clear, that class Object in my report is not a mapping, yet dict and dict.update consider otherwise.

@terryjreedy terryjreedy added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 18, 2024
@terryjreedy
Copy link
Member

Since a mapping is iterable, dict_update_arg(arg), whose code you quoted above, must first check whether arg seems to be a mapping. If arg has .keys, it assumes that it is meant to be a mapping and unconditionally calls and returns the merge function. So keys is enough for arg to be tried out as a mapping, but not enough for it to be successfully used as a mapping.

I believe you are claiming that either dict_update_arg should also check for __getitem__ before calling the merge function or the doc should be changed somehow. I am not going to decide which.

Class MyListWithKeys, which you edited out from the post above while I was writing this, is both a list and a mapping, and would pass a __getitem__ check. I presume you deleted after realizing this. However, it is a buggy mapping in that its keys and __getitem__ are dis-coordinated. Since it also raises TypeError, it illustrates why merely checking for a TypeError is insufficient for deciding to fallback to a sequence merge.

@Prometheus3375
Copy link
Contributor Author

Yes, I realized that class MyListWithKeys defines both keys and __getitem__ and in such case dict works as intended.

It can be fixed replacing list with set.

class MySetWithKeys(set):
    __slots__ = ()

    def keys(self, /):
        return list(map(repr, self))

my_set = MySetWithKeys([(1, 2), (3, 4)])

Raises TypeError upon calling dict(my_set): 'MySetWithKeys' object is not subscriptable.


A real world example of an iterable with keys and without __getitem__ is in Neo4j driver for Python. There is a class neo4j.Result which is an iterator with method keys. The method returns names of variables used in the result of the query.

For example, query returns an iterable of records of size 2 where the first index is occupied by ID and the second by the list of connected IDs:

query = (
    'unwind $source_ids as node_id '
    'match (:Node {nodeId: node_id})--(other:Node:Entity) '
    'return node_id, collect(other.nodeId) as ids'
    )

result: neo4j.Result = self.session.run(query, source_ids=list(source_ids))
return dict(list(result))

Initially I used dict(result) as neo4j.Result is not a mapping, but then I got the error and added list to avoid it.

@pochmann3
Copy link
Contributor

pochmann3 commented Mar 18, 2024

added list to avoid it

Just in case you don't like using a list: dict(iter(result)) perhaps also works, and dict(r for r in result) and dict(itertools.chain(result)) certainly do.

@rhettinger
Copy link
Contributor

I think this can be closed. The OP's description is the actual intended and tested behavior.

Here's the spec from collections.abc.MutableMapping.update():

def update(self, other=(), /, **kwds):
    ''' D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
        If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
        If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
        In either case, this is followed by: for k, v in F.items(): D[k] = v
    '''
    if isinstance(other, Mapping):
        for key in other:
            self[key] = other[key]
    elif hasattr(other, "keys"):
        for key in other.keys():
            self[key] = other[key]
    else:
        for key, value in other:
            self[key] = value
    for key, value in kwds.items():
        self[key] = value

Note that the docstring for dict.update has the same specification:

>>> help(dict.update)
Help on method_descriptor:

update(...)
    D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
    If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
    If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
    In either case, this is followed by: for k in F:  D[k] = F[k]

The produces exactly the behavior the OP observed. If non-mapping has keys(), the update method with loop over those and attempt to use __getitem__ to fetch a value. If __getitem__ is not found, a TypeError is raised for a non-subscriptable argument.

So, it looks to me like the behavior the OP observed is exactly what was promised. This spec have been around a very long time and almost certainly there is code that relies on the behavior.

Since neo4j.Result isn't working with the protocol as designed, it should probably just document the iter(result) or list(result) workaround.

@hugovk hugovk added the pending The issue will be closed if no feedback is provided label Mar 23, 2024
@Prometheus3375
Copy link
Contributor Author

Alright, that evidence is persuasive. At this point I think the docs should be synchronized and fixed.

If E present and has a .keys() method, does: for k in E: D[k] = E[k]

There, for k in E: D[k] = E[k] should be replaced with for k in E.keys(): D[k] = E[k] as this is what actually happens. Phrase has a .keys() method should be changed to has a .keys attribute.

A similar note should be provided for dict constructor too, either in web documentation or in help message. Current docs say mapping which by glossary terms should implement collections.abc.Mapping but actually having keys is enough to proceed with for k in E.keys(): D[k] = E[k].


Nethertheless, I find a bit awkward the fact that the behavior depends on a single non-dunder method. keys may not be always a method after all. From now on I will be careful with any custom iterable/iterator to avoid usage of any member named keys unless the object is indeed a mapping.

For the code below mypy does not provide any warnings to dict(Object()). The revealed type is dict[str, int] - the same as I expected before opening this issue.

from collections.abc import Iterator, Iterable
from typing import reveal_type


class Object:
    def __iter__(self, /) -> Iterator[tuple[str, int]]: ...
    def keys(self, /) -> Iterable[str]: ...


d = dict(Object())
reveal_type(d)

Maybe adding next overload to dict and its update method would emit a warning. But I am afraid that this would cause any valid input to be also marked.

    @overload
    def __init__(self, iterable: SupportKeys, /) -> Never: ...

@Viicos
Copy link
Contributor

Viicos commented Oct 9, 2024

We got a report from a Pydantic user facing the same issue. We support dict(model) by implementing BaseModel.__iter__. Unfortunately, when users define a keys field, Python is going to assume it is a mapping as described in this issue, resulting in a TypeError because the keys attribute is not callable:

from pydantic import BaseModel

class Model(BaseModel):
    keys: str

dict(Model(keys="a"))
# TypeError: 'str' object is not callable

As OP said (I find a bit awkward the fact that the behavior depends on a single non-dunder method), this is indeed a source of confusion. But the backwards compatibility concerns seems valid as well.

Because dict(iter(...)) looks like an easy workaround, I'm willing to open a PR to change the documentation of dict() to (wording and markup TBD):

If no positional argument is given, an empty dictionary is created. If a positional argument is given and it defines a keys() method (attribute?), a dictionary is created by calling __getitem__ with every key. Otherwise, the positional argument must be an iterable object.

And as said in this thread, maybe update the docstring of dict.update/MutableMapping.update if we want to emphasize on the fact that keys is not necessarily a method.

Viicos added a commit to Viicos/cpython that referenced this issue Oct 9, 2024
…garding the positional argument they accept
AlexWaygood added a commit that referenced this issue Oct 11, 2024
…g the positional argument they accept (#125213)

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 11, 2024
…garding the positional argument they accept (pythonGH-125213)

(cherry picked from commit 21ac0a7)

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 11, 2024
…garding the positional argument they accept (pythonGH-125213)

(cherry picked from commit 21ac0a7)

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@picnixz picnixz removed the pending The issue will be closed if no feedback is provided label Oct 11, 2024
AlexWaygood added a commit that referenced this issue Oct 11, 2024
…egarding the positional argument they accept (GH-125213) (#125337)

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
AlexWaygood added a commit that referenced this issue Oct 11, 2024
…egarding the positional argument they accept (GH-125213) (#125336)

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

8 participants