Skip to content

sqlite3.Row behaves like a dictionary but doesn't handle "in" as one might expect #100450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
vsajip opened this issue Dec 23, 2022 · 21 comments
Closed
Labels
topic-sqlite3 type-feature A feature request or enhancement

Comments

@vsajip
Copy link
Member

vsajip commented Dec 23, 2022

The sqlite3.Row class is supposed to give access to row results by name as well as index. So in some respects it behaves like a dictionary (it even has a keys() method) but it doesn't appear to handle the in operator as one might expect.

The following minimal script illustrates:

import sqlite3

CREATION = 'CREATE TABLE lookup (name TEXT NOT NULL, value TEXT NOT NULL)'
INSERTION = 'INSERT INTO lookup (name, value) VALUES(?, ?)'
SELECTION = 'SELECT name, value from lookup'

db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute(CREATION)
cur.execute(INSERTION, ('foo', 'bar'))
db.commit()
cur.row_factory = sqlite3.Row
cur.execute(SELECTION)
rows = cur.fetchall()
row = rows[0]
print(f'{row.keys()=}')
for k in row.keys():
    print(f'{k=}')
    print(f'{k in row=}')
    print(f'{row[k]=}')

prints

row.keys()=['name', 'value']
k='name'
k in row=False
row[k]='foo'
k='value'
k in row=False
row[k]='bar'

One would reasonably expect k in row to be True.

Tested on Python 3.10.0, Ubuntu 20.04, but not believed to be restricted to those.

Linked PRs

@vsajip vsajip added the type-bug An unexpected behavior, bug, or error label Dec 23, 2022
sobolevn added a commit to sobolevn/cpython that referenced this issue Dec 23, 2022
@pochmann
Copy link
Contributor

pochmann commented Dec 23, 2022

I'm no expert, but it looks reasonable/intentional to me. What are the members of a row? I'd say the values, not the keys. The values are also what you get when you iterate a row.

Yes, this prints Falses:

for x in row.keys():
    print(x in row)

But this prints Trues:

for x in row:
    print(x in row)

And that's what you should expect.

@sobolevn Does the latter still work with your __contains__ method? To me, it looks like you're half-changing membership from values to keys ("half" because you're not changing what iterating the row gives). If I'm not mistaken, in then no longer finds values, and instead finds keys. Both seem like breaking changes to me. Existing programs already using in to check for values would get false negatives and false positives.

@vsajip
Copy link
Member Author

vsajip commented Dec 23, 2022

I'm no expert, but it looks reasonable/intentional to me ... The values are also what you get when you iterate a row

But the Row object appears intended to make the result look more like a dict - it's there to provide key-based as well as index-based access. From that perspective, the result I got is surprising to me, and perhaps to others. If that's not the intention, why provide a keys() method and let row[key] work like a dict does?

@sobolevn
Copy link
Member

sobolevn commented Dec 23, 2022

I also think that checking values in mapping-like-structure with in is very counter-intuitive.
Also, I don't really think that there are valid use-cases for it.

But, checking what fields are there - looks like a valid task. Because fields might be conditional.

@vsajip
Copy link
Member Author

vsajip commented Dec 23, 2022

I also think that checking values in mapping-like-structure with in is very counter-intuitive.

Do you mean intuituve? Why counter-intuitive? If you do an Internet search for "python if key in dict", you see lots of results like this:

https://stackoverflow.com/questions/1602934/check-if-a-given-key-already-exists-in-a-dictionary
https://datagy.io/python-check-if-key-value-dictionary/
https://www.askpython.com/python/dictionary/check-if-key-exists-in-a-python-dictionary

and of course in the official docs

https://docs.python.org/3/tutorial/datastructures.html#dictionaries

all of which suggest using 'key in dict' where a dict is, of course, a mapping-like structure as you describe it.

Also, I don't really think that there are valid use-cases for it.

Well, I raised the issue because I came across a use case for it: in a UI, I have a list of attributes associated with a row which are either keys in the row, or named functions to call with the row as argument to compute values from the row. So I was trying to do something along the lines of

if key in row:
    value = row[key]
else:
    value = function_map[key](row)

Obviously, I was able for my use case to switch things around:

if key in function_map:
    value = function_map[key](row)
else:
    value = row[key]

But that just underscores that using in to check if something is in a mapping structure is a very common idiom.

@sobolevn
Copy link
Member

@vsajip sorry, I might used some wrong wording. I completely agree with you. Just to be clear:

  • Looking for keys in dict-like objects with in is a standard thing
  • Looking for values in dict-like objects with in is not common and is confusing

🙂

@erlend-aasland
Copy link
Contributor

The issues archive hints that there are not very many use cases for this feature; I could not find a single issue, other than this :)

I of course agree that using in to check for the existence of something in a mapping is a common idiom, but I suspect it's not an idiom for the common database programmer; normally, the developer already knows which column names (keys) to look for, since they probably wrote the SQL queries themselves. I guess this is why few people have noticed this, and also why there are few use-cases for this.

By default, a cursor will return a resulting row as a tuple; using value in row in this context, will check for the existence of column values. I'm not sure that fact should affect how sqlite3.Row should behave, though.

BTW, if you need an actual dict row factory, there is a recently added recipe for that in the docs.

@vsajip
Copy link
Member Author

vsajip commented Dec 24, 2022

I could not find a single issue, other than this :)

There's always a first time any genuine issue is reported 🙂 I assume the missing __contains__ is just an oversight, I've certainly overlooked stuff like this in my code, more than once.

I've used SQLite before through various wrappers/ORMs like SQLAlchemy, Peewee, Django etc. and so never had to use sqlite3.Row directly. I don't claim that my use case is common, just that it's not unreasonable.

I didn't try the dict_factory, but using namedtuple_factory seemed to make things slower than sqlite3.Row in my application (I didn't dig into why).

@pochmann
Copy link
Contributor

pochmann commented Dec 24, 2022

To expand on my thoughts on the proposed change of row membership testing from checking row values to instead checking column names:

Container invariant

Simple concept: Everything in the row ... should be in the row. In Python: all(x in row for x in row). Or as @rhettinger wrote (good read):

for a in container:
    assert a in container    # this should ALWAYS be true

The proposed change would violate that. And if you're thinking of also changing iteration to give column names instead of row values: iteration giving row values really can't be deemed not a valid use-case, as that's even suggested in the guides.

Backwards incompatibility

Being a backwards incompatible change possibly breaking existing programs, I think PEP 387 applies. So the change "should have a large benefit to breakage ratio", which has not been demonstrated. The "large benefit" shown so far is being able to write in row instead of in row.keys(). And the amount of breakage is unclear. If I'm not mistaken, the PR also would need to introduce a deprecation period/warnings and doesn't yet.

Nature of the data

Relational database (query) rows are conceptually different from general-purpose dicts.

Note I keep saying "column names", not "keys" (anymore). Even the Row.keys() documentation says "Return a list of column names", not "keys". That's what they are: Column names. They're not really row data, they're column data. For rows, they're metadata. They're not really in the row but rather "above" the row. If you think of how a relational database table looks like, do you not think of a grid with one header row showing the column names and then data rows only containing the actual data? Similar to how you might think of a list with indexes written above the elements. And for lists, membership tests also check elements, not indexes.

The Row.keys() method provides the column names apparently for convenience if someone wants them (didn't even exist originally, was added in a later Python version), but fundamentally they belong to the columns. And as the keys() documentation also says, you can also get them from the cursor. And as @erlend-aasland said, normally you know them already since you put them into the query.

For dicts, keys really are data, and in the dict. For example you might use a dict to count how often each word occurs in a text. Then the keys are the words, absolutely data. They come from the input data (the text). Not from the author of the program. For a relational database on the other hand, the column names are normally defined once by whoever created the table, before entering any data.

@vsajip
Copy link
Member Author

vsajip commented Dec 24, 2022

The whole point of Row is to be able to access row values by name, right? It's not unreasonable to want to know if a particular name is in the row. I don't mind if the solution is to add has_key(name) to Row rather than support in. And yes, I know I could call keys() and look in the result, but then you could maintain a separate list of names and not use Row anyway. Row is a convenience, and it could be just that little bit more convenient. And if Row is to look a bit like a dict but not support in analogously to dict, this should at least be mentioned in the docs for Row, in a warning admonition.

@pochmann
Copy link
Contributor

@vsajip I'd say I agree with most of that :-). Just shouldn't break the in behavior, and I'm not convinced about the warning. In my opinion, you shouldn't have expected it to check column names.

@pochmann
Copy link
Contributor

@sobolevn What did you mean with "conditional" fields?

@sobolevn
Copy link
Member

I haven't used Row that much, but in django it is quite common to use .only(*field_names) to make your queryset smaller and query faster. https://docs.djangoproject.com/en/4.1/ref/models/querysets/#django.db.models.query.QuerySet.only

So, I guess the same rule might be applied here, these two cases will create different row structures:

  • cur.execute('select id from users;')
  • cur.execute('select * from users;')

For some cases, we might be interested in some specific fields and their presence: like 'email' to show it some kind of repr, or we might look for 'password' to remove it from logs.

But, I don't have strong feelings about either decision. I see pros and cons in both of them.

@pochmann
Copy link
Contributor

Hmm I don't quite get it. With select id from users, you know you don't get an email column. And with select * from users, you know you get it iff it's a column in the table (which you ought to know, and which likely won't change all the time).

@vsajip
Copy link
Member Author

vsajip commented Dec 25, 2022

You're assuming the person who wrote the query is the same person that's using the results, or it's in the same context. That's not always the case. A query might be generated dynamically (with differing result columns each time) and hand off results to be processed by some other module that didn't generate the query.

you shouldn't have expected it to check column names.

If it looks somewhat like a duck and quacks somewhat like a duck ...

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Dec 29, 2022

@vsajip:

I could not find a single issue, other than this :)

There's always a first time any genuine issue is reported 🙂 I assume the missing __contains__ is just an oversight, I've certainly overlooked stuff like this in my code, more than once.

That's certainly possible :) It might also be a deliberate decision.

I've used SQLite before through various wrappers/ORMs like SQLAlchemy, Peewee, Django etc. and so never had to use sqlite3.Row directly. I don't claim that my use case is common, just that it's not unreasonable.

Not at all; it is not unreasonable to question why sqlite3.Row does not has __contains__. I wonder why myself 😄

Putting my conservative hat on, I'd say that Hyrum's Law applies.

Also, consider the case where existing code is not using a row factory (query results are returned simply as tuples); now consider what would happen if that code base starts using the sqlite3.Row row factory: Currently (without __contains__), any value in row in that code base will generate an exception; with this change, such code could become silent and possibly hard-to-catch bugs. That may be a contrived example, but it will surely hit someone out there. We should also keep such cases in mind when considering changing the semantics.

I didn't try the dict_factory, but using namedtuple_factory seemed to make things slower than sqlite3.Row in my application (I didn't dig into why).

Try extracting the class generator as a helper function, and decorate that helper with functools.lru_cache (there's no need to create multiple named tuples with the same signature).

@pochmann

This comment was marked as outdated.

@erlend-aasland

This comment was marked as outdated.

@pochmann
Copy link
Contributor

pochmann commented Dec 29, 2022

Hyrum's law still applies, though.

I'm not sure about that. It's not just an implementation detail, which I think Hyrum's law is about.

The Row documentation says:

It supports iteration, equality testing, len(), and mapping access by column name and index.

So it does mention iteration and doesn't mention membership testing, which means membership testing is done by iteration, as that's its documented fallback when there's no explicit support via __contains__.

It doesn't say what data the iteration yields, but the documentation's dict_factory recipe shows that it's the row values.

So membership testing in a Row checking row values is not just "observable behavior" as Hyrum put it. It's documented.

Edit: somewhat nevermind, I forgot that the row argument of dict_factory isn't a Row object. Maybe the documentation still says somewhere what iterating a Row object yields. If it doesn't, then, well, it yielding row values might only be "observable behavior". But since iteration support is mentioned, I'd say it's at least "intended" to be observed and used.

@serhiy-storchaka
Copy link
Member

I concur with @pochmann.

If we want to go such far with breaking compatibility and turning sqlite3.Row into full-fledged mapping object, we should first deprecate returning tuple as a result, and deprecate iteration of sqlite3.Row.

If you want to check if the key is contained in the sqlite3.Row, use .keys(). BTW, there are no tests for .keys().

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Aug 28, 2023

If we want to go such far with breaking compatibility and turning sqlite3.Row into full-fledged mapping object, we should first deprecate returning tuple as a result, and deprecate iteration of sqlite3.Row.

Those deprecations would break pretty much all sqlite3 code. I don't think that's a path we would like to follow.

If you want to check if the key is contained in the sqlite3.Row, use .keys(). BTW, there are no tests for .keys().

That is easily fixable. (See #108558)

@erlend-aasland erlend-aasland added pending The issue will be closed if no feedback is provided type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Aug 28, 2023
@erlend-aasland erlend-aasland moved this to Backwards compatibility issues in sqlite3 issues Aug 28, 2023
@erlend-aasland
Copy link
Contributor

I'm closing this as per feedback on Discourse.

@erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale Aug 29, 2023
@github-project-automation github-project-automation bot moved this from Backwards compatibility issues to Done in sqlite3 issues Aug 29, 2023
@erlend-aasland erlend-aasland moved this from Done to Discarded in sqlite3 issues Aug 29, 2023
@erlend-aasland erlend-aasland removed the pending The issue will be closed if no feedback is provided label Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-sqlite3 type-feature A feature request or enhancement
Projects
Status: Discarded
Development

No branches or pull requests

5 participants