Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table.upsert_all fails to write rows when not_null is present #538

Closed
xavdid opened this issue May 4, 2023 · 10 comments
Closed

table.upsert_all fails to write rows when not_null is present #538

xavdid opened this issue May 4, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@xavdid
Copy link

xavdid commented May 4, 2023

I found an odd bug today, where calls to table.upsert_all don't write rows if you include the not_null kwarg.

Repro Example

from sqlite_utils import Database

db = Database("upsert-test.db")

db["comments"].upsert_all(
    [{"id": 1, "name": "david"}],
    pk="id",
    not_null=["name"],
)

assert list(db["comments"].rows) # err!

The schema is correctly created:

CREATE TABLE [comments] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT NOT NULL
)

But no rows are created. Removing either the not_null kwargs works as expected, as does an insert_all call.

Version Info

  • Python: 3.11.0
  • sqlite-utils: 3.30
  • sqlite: 3.39.5 2022-10-14
@simonw simonw added the bug Something isn't working label May 8, 2023
@simonw
Copy link
Owner

simonw commented May 8, 2023

Confirmed - I added this test and it fails:

def test_upsert_all_not_null(fresh_db):
    # https://github.com/simonw/sqlite-utils/issues/538
    fresh_db["comments"].upsert_all(
        [{"id": 1, "name": "Cleo"}],
        pk="id",
        not_null=["name"],
    )
    assert list(fresh_db["comments"].rows) == [{"id": 1, "name": "Cleo"}]

@simonw
Copy link
Owner

simonw commented May 8, 2023

From time in the debugger, after creating the table it ends up doing this:

(Pdb) queries_and_params
[
  ('INSERT OR IGNORE INTO [comments]([id]) VALUES(?);', [1]),
  ('UPDATE [comments] SET [name] = ? WHERE [id] = ?', ['Cleo', 1])
]

@simonw
Copy link
Owner

simonw commented May 8, 2023

Here's the problem:

import sqlite3
db = sqlite3.connect(":memory:")
db.execute('create table foo (id integer primary key, name not null)')
db.execute('insert into foo (id) values (1)')

Produces:

IntegrityError: NOT NULL constraint failed: foo.name

But this:

db.execute('insert or ignore into foo (id) values (1)')

Completes without an exception.

@simonw
Copy link
Owner

simonw commented May 8, 2023

Here's the code at fault:

if upsert:
if isinstance(pk, str):
pks = [pk]
else:
pks = pk
self.last_pk = None
for record_values in values:
# TODO: make more efficient:
record = dict(zip(all_columns, record_values))
sql = "INSERT OR IGNORE INTO [{table}]({pks}) VALUES({pk_placeholders});".format(
table=self.name,
pks=", ".join(["[{}]".format(p) for p in pks]),
pk_placeholders=", ".join(["?" for p in pks]),
)
queries_and_params.append((sql, [record[col] for col in pks]))

@simonw
Copy link
Owner

simonw commented May 8, 2023

This feels like a fundamental flaw in the way upserts are implemented by sqlite-utils.

One fix would be to switch to using the UPSERT feature in SQLite: https://www.sqlite.org/lang_UPSERT.html

But...

UPSERT syntax was added to SQLite with version 3.24.0 (2018-06-04).

I still want to support SQLite versions earlier than that.

@simonw
Copy link
Owner

simonw commented May 8, 2023

I could detect if this happens using cursor.rowcount - not sure how I would recover from it though.

This would also require some major re-engineering, since currently it all works by generating a list of SQL queries in advance and applying them inside a loop in .insert_chunk():

def insert_chunk(
self,
alter,
extracts,
chunk,
all_columns,
hash_id,
hash_id_columns,
upsert,
pk,
conversions,
num_records_processed,
replace,
ignore,
):
queries_and_params = self.build_insert_queries_and_params(
extracts,
chunk,
all_columns,
hash_id,
hash_id_columns,
upsert,
pk,
conversions,
num_records_processed,
replace,
ignore,
)
with self.db.conn:
result = None
for query, params in queries_and_params:
try:
result = self.db.execute(query, params)
except OperationalError as e:
if alter and (" column" in e.args[0]):
# Attempt to add any missing columns, then try again
self.add_missing_columns(chunk)
result = self.db.execute(query, params)
elif e.args[0] == "too many SQL variables":

@simonw
Copy link
Owner

simonw commented May 8, 2023

How about if I had logic which checked that all not-null columns were provided in the call to upsert_all() - and if they were, modified the INSERT OR IGNORE INTO to include a placeholder value for those columns that would then be fixed by the later UPDATE?

Something like this:

[
  ('INSERT OR IGNORE INTO [comments]([id], name) VALUES(?, ?);', [1, '']),
  ('UPDATE [comments] SET [name] = ? WHERE [id] = ?', ['Cleo', 1])
]

@simonw
Copy link
Owner

simonw commented May 8, 2023

That fix seems to work!

@simonw simonw closed this as completed May 8, 2023
@xavdid
Copy link
Author

xavdid commented May 8, 2023

perfect, thank you!

@hbmartin
Copy link

hbmartin commented Jan 4, 2024

Hello, I am experiencing a similar issue:

Python version: 3.9.6
sqlite version: (3, 39, 5)
sqlite-utils version: 3.36

I create a table as:

        db["feeds"].create(
            {"overcastId": int, "title": str, "text": str, "subscribed": bool, "overcastAddedDate": datetime.datetime, "notifications": bool, "xmlUrl": str, "htmlUrl": str},
            pk="overcastId",
            not_null={"overcastId", "title", "xmlUrl"},
        )

Resulting in the schema:

CREATE TABLE [feeds] (
   [overcastId] INTEGER PRIMARY KEY NOT NULL,
   [title] TEXT NOT NULL,
   [text] TEXT,
   [subscribed] INTEGER,
   [overcastAddedDate] TEXT,
   [notifications] INTEGER,
   [xmlUrl] TEXT NOT NULL,
   [htmlUrl] TEXT
);

When I try to upsert a dict that looks like {'overcastId': 56, 'text': 'HBR IdeaCast', 'title': 'HBR IdeaCast', 'xmlUrl': 'http://feeds.harvardbusiness.org/harvardbusiness/ideacast', 'htmlUrl': 'https://hbr.org/podcasts/ideacast', 'overcastAddedDate': '2018-02-15T14:19:58+00:00', 'subscribed': True, 'notifications': False}
then I get an error sqlite3.IntegrityError: datatype mismatch when I include in the upsert call not_null={"overcastId", "title", "xmlUrl"}

However, if I try upserting without not_null then I get no error BUT also no rows are inserted 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants