Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very weird flaky test_create_table_ignore_replace and test_upsert tests #2356

Open
simonw opened this issue Jun 13, 2024 · 18 comments
Open

Very weird flaky test_create_table_ignore_replace and test_upsert tests #2356

simonw opened this issue Jun 13, 2024 · 18 comments

Comments

@simonw
Copy link
Owner

simonw commented Jun 13, 2024

I think these are failures just against SQLite: 3.25.3 (the earliest we suport, refs #2349) - update this note if we see them on alternative versions.

https://github.com/asg017/datasette/actions/runs/9503518697/job/26194122396

platform linux -- Python 3.11.9, pytest-8.2.2, pluggy-1.5.0
SQLite: 3.25.3
    async def test_create_table_ignore_replace(ds_write, input, expected_rows_after):
        # Create table with two rows
        token = write_token(ds_write)
        first_response = await ds_write.client.post(
            "/data/-/create",
            json={
                "rows": [{"id": 1, "name": "Row 1"}, {"id": 2, "name": "Row 2"}],
                "table": "test_insert_replace",
                "pk": "id",
            },
            headers=_headers(token),
        )
        assert first_response.status_code == 201
    
        ds_write._tracked_events = []
    
        # Try a second time
        second_response = await ds_write.client.post(
            "/data/-/create",
            json=input,
            headers=_headers(token),
        )
        assert second_response.status_code == 201
        # Check that the rows are as expected
        rows = await ds_write.client.get("/data/test_insert_replace.json?_shape=array")
>       assert rows.json() == expected_rows_after
E       assert {'error': 'ne...'Invalid SQL'} == [{'id': 1, 'n... 'Row 3 new'}]
E         
E         Full diff:
E         + {
E         +     'error': 'near "from": syntax error',
E         +     'ok': False,
E         +     'status': 400,
E         +     'title': 'Invalid SQL',
E         + }
E         - [
E         -     {
E         -         'id': 1,
E         -         'name': 'Row 1',
E         -     },
E         -     {
E         -         'id': 2,
E         -         'name': 'Row 2',
E         -     },
E         -     {
E         -         'id': 3,
E         -         'name': 'Row 3 new',
E         -     },
E         - ]

tests/test_api_write.py:1424: AssertionError
----------------------------- Captured stderr call -----------------------------
ERROR: conn=<sqlite3.Connection object at 0x7ff0667a36a0>, sql = 'select rowid,  from test_insert_replace order by rowid limit 101', params = {}: near "from": syntax error
=========================== short test summary info ============================
FAILED tests/test_api_write.py::test_create_table_ignore_replace[input0-expected_rows_after0] - assert {'error': 'ne...'Invalid SQL'} == [{'id': 1, 'n... 'Row 3 new'}]
  
  Full diff:
  + {
  +     'error': 'near "from": syntax error',
  +     'ok': False,
  +     'status': 400,
  +     'title': 'Invalid SQL',
  + }
@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

This is SO weird:

select rowid, from test_insert_replace

Why would it EVER generate that SQL query? Especially only intermittently?

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

One of those failures was SQLite: 3.25.3 - not sure if all failures are against that version.

simonw added a commit that referenced this issue Jun 13, 2024
@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

It would be fantastic if we could figure out a way to reliably replicate this failure in dev, so we can pdb it.

@simonw simonw changed the title Very weird flaky test_create_table_ignore_replace test Very weird flaky test_create_table_ignore_replace and test_upsert tests Jun 13, 2024
@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

Uh oh... hit a different flaky write test, presumably for the same underlying reason?

Also SQLite 3.25.3

platform linux -- Python 3.10.14, pytest-8.2.2, pluggy-1.5.0
SQLite: 3.25.3

https://github.com/simonw/datasette/actions/runs/9503858661/job/26195236184

    async def test_upsert(ds_write, initial, input, expected_rows, should_return):
        token = write_token(ds_write)
        # Insert initial data
        initial["table"] = "upsert_test"
        create_response = await ds_write.client.post(
            "/data/-/create",
            json=initial,
            headers=_headers(token),
        )
        assert create_response.status_code == 201
        if should_return:
            input["return"] = True
        response = await ds_write.client.post(
            "/data/upsert_test/-/upsert",
            json=input,
            headers=_headers(token),
        )
>       assert response.status_code == 200
E       assert 400 == 200
E        +  where 400 = <Response [400 Bad Request]>.status_code

tests/test_api_write.py:540: AssertionError
=========================== short test summary info ============================
FAILED tests/test_api_write.py::test_upsert[False-initial3-input3-expected_rows3] - assert 400 == 200
 +  where 400 = <Response [400 Bad Request]>.status_code
======= 1 failed, 1425 passed, 9 skipped, 1 xfailed in 121.93s (0:02:01) =======

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

That change did indeed help further understand. the problem: https://github.com/simonw/datasette/actions/runs/9503993969/job/26195656683

FAILED tests/test_api_write.py::test_upsert[True-initial1-input1-expected_rows1] - AssertionError: {"ok": false, "errors": ["Row 0 is missing primary key column(s): \"rowid\""]}
assert 400 == 200

Row 0 is missing primary key column(s): "rowid"

So it looks like the test_upsert failure is the exact same underlying problem as the test_create_table_ignore_replace failure.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

I'm very suspicious that there might be some intermittent SQLite bug present in 3.25.3 involving rowid columns.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

The SQLite changelog, tags stripped, is about 77,000 tokens:

curl 'https://sqlite.org/changes.html' | strip-tags | ttok

So I ran it through Claude Haiku:

curl 'https://sqlite.org/changes.html' | strip-tags | llm -m haiku \
  --system 'identify any fixed bugs that could impact how rowid column selects work, and quote those lines along with the SQLite version number'

Based on the release history, I was able to identify the following fixed bugs that could potentially impact how rowid column selects work:

  1. In version 3.24.0 (2018-06-04), there was a fix for a bug in CREATE TABLE AS where "columns of type TEXT never end up holding an INT value." This could potentially impact how rowid column selects work if the rowid column is of type TEXT.

  2. In version 3.23.0 (2018-02-03), there was a fix for a bug involving "arbitrary expressions, including function calls and subqueries, in the filename argument to ATTACH." This could potentially impact how rowid column selects work if the rowid column is referenced in such an expression.

  3. In version 3.8.2 (2013-12-06), there was a change to the defined behavior for the CAST expression when casting floating point values greater than +9223372036854775807 into integers. This could potentially impact how rowid column selects work if the rowid column is of type INTEGER and the values being cast are outside the valid range.

Those are all earlier than 3.25 so it's not them.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

Follow-up:

llm -c 'Anything later than 3.25 ? The bug I am seeing is present in 3.25.3 - the error I got was Row 0 is missing primary key column(s): "rowid"'

But no, neither of these look likely. I may have to read the release notes myself!

Okay, got it. Let's look for any relevant bug fixes after version 3.25.3 then.

In the release history, I found the following potentially relevant bug fixes after 3.25.3:

  1. In version 3.35.0 (2021-03-12), there was an enhancement to the UPSERT feature, which could have impacted how rowid column selects work. The release notes mention:

"Generalize UPSERT: Allow multiple ON CONFLICT clauses that are evaluated in order, The final ON CONFLICT clause may omit the conflict target and yet still use DO UPDATE."

This change to the UPSERT behavior could have inadvertently introduced a bug related to rowid column selects.

  1. In version 3.34.0 (2020-12-01), there was a bug fix for an issue with the "ALTER TABLE RENAME COLUMN" command, which could have impacted how rowid column selects work if the rowid column was renamed.

The release notes state:

"Fix the ALTER TABLE RENAME COLUMN command so that it works for tables that have redundant UNIQUE constraints."

So these two versions, 3.35.0 and 3.34.0, would be good candidates to investigate further for potential bugs related to the issue you're seeing with the "Row 0 is missing primary key column(s): "rowid"" error.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

Couldn't find anything searching https://sqlite.org/changes.html for rowid manually either.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

3.25.3 amalgamation: https://www.sqlite.org/2018/sqlite-amalgamation-3250300.zip

(I had to use Internet Archive to find it, couldn't quite guess the URL: https://web.archive.org/web/20181120115249/https://www.sqlite.org/download.html)

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

I figured out how to run that version of SQLite on macOS here:

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

Means I can run the tests like this:

DYLD_LIBRARY_PATH=/tmp/sqlite-amalgamation-3250300 pytest -n auto

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

Both failing tests are in test_api_write.py - but I've not been able to get them to fail on macOS:

DYLD_LIBRARY_PATH=/tmp/sqlite-amalgamation-3250300 \
  pytest tests/test_api_write.py -n auto

Weirdly without -n auto I get a flurry of AttributeError: '_UnixSelectorEventLoop' object has no attribute '_ssock' errors which I think are unrelated.

Maybe I need to try getting the tests to fail under Ubuntu in a Docker container instead.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

I'm going to leave this for the moment.

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

I figured out how to run the test suite in a Ubuntu Docker container with that old version of SQLite here - but the tests pass for me:

@simonw
Copy link
Owner Author

simonw commented Jun 13, 2024

Using that trickery I still can't replicate the failing tests using Ubuntu and Python 3.12.3 and SQLite: 3.25.3 - which is the same combo as the failed run in https://github.com/simonw/datasette/actions/runs/9503993969/job/26195656683

platform linux -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0
SQLite: 3.25.3
rootdir: /datasette
configfile: pytest.ini
plugins: timeout-2.3.1, asyncio-0.23.7, anyio-4.4.0, xdist-3.6.1
asyncio: mode=Mode.STRICT
collected 114 items                                                                                                           

tests/test_api_write.py ............................................................................................... [ 83%]
.........XX........                                                                                                     [100%]

=============================================== 112 passed, 2 xpassed in 2.17s ================================================

See #2349 (comment) for how I ran that.

@simonw
Copy link
Owner Author

simonw commented Jun 14, 2024

Might be easier to replicate this using GitHub codespaces. Those might be closer to the GitHub Actions environment

@simonw
Copy link
Owner Author

simonw commented Jun 14, 2024

Another option would be a way to have that specific test dumped out a SQLite database file if it fails which then gets uploaded to the run as a as artifact.

@simonw simonw mentioned this issue Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant