SQLAlchemy 2.0 #7583

smotornyuk · 2023-05-11T12:46:18Z

Prepare code for SQLAlchemy v2.0.

We are using v1.4 right now, and the best part is that this version still contains all the features that will be removed in v2.0 (that's why we were able to start using it without big code changes) as well as almost all the features available in v2.0. This means we can migrate the code while staying on v1.4, and at some point in the future just change SQLAlchemy's version in requirements.txt and everything will work like a charm (ideally).

Plan:

[before CKAN v2.11] Replace all deprecated features with the equivalents recommended for 2.0.
[release CKAN v2.11] At this point the last two CKAN versions both use SQLAlchemy v1.4, which means that extension maintainers who support only the two latest CKAN versions can update their extensions to be fully compatible with SQLAlchemy v1.4 and v2.0 at the same time.
[after CKAN v2.11] Turn all SQLAlchemy deprecation warnings into exceptions. This will help us identify anything we missed during the first step.
[after CKAN v2.11] Write a guide for extension maintainers.
[halfway between CKAN v2.11 and v2.12, ~3 months after the v2.11 release] Upgrade to SQLAlchemy v2.0.
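
The third step of the plan (turning SQLAlchemy deprecation warnings into exceptions) is ordinary `warnings` filtering. As far as I know, in SQLAlchemy 1.4 the 2.0-related deprecations are raised as `sqlalchemy.exc.RemovedIn20Warning` and are only emitted in full when the `SQLALCHEMY_WARN_20=1` environment variable is set. A minimal sketch, using a local stand-in class and a hypothetical `legacy_execute` function so it runs without SQLAlchemy installed:

```python
import warnings


class RemovedIn20Warning(DeprecationWarning):
    """Local stand-in for sqlalchemy.exc.RemovedIn20Warning (assumption)."""


def legacy_execute() -> str:
    # Hypothetical legacy API: it still works, but warns about its removal.
    warnings.warn(
        "this call will be removed in SQLAlchemy 2.0",
        RemovedIn20Warning,
        stacklevel=2,
    )
    return "result"


def run_strict(fn):
    """Call fn() with 2.0 deprecation warnings escalated to exceptions."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", RemovedIn20Warning)
        return fn()


try:
    run_strict(legacy_execute)
except RemovedIn20Warning as exc:
    print(f"deprecated usage detected: {exc}")
```

In a test suite the same filter would normally be applied globally, for example through pytest's `-W` option or its `filterwarnings` setting, so any missed deprecated call fails the build.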

I expect that code changes won't be visible outside of CKAN and almost everything will be completely compatible with the existing extensions. So far I have identified only one place that requires a breaking change: the `where` key of the dictionary returned from the IDatastore.datastore_search method. SQLAlchemy v2.0 does not allow %-placeholders (WHERE x = %s); instead, it requires :-placeholders (WHERE x = :x). The details are in the changelog entry.
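
For extensions that build their own `where` entries, the change is mechanical: each positional `%s` becomes a uniquely named `:name` placeholder and the value moves into a dict. The helper below is a hypothetical sketch, assuming old entries look like `(sql, value, ...)` and new ones like `(sql, {name: value})`, the shape used by the updated datastore backend. It also assumes `%%` in the old SQL meant a literal percent sign.

```python
from __future__ import annotations

import itertools
import re
from typing import Any, Iterator

# Matches either an escaped percent sign or a positional placeholder.
_TOKEN = re.compile(r"%%|%s")
_MISSING = object()


def to_named_clause(
    clause: tuple[Any, ...], idx_gen: Iterator[int]
) -> tuple[str, dict[str, Any]]:
    """Convert ("x = %s", value, ...) into ("x = :value_N", {"value_N": value})."""
    sql, *values = clause
    remaining = iter(values)
    params: dict[str, Any] = {}

    def substitute(match: re.Match[str]) -> str:
        if match.group() == "%%":
            return "%"  # named-style SQL needs no escaping for a literal %
        name = f"value_{next(idx_gen)}"
        value = next(remaining, _MISSING)
        if value is _MISSING:
            raise ValueError(f"not enough values for {sql!r}")
        params[name] = value
        return f":{name}"

    named_sql = _TOKEN.sub(substitute, sql)
    if next(remaining, _MISSING) is not _MISSING:
        raise ValueError(f"too many values for {sql!r}")
    return named_sql, params


# One counter shared by all clauses keeps placeholder names unique per query.
idx = itertools.count()
print(to_named_clause(('"age" > %s', 30), idx))
# ('"age" > :value_0', {'value_0': 30})
```

The resulting pair can then be executed with `connection.execute(sqlalchemy.text(sql), params)`.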

pdelboca · 2023-11-01T13:00:46Z

Running this branch with the harvest plugin causes an error:

  File "/home/pdelboca/Repos/ckan/.venv/lib/python3.10/site-packages/ckanext/harvest/plugin.py", line 279, in configure
    model_setup()
  File "/home/pdelboca/Repos/ckan/.venv/lib/python3.10/site-packages/ckanext/harvest/model/__init__.py", line 50, in setup
    if not model.package_table.exists():
  File "<string>", line 2, in exists
  File "/home/pdelboca/Repos/ckan/.venv/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 468, in warned
    return fn(*args, **kwargs)
  File "/home/pdelboca/Repos/ckan/.venv/lib/python3.10/site-packages/sqlalchemy/sql/schema.py", line 941, in exists
    bind = _bind_or_error(self)
  File "/home/pdelboca/Repos/ckan/.venv/lib/python3.10/site-packages/sqlalchemy/sql/base.py", line 1659, in _bind_or_error
    raise exc.UnboundExecutionError(msg)
sqlalchemy.exc.UnboundExecutionError: Table object 'package' is not bound to an Engine or Connection.  Execution can not proceed without a database to execute against.

Are we expecting everything to work? From your comment I suspect so:

I expect that code changes won't be visible outside of CKAN and almost everything will be completely compatible with the existing extensions.

So it would be nice to have some guidelines if we are going to need to migrate or change extensions.

smotornyuk · 2023-11-02T12:41:01Z

Some guidelines are already there - I added a changelog entry with all incompatible changes that I identified.
As for ckanext-harvest, it's a special case. It still defines and initializes its tables every time the application starts instead of using database migrations. I'll see what can be done, but instead of introducing an exception for plugins that rely on undocumented logic, I'll switch ckanext-harvest to alembic migrations.
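
Without bound metadata, `Table.exists()` has nothing to execute against, so existence checks need an explicit engine or connection; in SQLAlchemy 1.4 and 2.0 the usual call is `sqlalchemy.inspect(engine).has_table("package")`. The stdlib sketch below shows the same "pass the connection in explicitly" pattern with `sqlite3` so it runs without a database server; `table_exists` is a hypothetical helper and the `sqlite_master` lookup is SQLite-specific.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check for a table using the connection passed in, not a global bind."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = :name",
        {"name": name},
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
print(table_exists(conn, "package"))  # False
conn.execute("CREATE TABLE package (id TEXT PRIMARY KEY)")
print(table_exists(conn, "package"))  # True
```

Migrations remove the need for this check at startup altogether, since the migration tool tracks which tables already exist.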

smotornyuk · 2023-11-02T12:42:29Z

BTW, @pdelboca, it's good that you identified this problem. Can you somehow trigger your GitHub workflow that tests popular extensions against this branch?
I'm sure I saw this initialize-table-on-startup strategy somewhere else.

wardi · 2023-11-06T16:20:01Z

ckan/templates/user/list.html

@@ -16,7 +16,8 @@ <h1 class="page-heading">
        <ul class="user-list">
          {% block users_list_inner %}
            {% for user in page.items %}
-              <li>{{ h.linked_user(user['name'], maxlength=20) }}</li>
+              {# `user` is a tuple with a single item - username #}
+              <li>{{ h.linked_user(user[0], maxlength=20) }}</li>


It would be better if switching SQLAlchemy versions didn't affect templates this way. Can we update the view to generate the old dict format instead of changing the template?

SQLAlchemy v2 disabled some magic that we rely on here, so keeping the old syntax is not an option. Just to be clear, page.items does not contain dictionaries, even in master. The type is the same on master and sqlalchemy-2.0: list[Row], where Row is a tuple-like object. In SQLAlchemy v1 this Row also allowed inconsistent, dict-style access to its content by column name (user['name']). In SQLAlchemy v2 this implicit logic was removed.

In order to "generate the old dict format", the view has to be rewritten and we have to manually dictize every user that is passed to the template. While that would let us use user.name in templates again, such a change:

implies dictization inside the view. We usually do it inside actions, so it feels like the wrong thing to do

breaks plugins that expect list[Row] in the template.
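
SQLAlchemy 2.0's `Row` is tuple-like: positional access (`user[0]`) keeps working, while dict-style access by column name goes through `Row._mapping` or `Row._asdict()`. The sketch below uses `collections.namedtuple` as a stand-in for `Row` (an assumption so it runs without SQLAlchemy) to contrast the two template-facing options discussed here: indexing the row, or dictizing it first.

```python
from collections import namedtuple

# Stand-in for the Row objects returned by a query selecting only the name.
UserRow = namedtuple("UserRow", ["name"])
rows = [UserRow("alice"), UserRow("bob")]

# Option 1: keep rows and index them positionally, as in `user[0]`.
names = [row[0] for row in rows]

# Option 2: dictize before rendering, so templates can keep using user['name'].
users = [row._asdict() for row in rows]

print(names)  # ['alice', 'bob']
print(users)  # [{'name': 'alice'}, {'name': 'bob'}]
```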

As you said in your first bullet, our usual pattern is to have views call actions that return lists/dicts to pass to templates. Passing SQLAlchemy rows to templates (which now behave differently in SQLAlchemy 2) is the oddity here that I think is worth fixing.

I don't imagine there are any plugin templates that depend on user being list[Row] because jinja2 treats everything as a dict-like object even when accessing attributes.

We cannot call an action that returns dictionaries in this case:)

A long time ago we had a list of dictionaries. But the user_list action does not have limit/offset support, so it returned all the users, which caused timeouts on some portals.

Right now the user.index view gets a query object from the action, which is itself a pretty strange solution, but that's our reality. To fix it we would have to add limit/offset to the user_list action. But that's not enough: the action returns a list of users, and we don't know how many users there are in total, which we need to build a pager.
We cannot use the total number of users in the DB, because user_list accepts a q parameter that filters users by name. So we would have to make two requests: one that returns the users inside the limit/offset window, and another that returns a query we can use to count the users matching q for the pagination widget.
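
The two requests described above boil down to one windowed query plus one count with the same filter. A minimal sketch with `sqlite3` standing in for the CKAN model; the `users` table and the `search_users` helper are hypothetical, and `q` is matched with a plain `LIKE` without escaping `%` or `_`:

```python
from __future__ import annotations

import sqlite3
from typing import Any


def search_users(
    conn: sqlite3.Connection, q: str, limit: int, offset: int
) -> tuple[list[dict[str, Any]], int]:
    """Return one page of users matching q plus the total match count."""
    pattern = f"%{q}%"
    page = conn.execute(
        "SELECT name FROM users WHERE name LIKE :pattern "
        "ORDER BY name LIMIT :limit OFFSET :offset",
        {"pattern": pattern, "limit": limit, "offset": offset},
    ).fetchall()
    # Same filter, no window: this is what the pager needs.
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM users WHERE name LIKE :pattern",
        {"pattern": pattern},
    ).fetchone()
    return [{"name": name} for (name,) in page], total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)", [("alice",), ("alfred",), ("bob",)]
)
print(search_users(conn, "al", limit=1, offset=0))
# ([{'name': 'alfred'}], 2)
```

The count query must reuse exactly the same filter as the page query; otherwise the pager and the listed rows disagree.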

And, again, the only alternative is to violate our rules and perform dictization over query results inside view.

So neither solution is perfect. That's why I am so reluctant to make this change.

UPD: I added dictization to the user.index endpoint. Even though it's not perfect, at least there are now comments that explain why we do things this way. When somebody decides to update this view/action, it will save them a bit of time.

wardi · 2023-11-06T16:33:09Z

ckanext/datastore/backend/postgres.py

+            clause_str = ('"{0}" in ({1})'.format(
+                field,
+                ','.join(f":{p}" for p in placeholders)
+            ))


While making changes here, let's make this SQL generation from field names properly safe. I would use my old identifier function, but maybe there's sa.column or something better now?

Yes, sa.column solves the issue. Is it worth backporting? Every field is checked at the beginning of the method against the list of existing column names (field not in fields_types:), so it's impossible to pass an arbitrary string here. The person would have to create a datastore table with the corresponding field first.

The only edge case I can imagine is a datastore field that contains " in its name. I doubt there is a possible vector for SQL injection, because all statements that rely on where-clauses are executed as a single query. So a user who uploads a "dangerous" resource can only affect that resource's own datastore table.
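
The PR resolved this with `sa.column`. For reference, the underlying rule for PostgreSQL quoted identifiers is: wrap the name in double quotes and double any embedded double quote, which also covers the `"`-in-a-field-name edge case. A minimal sketch of that rule plus the `IN` clause from the diff, with hypothetical helper names (`quote_identifier`, `in_clause`) rather than CKAN's actual code:

```python
from __future__ import annotations

import itertools
from typing import Any, Iterable, Iterator


def quote_identifier(name: str) -> str:
    """Quote a name PostgreSQL-style: wrap it in double quotes, doubling embedded ones."""
    if "\0" in name:
        raise ValueError("identifiers cannot contain NUL characters")
    return '"' + name.replace('"', '""') + '"'


def in_clause(
    field: str, values: Iterable[Any], idx_gen: Iterator[int]
) -> tuple[str, dict[str, Any]]:
    """Build a quoted-field IN clause with named placeholders and its params."""
    params = {f"value_{next(idx_gen)}": value for value in values}
    if not params:
        raise ValueError("an IN clause needs at least one value")
    placeholders = ", ".join(f":{name}" for name in params)
    return f"{quote_identifier(field)} IN ({placeholders})", params


idx = itertools.count()
print(in_clause("colour", ["red", "blue"], idx))
# ('"colour" IN (:value_0, :value_1)', {'value_0': 'red', 'value_1': 'blue'})
print(quote_identifier('x") OR 1=1 --'))
# "x"") OR 1=1 --"
```

Values never touch the SQL string; only the quoted identifier does, so a hostile field name stays a (strange) column name instead of becoming SQL.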

wardi · 2023-11-06T16:34:57Z

ckanext/datastore/backend/postgres.py

        else:
-            clause: tuple[Any, ...] = (u'"{0}" = %s'.format(field), value)
+            placeholder = f"value_{next(idx_gen)}"
+            clause: tuple[Any, ...] = (f'"{field}" = :{placeholder}', {


same issue here

wardi · 2023-11-06T16:40:32Z

ckanext/datastore/backend/postgres.py

-        context['connection'].execute(
-            sql_drop_index.format(index[0]).replace('%', '%%'))
+        context['connection'].execute(sa.text(
+            sql_drop_index.format(index[0])


Potentially unsafe index name formatting here, but only if a bad name was created in the datastore DB by another application or DB user.

Fixed - sa.column should work here as well

pdelboca · 2023-11-21T13:43:52Z

Let's go 🚀

smotornyuk added 8 commits May 9, 2023 21:42

Fix warning from web ui

51e939b

switch to imperative mapping

15c736d

Fix major datastore problems

a5c99d8

work on datastore/test_create

0f0c177

green datastore tests

139edf4

Update interface and mention breaking change

b9364cf

Fix changelog entry

2a802a4

fix sqlite example

e060400

smotornyuk changed the title ~~Sqlalchemy 2.0~~ SQLAlchemy 2.0 May 11, 2023

smotornyuk added 17 commits May 11, 2023 23:04

Update plugins

ed36999

Merge remote-tracking branch 'origin/master' into sqlalchemy-2.0

22acae1

fix core tests

b25ccad

Fix codestyle and types

c5b56ee

Remove unnecesarry type ignores

bbe83f5

FIx set-permissions

18d4c9b

Fix initialization

5012226

avoid race condition in datastore tests'

6cec8ac

flake8

c007497

Fail CircleCI tests when deprecated functionality is used

f6a770c

try 2.0

b71e2e8

v2.0 config

1e51d67

fix tests

984a113

avoid nested transactions

c965189

try using engine for types cache

eeae452

datastore.get_all_ids

f494c9e

Switch back to v1.4

ada7daa

smotornyuk force-pushed the sqlalchemy-2.0 branch from a733c39 to ada7daa May 15, 2023 11:04

smotornyuk marked this pull request as ready for review May 16, 2023 06:12

wardi self-assigned this May 16, 2023

wardi reviewed May 24, 2023

ckan/migration/versions/093_7f70d7d15445_remove_activity_revision_id.py (resolved)

smotornyuk added 2 commits October 12, 2023 17:45

fix indentation

e80ddc4

fix tests

4e7418a

amercader added this to the CKAN 2.11 milestone Oct 17, 2023

smotornyuk added 4 commits October 31, 2023 12:19

Merge branch 'master' into sqlalchemy-2.0

9a37827

simplify users variable in user list template

038913d

explain user[0] in user-list template

cfb746e

fix types

ca27591

pdelboca reviewed Nov 1, 2023

ckan/model/__init__.py (outdated, resolved)

pdelboca reviewed Nov 1, 2023

ckan/model/__init__.py (outdated, resolved)

pdelboca reviewed Nov 1, 2023

ckan/model/meta.py (outdated, resolved)

Explain how to initialize missing engine

95ace55

wardi assigned pdelboca Nov 2, 2023

smotornyuk added 2 commits November 2, 2023 15:49

explain how to solve unboundexecutor error

8d5455d

Merge remote-tracking branch 'origin/master' into sqlalchemy-2.0

5fb850d

wardi reviewed Nov 6, 2023

smotornyuk added 5 commits November 6, 2023 22:16

fix: safer column names in field clauses

cc52a09

remove extra quotes around sa.column

ff79ced

Merge remote-tracking branch 'origin/master' into sqlalchemy-2.0

6afd476

dictize users for user.index endpoint

dd9aec2

fix tests

5494c2e

pdelboca merged commit 8ddc2e8 into master Nov 21, 2023
7 checks passed

pdelboca deleted the sqlalchemy-2.0 branch November 21, 2023 13:43

avdata99 mentioned this pull request Oct 28, 2024

Tracking extension fix for 2.11+ #8479

Merged
