SQLAlchemy 2.0 #7583
Executing this branch with
Are we expecting everything to work? For your comment I suspect yes:
So it would be nice to have some guidelines if we are gonna need to migrate or change extensions.
Some guidelines are already there - I added a changelog entry with all the incompatible changes that I identified.
BTW, @pdelboca, it's good that you identified this problem. Can you somehow trigger your GitHub workflow that tests popular extensions against this branch?
ckan/templates/user/list.html
Outdated
@@ -16,7 +16,8 @@ <h1 class="page-heading">
 <ul class="user-list">
 {% block users_list_inner %}
 {% for user in page.items %}
-<li>{{ h.linked_user(user['name'], maxlength=20) }}</li>
+{# `user` is a tuple with a single item - username #}
+<li>{{ h.linked_user(user[0], maxlength=20) }}</li>
Would be better if switching sqlalchemy versions didn't affect templates this way, can we update the view to generate the old dict format instead of the template?
SQLAlchemy v2 disabled some magic that we are using here, so using the old syntax is not an option. Just to be clear, `page.items` does not contain dictionaries, even in `master`. The type is the same on `master` and `sqlalchemy-2.0`: `list[Row]`, where `Row` is some sort of tuple. In SQLAlchemy v1 this `Row` allowed inconsistent access to its content by named attributes. In SQLAlchemy v2 this implicit logic was removed.
In order to "generate the old dict format", the view has to be rewritten and we have to manually dictize every user that is passed to the template. While this lets us use `user.name` in templates once again, such a change:
- implies dictization inside the view. We usually do it inside the action, so it sounds like the wrong thing to do
- breaks plugins that expected `list[Row]` in the template.
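The manual dictization discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: plain tuples stand in for SQLAlchemy `Row` objects, and `dictize_users` is a hypothetical helper name, not CKAN code.

```python
# Stand-in for the query result: each row is a one-item tuple holding
# the username (an assumption mirroring the template comment above).
rows: list[tuple[str]] = [("alice",), ("bob",)]


def dictize_users(rows):
    """Turn tuple rows into plain dicts so templates can use user['name']."""
    return [{"name": row[0]} for row in rows]


users = dictize_users(rows)
```

With a list of dicts, a template can go back to `user['name']` (or `user.name` in Jinja2) regardless of how the underlying row type behaves.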
As you said in your first bullet, our usual pattern is to have the view call actions that return lists/dicts to pass to templates. Passing sqlalchemy rows to templates (that now behave differently in sqlalchemy 2) is the oddity here that I think is worth fixing.
I don't imagine there are any plugin templates that depend on user being `list[Row]`, because jinja2 treats everything as a dict-like object even when accessing attributes.
We cannot call an action that returns dictionaries in this case :)
A long time ago we had a list of dictionaries. But the `user_list` action does not have limit/offset support, so it returned all the users, which caused timeouts on some portals.
Now the `user.index` view gets a query object from the action, which is itself a pretty strange solution, but that's our reality. In order to fix it we have to add `limit`/`offset` to the `user_list` action. But that's not enough. This action returns a list of users, and we don't know how many users there are in order to build a pager.
We cannot use the total number of users in the DB because `user_list` accepts a `q` parameter that filters users by name. So we have to make two requests: one that returns a list of users inside the limit/offset window and another one that returns a query, which we'll use to get the number of users matching `q` for the pagination widget.
And, again, the only alternative is to violate our rules and perform dictization over query results inside the view.
So neither solution is perfect. That's why I am so reluctant in terms of making this change.
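The "two requests" idea described above (one for the limit/offset window, one for the number of rows matching `q`) could look roughly like this. It is a minimal sketch that uses the standard library's `sqlite3` as a stand-in database; the `users` table and the `search_users` function are hypothetical and do not reflect the actual `user_list` action.

```python
import sqlite3


def search_users(conn, q, limit, offset):
    """Return one page of matching users plus the total number of matches."""
    pattern = f"%{q}%"
    # Request 1: only the rows inside the limit/offset window.
    page = conn.execute(
        "SELECT name FROM users WHERE name LIKE :pattern "
        "ORDER BY name LIMIT :limit OFFSET :offset",
        {"pattern": pattern, "limit": limit, "offset": offset},
    ).fetchall()
    # Request 2: how many rows match q overall, needed to build the pager.
    total = conn.execute(
        "SELECT COUNT(*) FROM users WHERE name LIKE :pattern",
        {"pattern": pattern},
    ).fetchone()[0]
    return [{"name": name} for (name,) in page], total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("alice",), ("alan",), ("bob",), ("albert",)],
)
users, total = search_users(conn, "al", limit=2, offset=0)
```

The pager only needs `total`, `limit` and `offset`, so the full list of matching users never has to be loaded at once.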
UPD: I added dictization to the `user.index` endpoint. Even though it's not perfect, at least now we have comments that explain why we are doing things this way. And when somebody decides to update this view/action, it will save them a bit of time.
clause_str = ('"{0}" in ({1})'.format(
    field,
    ','.join(f":{p}" for p in placeholders)
))
While making changes here, let's make this SQL generation from field names properly safe. I would use my old `identifier` function, but maybe there's a `sa.column` or something that is better now?
Yes, `sa.column` solves the issue. Is it worth backporting? Every field is checked at the beginning of the method against a list of existing column names (`field not in fields_types:`), so it's impossible to pass an arbitrary string here. The person has to create a datastore with the corresponding field first.
The only edge case I can imagine is a datastore field that contains `"` in its name. I doubt that there is a possible vector for SQL injection, because all statements that rely on where-clauses are executed as a single query. So basically a user who uploads a "dangerous" resource can only affect this exact resource's datastore table.
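To make the `"` edge case concrete, here is a small standalone sketch of identifier quoting. The thread settles on `sa.column`; the `quote_identifier` and `in_clause` helpers below are hypothetical and only illustrate the general idea of doubling embedded double quotes, not the fix that was actually applied.

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    if "\x00" in name:
        raise ValueError("identifier must not contain NUL")
    return '"' + name.replace('"', '""') + '"'


def in_clause(field, values):
    """Build an IN clause with named placeholders and a params dict."""
    params = {f"value_{i}": value for i, value in enumerate(values)}
    placeholders = ", ".join(f":{name}" for name in params)
    return f"{quote_identifier(field)} in ({placeholders})", params


clause, params = in_clause('a"b', [1, 2])
```

A field name containing `"` then stays inside the quoted identifier instead of terminating it early.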
else:
    clause: tuple[Any, ...] = (u'"{0}" = %s'.format(field), value)
    placeholder = f"value_{next(idx_gen)}"
    clause: tuple[Any, ...] = (f'"{field}" = :{placeholder}', {
same issue here
+
context['connection'].execute(
    sql_drop_index.format(index[0]).replace('%', '%%'))
context['connection'].execute(sa.text(
    sql_drop_index.format(index[0])
potentially unsafe index name formatting here, but only if a bad name was created in the datastore db by another application/db user
Fixed - `sa.column` should work here as well.
Let's go 🚀
Prepare code for SQLAlchemy v2.0.
We are using v1.4 right now, and the best part of it is that this version contains all the features that will be removed in v2.0 (that's why we were able to start using it without big code changes) as well as almost all the features that are available in v2.0. It means we can migrate the code while staying on v1.4 and at some point in the future just change SQLAlchemy's version in requirements.txt, and everything will work like a charm (ideally).
Plan:
I expect that code changes won't be visible outside of CKAN and almost everything will be completely compatible with the existing extensions. At the moment I have identified only one place that requires breaking changes: the `where` key of the dictionary returned from the `IDatastore.datastore_search` method. SQLAlchemy v2.0 does not allow %-placeholders (`WHERE x = %s`). Instead it requires :-placeholders (`WHERE x = :x`). Here you can find details from the changelog entry
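For extension authors, the move from %-placeholders to :-placeholders might look roughly like the sketch below. It is an assumption-laden illustration: `eq_clause` is a hypothetical helper modelled on the diff in this PR, and the standard library's `sqlite3` (which also accepts `:name` parameters) stands in for the real datastore connection.

```python
import itertools
import sqlite3

# Counter used to generate unique placeholder names, as in the PR diff.
idx_gen = itertools.count()


def eq_clause(field, value):
    """Return a where-clause with a named placeholder and its params dict."""
    placeholder = f"value_{next(idx_gen)}"
    # Old style was: ('"{0}" = %s'.format(field), value)
    return f'"{field}" = :{placeholder}', {placeholder: value}


clause, params = eq_clause("x", 2)

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("x" INTEGER)')
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,)])
count = conn.execute(
    f"SELECT COUNT(*) FROM t WHERE {clause}", params
).fetchone()[0]
```

The clause now carries a dictionary of named values instead of a positional value, which is the shape change the `where` key would need.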