-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
models: improve indexes #213
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #213 +/- ##
==========================================
+ Coverage 74.44% 74.50% +0.05%
==========================================
Files 7 7
Lines 900 902 +2
==========================================
+ Hits 670 672 +2
Misses 230 230
|
a57338c
to
0ce3bb8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works well! Just left a minor cosmetic comments on the names of the indexes.
|
||
# Create new index on (status) of __reana.workflow | ||
op.create_index( | ||
op.f("ix___reana_workflow_status"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: is there a naming convention to be consistent with the names of the index? See _workflow_uuid_created_ix
and ix___reana_workflow_status
, where ix
is once used as a prefix, and once as a suffix. I see the one for the status is autogenerated by SQLAlchemy, maybe we can try to use the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, SQLAlchemy autogenerates the name, but by default only for indexes. We could set up naming conventions also for foreign keys/etc. like this: https://docs.sqlalchemy.org/en/20/core/constraints.html#configuring-constraint-naming-conventions
However, we already have some inconsistencies:
Indexes:
"workflow_pkey" PRIMARY KEY, btree (id_)
"_user_workflow_run_uc" UNIQUE CONSTRAINT, btree (owner_id, name, run_number_major, run_number_minor)
"ix___reana_workflow_status" btree (status)
Foreign-key constraints:
"workflow_owner_id_fkey" FOREIGN KEY (owner_id) REFERENCES __reana.user_(id_)
Referenced by:
TABLE "__reana.workflow_resource" CONSTRAINT "workflow_resource_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
TABLE "__reana.workflow_session" CONSTRAINT "workflow_session_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
TABLE "__reana.workspace_retention_rule" CONSTRAINT "workspace_retention_rule_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
workflow_pkey
andworkflow_owner_id_fkey
(generated by PostgreSQL)_user_workflow_run_uc
(custom name that does not refer to any particular column)ix___reana_workflow_status
(autogenerated by SQLAlchemy)
I can try to set some default naming conventions and see what happens, but I will need to update/create a migration to change the name of the already-existing constraints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a new migration to make all the names consistent between each other. Many changes, so lots of testing is needed ;)
This is the code that I have used to generate the list of changes to constraints' names: from sqlalchemy.util import md5_hex
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy import inspect
from reana_db.database import engine
convention = {
"ix": "ix_%(column_0_label)s",
"uq": "uq_%(table_name)s_%(column_0_name)s",
"ck": "ck_%(table_name)s_%(constraint_name)s",
"fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
"pk": "pk_%(table_name)s",
}
affected_table_names = [
"audit_log",
"interactive_session_resource",
"interactive_session",
"job_cache",
"job",
"resource",
"user_",
"user_resource",
"user_token",
"workflow_resource",
"workflow_session",
"workflow",
"workspace_retention_audit_log",
"workspace_retention_rule",
]
# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def truncate_and_render_index_name(dialect, name):
max_ = dialect.max_index_name_length or dialect.max_identifier_length
return _truncate_and_render_maxlen_name(name, max_)
# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def truncate_and_render_constraint_name(dialect, name):
max_ = dialect.max_constraint_name_length or dialect.max_identifier_length
return _truncate_and_render_maxlen_name(name, max_)
# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def _truncate_and_render_maxlen_name(name, max_):
if len(name) > max_:
name = name[0 : max_ - 8] + "_" + md5_hex(name)[-4:]
return name
def main():
insp: Inspector = inspect(engine)
dialect = engine.dialect
for table_name in insp.get_table_names(schema="__reana"):
if table_name not in affected_table_names:
continue
# Primary key
pk = insp.get_pk_constraint(table_name, schema="__reana")
new_name = convention["pk"] % {"table_name": table_name}
new_name = truncate_and_render_constraint_name(dialect, new_name)
print(f'("{table_name}", "{pk["name"]}", "{new_name}",),')
# Unique constraints
for uc in insp.get_unique_constraints(table_name, schema="__reana"):
new_name = convention["uq"] % {
"table_name": table_name,
"column_0_name": uc["column_names"][0],
}
new_name = truncate_and_render_constraint_name(dialect, new_name)
print(f'("{table_name}", "{uc["name"]}", "{new_name}",),')
# Foreing keys
for fk in insp.get_foreign_keys(table_name, schema="__reana"):
new_name = convention["fk"] % {
"table_name": table_name,
"column_0_name": fk["constrained_columns"][0],
"referred_table_name": fk["referred_table"],
}
new_name = truncate_and_render_constraint_name(dialect, new_name)
print(f'("{table_name}", "{fk["name"]}", "{new_name}",),')
# Indexes are skipped as they are all unique constraints anyway
# Check constraints are skipped because there aren't any
if __name__ == "__main__":
main() |
Note that the chosen convention is the one present in SQLAlchemy's docs and which is used also by Invenio (source) |
Add a new index to `__reana.job` on `workflow_uuid` and `created` to query efficiently the jobs that are part of a given workflow, with the possibility of ordering them by creation time. Add a new index to `__reana.workflow` on `status` to efficiently query the total number of running workflows. Change the already-existing index on `__reana.workflow` so that `owner_id` is the leading column, before `name`. Closes reanahub#211
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works great and LGTM!
Add a new index to
__reana.job
onworkflow_uuid
andcreated
toquery efficiently the jobs that are part of a given workflow, with the
possibility of ordering them by creation time.
Add a new index to
__reana.workflow
onstatus
to efficiently querythe total number of running workflows.
Change the already-existing index on
__reana.workflow
so thatowner_id
is the leading column, beforename
.Closes #211