models: improve indexes #213

mdonadoni · 2023-11-23T14:08:41Z

Add a new index to __reana.job on workflow_uuid and created to
query efficiently the jobs that are part of a given workflow, with the
possibility of ordering them by creation time.

Add a new index to __reana.workflow on status to efficiently query
the total number of running workflows.

Change the already-existing index on __reana.workflow so that
owner_id is the leading column, before name.

Closes #211
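
To illustrate what the three index changes are meant to achieve, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for PostgreSQL. The simplified tables and the index names (`ix_job_workflow_uuid_created`, `ix_workflow_status`, `ix_workflow_owner_id_name`) are assumptions made for this example and are not the ones created by the migration; the query planner of PostgreSQL may also behave differently.

```python
import sqlite3

# In-memory SQLite database with simplified, hypothetical versions of the tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workflow (id_ TEXT PRIMARY KEY, owner_id TEXT, name TEXT, status TEXT);
    CREATE TABLE job (id_ TEXT PRIMARY KEY, workflow_uuid TEXT, created TEXT);

    -- Composite index: filter by workflow, then read rows already sorted by creation time.
    CREATE INDEX ix_job_workflow_uuid_created ON job (workflow_uuid, created);
    -- Single-column index used when counting workflows in a given status.
    CREATE INDEX ix_workflow_status ON workflow (status);
    -- owner_id is the leading column, so filtering by owner alone can use the index.
    CREATE INDEX ix_workflow_owner_id_name ON workflow (owner_id, name);
    """
)


def plan(sql):
    """Return the textual query plan chosen by SQLite for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


jobs_plan = plan("SELECT * FROM job WHERE workflow_uuid = 'w1' ORDER BY created")
running_plan = plan("SELECT count(*) FROM workflow WHERE status = 'running'")
owner_plan = plan("SELECT * FROM workflow WHERE owner_id = 'u1'")

for p in (jobs_plan, running_plan, owner_plan):
    print(p)
```

In this sketch the first query needs no separate sorting step, because the rows for one `workflow_uuid` are stored in `created` order inside the composite index.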

codecov · 2023-11-23T14:10:22Z

Codecov Report

Merging #213 (7412b29) into master (589978b) will increase coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head 7412b29 differs from pull request most recent head 74737b8. Consider uploading reports for the commit 74737b8 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #213      +/-   ##
==========================================
+ Coverage   74.44%   74.50%   +0.05%     
==========================================
  Files           7        7              
  Lines         900      902       +2     
==========================================
+ Hits          670      672       +2     
  Misses        230      230

Files	Coverage Δ
reana_db/models.py	`90.34% <100.00%> (+0.03%)`	⬆️

giuseppe-steduto

Works well! Just left a minor cosmetic comment on the names of the indexes.

giuseppe-steduto · 2023-11-28T14:57:56Z

reana_db/alembic/versions/20231123_1342_126601b69c78_improve_indexes_usage.py

+
+    # Create new index on (status) of __reana.workflow
+    op.create_index(
+        op.f("ix___reana_workflow_status"),


Minor: is there a naming convention we should follow to keep the index names consistent? See _workflow_uuid_created_ix and ix___reana_workflow_status, where ix is used once as a suffix and once as a prefix. I see the one for the status is autogenerated by SQLAlchemy, maybe we can try to use the same?

Yes, SQLAlchemy autogenerates the name, but by default only for indexes. We could set up naming conventions also for foreign keys/etc. like this: https://docs.sqlalchemy.org/en/20/core/constraints.html#configuring-constraint-naming-conventions

However, we already have some inconsistencies:

Indexes:
    "workflow_pkey" PRIMARY KEY, btree (id_)
    "_user_workflow_run_uc" UNIQUE CONSTRAINT, btree (owner_id, name, run_number_major, run_number_minor)
    "ix___reana_workflow_status" btree (status)
Foreign-key constraints:
    "workflow_owner_id_fkey" FOREIGN KEY (owner_id) REFERENCES __reana.user_(id_)
Referenced by:
    TABLE "__reana.workflow_resource" CONSTRAINT "workflow_resource_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
    TABLE "__reana.workflow_session" CONSTRAINT "workflow_session_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
    TABLE "__reana.workspace_retention_rule" CONSTRAINT "workspace_retention_rule_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)

workflow_pkey and workflow_owner_id_fkey (generated by PostgreSQL)

_user_workflow_run_uc (custom name that does not refer to any particular column)

ix___reana_workflow_status (autogenerated by SQLAlchemy)

I can try to set some default naming conventions and see what happens, but I will need to update or create a migration to rename the already-existing constraints.

I have added a new migration to make all the names consistent with each other. There are many changes, so lots of testing is needed ;)
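
As a rough illustration of what such a renaming migration involves, the sketch below builds PostgreSQL `ALTER TABLE ... RENAME CONSTRAINT` statements from `(table, old_name, new_name)` tuples. The old names are the ones quoted in this thread; the new names and the helper `rename_statements` are hypothetical and only assume a `pk_`/`uq_`/`fk_` prefix pattern, so the actual migration may be organised differently.

```python
SCHEMA = "__reana"

# (table, old_name, new_name); the new names are assumptions made for this example.
RENAMES = [
    ("workflow", "workflow_pkey", "pk_workflow"),
    ("workflow", "_user_workflow_run_uc", "uq_workflow_owner_id"),
    ("workflow", "workflow_owner_id_fkey", "fk_workflow_owner_id_user_"),
]


def rename_statements(renames, reverse=False):
    """Build one RENAME CONSTRAINT statement per tuple.

    With reverse=True the old and new names are swapped, which is what
    a downgrade step would need in order to restore the original names.
    """
    statements = []
    for table, old, new in renames:
        src, dst = (new, old) if reverse else (old, new)
        statements.append(
            f'ALTER TABLE {SCHEMA}."{table}" RENAME CONSTRAINT "{src}" TO "{dst}"'
        )
    return statements


upgrade_sql = rename_statements(RENAMES)
downgrade_sql = rename_statements(RENAMES, reverse=True)

for statement in upgrade_sql:
    print(statement)
```

Keeping the list of renames as data makes the downgrade the exact mirror of the upgrade, which helps when many constraints change at once.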

mdonadoni · 2023-11-29T14:10:47Z

This is the code that I have used to generate the list of changes to constraints' names:

from sqlalchemy.util import md5_hex
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy import inspect

from reana_db.database import engine

convention = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}

affected_table_names = [
    "audit_log",
    "interactive_session_resource",
    "interactive_session",
    "job_cache",
    "job",
    "resource",
    "user_",
    "user_resource",
    "user_token",
    "workflow_resource",
    "workflow_session",
    "workflow",
    "workspace_retention_audit_log",
    "workspace_retention_rule",
]


# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def truncate_and_render_index_name(dialect, name):
    max_ = dialect.max_index_name_length or dialect.max_identifier_length
    return _truncate_and_render_maxlen_name(name, max_)


# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def truncate_and_render_constraint_name(dialect, name):
    max_ = dialect.max_constraint_name_length or dialect.max_identifier_length
    return _truncate_and_render_maxlen_name(name, max_)


# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def _truncate_and_render_maxlen_name(name, max_):
    if len(name) > max_:
        name = name[0 : max_ - 8] + "_" + md5_hex(name)[-4:]
    return name


def main():
    insp: Inspector = inspect(engine)
    dialect = engine.dialect

    for table_name in insp.get_table_names(schema="__reana"):
        if table_name not in affected_table_names:
            continue

        # Primary key
        pk = insp.get_pk_constraint(table_name, schema="__reana")
        new_name = convention["pk"] % {"table_name": table_name}
        new_name = truncate_and_render_constraint_name(dialect, new_name)
        print(f'("{table_name}", "{pk["name"]}", "{new_name}",),')

        # Unique constraints
        for uc in insp.get_unique_constraints(table_name, schema="__reana"):
            new_name = convention["uq"] % {
                "table_name": table_name,
                "column_0_name": uc["column_names"][0],
            }
            new_name = truncate_and_render_constraint_name(dialect, new_name)
            print(f'("{table_name}", "{uc["name"]}", "{new_name}",),')

        # Foreign keys
        for fk in insp.get_foreign_keys(table_name, schema="__reana"):
            new_name = convention["fk"] % {
                "table_name": table_name,
                "column_0_name": fk["constrained_columns"][0],
                "referred_table_name": fk["referred_table"],
            }
            new_name = truncate_and_render_constraint_name(dialect, new_name)
            print(f'("{table_name}", "{fk["name"]}", "{new_name}",),')

        # Indexes are skipped as they are all unique constraints anyway
        # Check constraints are skipped because there aren't any


if __name__ == "__main__":
    main()
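
To make the truncation rule in the script above easier to follow, here is a small standalone sketch of it. It assumes PostgreSQL's default identifier limit of 63 characters and replaces SQLAlchemy's `md5_hex` with a `hashlib`-based stand-in; the long table and column names are hypothetical and exist only to trigger the truncation.

```python
import hashlib

# PostgreSQL's default maximum identifier length (assumed here).
PG_MAX_IDENTIFIER_LENGTH = 63


def md5_hex(text):
    """Stand-in for sqlalchemy.util.md5_hex, based on hashlib."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def truncate_name(name, max_=PG_MAX_IDENTIFIER_LENGTH):
    """Same rule as the script: cut the name and append a short hash suffix."""
    if len(name) > max_:
        name = name[0 : max_ - 8] + "_" + md5_hex(name)[-4:]
    return name


fk_template = "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s"

# A name built from tables mentioned in this thread: short enough to be kept as-is.
short_name = fk_template % {
    "table_name": "workspace_retention_rule",
    "column_0_name": "workflow_id",
    "referred_table_name": "workflow",
}

# A hypothetical name that exceeds the limit and therefore gets truncated.
long_name = fk_template % {
    "table_name": "a_hypothetical_table_with_a_very_long_name",
    "column_0_name": "and_a_long_column_name",
    "referred_table_name": "some_other_table",
}

print(truncate_name(short_name))
print(len(long_name), len(truncate_name(long_name)))
```

A truncated name keeps its first `max_ - 8` characters and gains an underscore plus four hash characters, so it ends up `max_ - 3` characters long.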

mdonadoni · 2023-11-29T14:38:26Z

Note that the chosen convention is the one given in SQLAlchemy's docs, and it is also used by Invenio (source).

giuseppe-steduto

Works great and LGTM!

mdonadoni force-pushed the indexes branch 3 times, most recently from a57338c to 0ce3bb8 Compare November 24, 2023 13:53

mdonadoni mentioned this pull request Nov 24, 2023

fix(models): add missing foreign key to workflow_uuid of Job (#214) #214

Merged

mdonadoni force-pushed the indexes branch from 0ce3bb8 to 0db3756 Compare November 28, 2023 13:38

giuseppe-steduto approved these changes Nov 28, 2023

mdonadoni force-pushed the indexes branch from 0db3756 to 7412b29 Compare November 29, 2023 14:07

mdonadoni requested a review from giuseppe-steduto November 29, 2023 14:13

mdonadoni added 2 commits November 29, 2023 17:33

models: enforce constraint naming convention

58b2a9f

mdonadoni force-pushed the indexes branch from 7412b29 to 74737b8 Compare November 29, 2023 16:33

mdonadoni mentioned this pull request Nov 30, 2023

release: 0.9.3 #215

Merged

2 tasks

giuseppe-steduto approved these changes Nov 30, 2023

mdonadoni merged commit 74737b8 into reanahub:master Nov 30, 2023
10 checks passed

mdonadoni deleted the indexes branch December 1, 2023 11:10
