Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

models: improve indexes #213

Merged
merged 2 commits into from
Nov 30, 2023
Merged

models: improve indexes #213

merged 2 commits into from
Nov 30, 2023

Conversation

mdonadoni
Copy link
Member

@mdonadoni mdonadoni commented Nov 23, 2023

Add a new index to __reana.job on workflow_uuid and created to
query efficiently the jobs that are part of a given workflow, with the
possibility of ordering them by creation time.

Add a new index to __reana.workflow on status to efficiently query
the total number of running workflows.

Change the already-existing index on __reana.workflow so that
owner_id is the leading column, before name.

Closes #211

Copy link

codecov bot commented Nov 23, 2023

Codecov Report

Merging #213 (7412b29) into master (589978b) will increase coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head 7412b29 differs from pull request most recent head 74737b8. Consider uploading reports for the commit 74737b8 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #213      +/-   ##
==========================================
+ Coverage   74.44%   74.50%   +0.05%     
==========================================
  Files           7        7              
  Lines         900      902       +2     
==========================================
+ Hits          670      672       +2     
  Misses        230      230              
Files Coverage Δ
reana_db/models.py 90.34% <100.00%> (+0.03%) ⬆️

Copy link
Member

@giuseppe-steduto giuseppe-steduto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well! Just left a minor cosmetic comments on the names of the indexes.


# Create new index on (status) of __reana.workflow
op.create_index(
op.f("ix___reana_workflow_status"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: is there a naming convention to be consistent with the names of the index? See _workflow_uuid_created_ix and ix___reana_workflow_status, where ix is once used as a prefix, and once as a suffix. I see the one for the status is autogenerated by SQLAlchemy, maybe we can try to use the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, SQLAlchemy autogenerates the name, but by default only for indexes. We could set up naming conventions also for foreign keys/etc. like this: https://docs.sqlalchemy.org/en/20/core/constraints.html#configuring-constraint-naming-conventions

However, we already have some inconsistencies:

Indexes:
    "workflow_pkey" PRIMARY KEY, btree (id_)
    "_user_workflow_run_uc" UNIQUE CONSTRAINT, btree (owner_id, name, run_number_major, run_number_minor)
    "ix___reana_workflow_status" btree (status)
Foreign-key constraints:
    "workflow_owner_id_fkey" FOREIGN KEY (owner_id) REFERENCES __reana.user_(id_)
Referenced by:
    TABLE "__reana.workflow_resource" CONSTRAINT "workflow_resource_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
    TABLE "__reana.workflow_session" CONSTRAINT "workflow_session_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
    TABLE "__reana.workspace_retention_rule" CONSTRAINT "workspace_retention_rule_workflow_id_fkey" FOREIGN KEY (workflow_id) REFERENCES __reana.workflow(id_)
  • workflow_pkey and workflow_owner_id_fkey (generated by PostgreSQL)
  • _user_workflow_run_uc (custom name that does not refer to any particular column)
  • ix___reana_workflow_status (autogenerated by SQLAlchemy)

I can try to set some default naming conventions and see what happens, but I will need to update/create a migration to change the name of the already-existing constraints

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new migration to make all the names consistent between each other. Many changes, so lots of testing is needed ;)

@mdonadoni
Copy link
Member Author

mdonadoni commented Nov 29, 2023

This is the code that I have used to generate the list of changes to constraints' names:

from sqlalchemy.util import md5_hex
from sqlalchemy.engine.reflection import Inspector
from sqlalchemy import inspect

from reana_db.database import engine

convention = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}

affected_table_names = [
    "audit_log",
    "interactive_session_resource",
    "interactive_session",
    "job_cache",
    "job",
    "resource",
    "user_",
    "user_resource",
    "user_token",
    "workflow_resource",
    "workflow_session",
    "workflow",
    "workspace_retention_audit_log",
    "workspace_retention_rule",
]


# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def truncate_and_render_index_name(dialect, name):
    max_ = dialect.max_index_name_length or dialect.max_identifier_length
    return _truncate_and_render_maxlen_name(name, max_)


# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def truncate_and_render_constraint_name(dialect, name):
    max_ = dialect.max_constraint_name_length or dialect.max_identifier_length
    return _truncate_and_render_maxlen_name(name, max_)


# adapted from IdentifierPreparer of SQLAlchemy 1.4+
def _truncate_and_render_maxlen_name(name, max_):
    if len(name) > max_:
        name = name[0 : max_ - 8] + "_" + md5_hex(name)[-4:]
    return name


def main():
    insp: Inspector = inspect(engine)
    dialect = engine.dialect

    for table_name in insp.get_table_names(schema="__reana"):
        if table_name not in affected_table_names:
            continue

        # Primary key
        pk = insp.get_pk_constraint(table_name, schema="__reana")
        new_name = convention["pk"] % {"table_name": table_name}
        new_name = truncate_and_render_constraint_name(dialect, new_name)
        print(f'("{table_name}", "{pk["name"]}", "{new_name}",),')

        # Unique constraints
        for uc in insp.get_unique_constraints(table_name, schema="__reana"):
            new_name = convention["uq"] % {
                "table_name": table_name,
                "column_0_name": uc["column_names"][0],
            }
            new_name = truncate_and_render_constraint_name(dialect, new_name)
            print(f'("{table_name}", "{uc["name"]}", "{new_name}",),')

        # Foreing keys
        for fk in insp.get_foreign_keys(table_name, schema="__reana"):
            new_name = convention["fk"] % {
                "table_name": table_name,
                "column_0_name": fk["constrained_columns"][0],
                "referred_table_name": fk["referred_table"],
            }
            new_name = truncate_and_render_constraint_name(dialect, new_name)
            print(f'("{table_name}", "{fk["name"]}", "{new_name}",),')

        # Indexes are skipped as they are all unique constraints anyway
        # Check constraints are skipped because there aren't any


if __name__ == "__main__":
    main()

@mdonadoni
Copy link
Member Author

Note that the chosen convention is the one present in SQLAlchemy's docs and which is used also by Invenio (source)

Add a new index to `__reana.job` on `workflow_uuid` and `created` to
query efficiently the jobs that are part of a given workflow, with the
possibility of ordering them by creation time.

Add a new index to `__reana.workflow` on `status` to efficiently query
the total number of running workflows.

Change the already-existing index on `__reana.workflow` so that
`owner_id` is the leading column, before `name`.

Closes reanahub#211
@mdonadoni mdonadoni mentioned this pull request Nov 30, 2023
2 tasks
Copy link
Member

@giuseppe-steduto giuseppe-steduto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great and LGTM!

@mdonadoni mdonadoni merged commit 74737b8 into reanahub:master Nov 30, 2023
10 checks passed
@mdonadoni mdonadoni deleted the indexes branch December 1, 2023 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

global: monitor database queries
2 participants