Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support IS_DELETED status on xxx_LOGICAL_RESOURCES tables #1958

Closed
punktilious opened this issue Feb 17, 2021 · 7 comments
Closed

Support IS_DELETED status on xxx_LOGICAL_RESOURCES tables #1958

punktilious opened this issue Feb 17, 2021 · 7 comments
Assignees
Labels
enhancement New feature or request persistence schema-change a schema change

Comments

@punktilious
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
Search queries currently have to join to the xxx_RESOURCES table to check the value of the IS_DELETED flag for the current resource. This is particularly expensive because the xxx_RESOURCES table also contains the payload data column which is frequently stored inline in the row if it's not too big...but because the payload data is fairly big, this reduces the row density within a block and probably leads to more IO than expected.

By adding the current deletion status to the xxx_LOGICAL_RESOURCES table it is possible to remove the xxx_RESOURCES table from each parameter join. This simplifies the query, and because xxx_LOGICAL_RESOURCES is smaller, it is easier to keep cached which will result in much better performance for more complex searches.

Describe the solution you'd like

  • Add IS_DELETED flag to xxx_LOGICAL_RESOURCES
  • Create a migration step to update the value of this flag based on the current resource id for each resource
  • Eliminate xxx_RESOURCES from search queries where possible. This table should only need to be joined once at the outer level (after the select disctinct). Modify the is_deleted = 'N' predicate to use the new column.
@prb112 prb112 added schema-change a schema change enhancement New feature or request persistence labels Feb 17, 2021
@lmsurpre
Copy link
Member

stretch goal for 4.6.0

@lmsurpre lmsurpre added this to the Sprint 2021-03 milestone Feb 17, 2021
@punktilious
Copy link
Collaborator Author

The column is now added to each xxx_LOGICAL_RESOURCES table. When the column is added, we use DEFAULT 'X'. The schema-tool migration will then run a correlated update to set this value to match the deletion status for the current resource. The stored procedures then keep this value updated for new and updated resources.

To verify, before upgrading the schema to the 4.6.0 release version perform the following steps (can be any resource type, I'm just using patient as an example:

CREATE a new patient and take note of its logical id == patient-1;
UPDATE patient-1
DELETE patient-1
UPDATE patient-1 (undelete)

CREATE another patient and take note of its logical id == patient-2
DELETE patient-2

Now perform the schema upgrade to 4.6.0. With the new FHIR server:

CREATE a new patient == patient-3

CREATE a new patient == patient-4
DELETE patient-4

At a SQL prompt, run the following statement to get the is_deleted flag value. If you are using Db2, then remember to run SET fhir_admin.SV_TENANT_ID=1 (or whatever tenant id you are using) first. This will only work if you connect as the fhiradmin user (not the fhirserver user).

SQL> SELECT count(*) FROM fhirdata.patient_logical_resources WHERE is_deleted NOT IN ('Y', 'N')

This result should be 0.

In the following commands change the logical_id value to match the values you actually used or recorded for the 4 patients:

select is_deleted from fhirdata.patient_logical_resources where logical_id = 'patient-1'
-- RESULT SHOULD BE 'N'

select is_deleted from fhirdata.patient_logical_resources where logical_id = 'patient-2'
-- RESULT SHOULD BE 'Y'

select is_deleted from fhirdata.patient_logical_resources where logical_id = 'patient-3'
-- RESULT SHOULD BE 'N'

select is_deleted from fhirdata.patient_logical_resources where logical_id = 'patient-4'
-- RESULT SHOULD BE 'Y'

@lmsurpre
Copy link
Member

lmsurpre commented Mar 4, 2021

On our test database with ~10M resources, it took about 100 seconds to add the IS_DELETED columns and populate them from the DB.
I confirmed the result by popping open pgAdmin4 after a migration and looking at the first and last 100 rows of the CONDITION_RESOURCES table. I believe @michaelwschroeder will be doing further verification of the specific values.

@d0roppe
Copy link
Collaborator

d0roppe commented Mar 4, 2021

Verified migration from 4.5.4 to latest is working according to the above acceptance Criteria.

@michaelwschroeder
Copy link
Contributor

I have created/deleted resources via postman in my local derby database and have verified through the ij tool that the is_deleted flag is being set properly.

@punktilious
Copy link
Collaborator Author

There's an issue with the migration implementation for Derby which doesn't support correlated update statements. The fix is to select the current values for each resource and apply individual updates using JDBC batches (in class InitializeLogicalIdIsDeleted).

@d0roppe
Copy link
Collaborator

d0roppe commented Mar 5, 2021

verified 4.5.4 to latest is also working in Derby.

@d0roppe d0roppe closed this as completed Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request persistence schema-change a schema change
Projects
None yet
Development

No branches or pull requests

5 participants