Duplicate values for one resource in xx_resource_token_refs #3763

Closed
punktilious opened this issue Jul 7, 2022 · 4 comments
Labels: bug (Something isn't working), P1 (Priority 1 - Must Have)

Comments

@punktilious
Collaborator

Describe the bug
The xx_resource_token_refs table contains duplicate entries. Although this doesn't impact the results of FHIR search interactions, it does lead to increased storage costs:

fhirdb=> select parameter_name_id, common_token_value_id, logical_resource_id, composite_id from fhirdata.patient_resource_token_refs where
parameter_name_id = 20423 and logical_resource_id = 458590;
 parameter_name_id | common_token_value_id | logical_resource_id | composite_id 
-------------------+-----------------------+---------------------+--------------
             20423 |                   886 |              458590 |             
             20423 |                   886 |              458590 |             

The same is true for xx_str_values.

Parameter tables are heavily indexed, so having duplicates here is significant.

Environment
Which version of IBM FHIR Server? 5.0.0

To Reproduce

  1. Build a fresh schema
  2. Run the system integration test suite
  3. Query the database and look for parameter tables with duplicate entries

Expected behavior
Parameter tables should not have duplicates. Note that some parameter values may legitimately be stored more than once when composites are involved, but a parameter value should never be stored more than once where composite_id is null.
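The rule above can be sketched as a small in-memory check. This is only an illustration, not code from the project: the `TokenRefRow` record and `findDuplicates` method are hypothetical names, and the rows are assumed to have already been read from a table such as patient_resource_token_refs. Rows with a non-null composite_id are skipped, since those may repeat legitimately.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateTokenRefFinder {
    /** Hypothetical in-memory view of one xx_resource_token_refs row. */
    record TokenRefRow(long parameterNameId, long commonTokenValueId,
                       long logicalResourceId, Integer compositeId) {}

    /**
     * Returns every non-composite row that occurs more than once,
     * mapped to the number of times it occurs.
     */
    static Map<TokenRefRow, Integer> findDuplicates(List<TokenRefRow> rows) {
        Map<TokenRefRow, Integer> counts = new LinkedHashMap<>();
        for (TokenRefRow row : rows) {
            // Only rows with a null composite_id are expected to be unique.
            if (row.compositeId() == null) {
                counts.merge(row, 1, Integer::sum);
            }
        }
        // Keep only the rows that were seen at least twice.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<TokenRefRow> rows = List.of(
            new TokenRefRow(20423, 886, 458590, null),
            new TokenRefRow(20423, 886, 458590, null),
            new TokenRefRow(20423, 887, 458590, null));
        findDuplicates(rows).forEach((row, n) ->
            System.out.println(row + " stored " + n + " times"));
    }
}
```

Run against the two rows shown in the bug description, the check would report a single duplicated entry with a count of 2.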

Additional context
N/A

@punktilious added the bug and P1 labels Jul 7, 2022
@punktilious
Collaborator Author

Deduplicating the parameter values before we store them requires visiting all the ExtractedParameterValue values and converting them into values we intend to store in the database. This deduplication can't be done at the ExtractedParameterValue level because an extracted value may contain multiple values.

This process is already performed by the remote-index client, so I propose that we do some refactoring and replace the current parameter persistence code with the newer (hopefully faster) remote index implementation. This will allow all parameter values to be collected as part of the transaction and should lead to improved throughput.
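A minimal sketch of the deduplication idea described above, under stated assumptions: `ExtractedToken` and `StoredToken` are hypothetical stand-ins, not the real ExtractedParameterValue or remote-index classes, and an extracted value is modeled simply as a parameter name with a list of codes. The point it illustrates is that duplicates only become visible after each extracted value has been expanded into the individual rows that would be stored.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TokenValueDeduplicator {
    /** Hypothetical extracted parameter that may expand into several stored values. */
    record ExtractedToken(String parameterName, List<String> codes) {}

    /** Hypothetical single row bound for a parameter table. */
    record StoredToken(String parameterName, String code, Integer compositeId) {}

    /**
     * Expands every extracted value into its storable rows and drops repeats,
     * preserving first-seen order. Record equality compares all components.
     */
    static List<StoredToken> toDistinctRows(List<ExtractedToken> extracted) {
        Set<StoredToken> distinct = new LinkedHashSet<>();
        for (ExtractedToken e : extracted) {
            for (String code : e.codes()) {
                // compositeId is null here: these are non-composite values.
                distinct.add(new StoredToken(e.parameterName(), code, null));
            }
        }
        return new ArrayList<>(distinct);
    }
}
```

In this sketch, two extracted values that differ as a whole but share one code produce only one stored row for that code, which is why comparing the extracted values themselves would not be sufficient.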

@punktilious
Collaborator Author

This refactor requires creating a new project, which will be shared by both fhir-persistence-jdbc and fhir-remote-index. This avoids polluting fhir-persistence-jdbc with Kafka dependencies, and fhir-remote-index with all the dependencies required for fhir-persistence-jdbc.

punktilious added a commit that referenced this issue Jul 11, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
@punktilious
Collaborator Author

The new search parameter persistence code in fhir-persistence-params supports both PostgreSQL and Derby. For now, Db2 will use the old mechanism.

punktilious added a commit that referenced this issue Jul 11, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 12, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 12, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 12, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 12, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 13, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 13, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 13, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 13, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
punktilious added a commit that referenced this issue Jul 14, 2022
Signed-off-by: Robin Arnold <robin.arnold@ibm.com>
@punktilious self-assigned this Jul 14, 2022
@PrasannaHegde1
Collaborator

Tried running the system integration test suite and did not find any duplicate values for a resource in patient_resource_token_refs.
Also tried to create a Patient resource with duplicate token search parameter values in the JSON request and did not find any duplicate values for that resource in patient_resource_token_refs.
This is working as expected.
