amend history only when resource has changed #777

Closed
MM-Lehmann opened this issue Jul 26, 2022 · 3 comments
Assignees
Labels
question Further information is requested

Comments

@MM-Lehmann

We're using Blaze in a daily ETL full-load setup, i.e., every day we upload all of our ~50k data samples and let Blaze do the history management (in case resource details have changed). However, even when nothing has changed (99%+ of the data each day), a new history entry is created, which fills the storage quite quickly. It feels like a bug or shortcoming that Blaze does not recognize that nothing has changed; I would expect no history entry to be added in this case.

@alexanderkiel alexanderkiel self-assigned this Aug 2, 2022
@alexanderkiel
Member

Hi Martin, I had this discussion in the FHIR Chat in 2020: https://chat.fhir.org/#narrow/stream/179166-implementers/topic/History.20of.20Resource.20Update.20with.20Identical.20Content

In that discussion it was agreed that the server can choose whether or not to introduce new versions. In general, new versions were preferred for clinical use cases, but it was also agreed that ETL processes can be a problem.

I tested that HAPI doesn't introduce new versions if the content doesn't change.

In the end, I don't see any support in the FHIR specification for deduplicating versions created by non-incremental ETL processes. Such no-op updates are especially bad for Blaze, because it's designed around keeping track of every change. Blaze even has an event-driven architecture where every update results in a storage increase. Blaze may support cutting the history at some point in the future, but today every update or delete should be considered as costly in storage as the creation of a new resource. So only business-relevant updates and deletes should be performed.

I will leave this issue open in order to discuss how the deduplication could be done in your ETL process, or even how to build an A/B Blaze deployment where you import every day and switch sides for queries.
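
One way the deduplication in the ETL process could look is a fingerprint manifest: record a content hash per resource on each run and upload only resources whose hash differs from the previous run. This is a minimal sketch under stated assumptions, not something Blaze provides; the function names `fingerprint` and `select_changed` and the manifest layout are hypothetical.

```python
import hashlib
import json


def fingerprint(resource: dict) -> str:
    """Return a stable SHA-256 hex digest of a resource's JSON content."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_changed(resources: list[dict], previous: dict[str, str]):
    """Split a full load into the resources that actually need uploading.

    `previous` maps "ResourceType/id" to the fingerprint recorded on the
    last run (an assumed, ETL-side manifest). Returns the new or changed
    resources and the manifest to persist for the next run.
    """
    changed = []
    manifest = {}
    for resource in resources:
        key = f"{resource['resourceType']}/{resource['id']}"
        digest = fingerprint(resource)
        manifest[key] = digest
        if previous.get(key) != digest:
            changed.append(resource)
    return changed, manifest
```

With this approach the ETL job never has to read from the server, but the manifest has to be stored between runs and discarded whenever the server volume is reset.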

@alexanderkiel alexanderkiel added the question Further information is requested label Aug 2, 2022
@MM-Lehmann
Author

Thanks for the summary. In our setup it's really hard to find out which resources have changed or were deleted. The only approach I can think of is to download everything from Blaze and compare each resource, effectively uploading only the differences. I hope upload and download don't have different structures, but I guess I will have to try this out some time.
Right now, we're still resetting the volume when it gets too big (see #399).
Any chance for a proper warning from Blaze instead of silent failure in this case?
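
The download-and-compare idea could be sketched as follows. A FHIR server populates `meta.versionId` and `meta.lastUpdated` on stored resources, so downloaded resources differ from uploaded ones at least in those fields; the sketch strips them before comparing. Everything else here is an assumption: the helper names are hypothetical, and a server may normalize other parts of a resource that this comparison does not account for.

```python
import copy

# Fields the server manages itself; they are ignored when comparing.
SERVER_MANAGED_META = ("versionId", "lastUpdated")


def normalize(resource: dict) -> dict:
    """Return a copy of the resource without server-managed meta fields."""
    result = copy.deepcopy(resource)
    meta = result.get("meta")
    if isinstance(meta, dict):
        for field in SERVER_MANAGED_META:
            meta.pop(field, None)
        if not meta:
            # Drop an empty meta so it compares equal to a missing one.
            del result["meta"]
    return result


def resource_key(resource: dict) -> str:
    return f"{resource['resourceType']}/{resource['id']}"


def diff_full_load(local: list[dict], remote: list[dict]):
    """Compare today's full load against what the server currently holds.

    Returns (to_upload, to_delete): resources that are new or changed,
    and the "ResourceType/id" keys present on the server but no longer
    in the full load.
    """
    remote_by_key = {resource_key(r): normalize(r) for r in remote}
    local_keys = set()
    to_upload = []
    for resource in local:
        key = resource_key(resource)
        local_keys.add(key)
        if remote_by_key.get(key) != normalize(resource):
            to_upload.append(resource)
    to_delete = sorted(set(remote_by_key) - local_keys)
    return to_upload, to_delete
```

Dictionary comparison in Python ignores key order but not list order, so a server that reorders repeated elements would still show up as a change.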

@alexanderkiel
Member

> Any chance for a proper warning from Blaze instead of silent failure in this case?

I would recommend monitoring your server using something like Prometheus and Node Exporter.
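
In practice this kind of check is usually a Prometheus alerting rule. Purely to illustrate which numbers matter, here is a small sketch that reads Node Exporter's text output and computes the free fraction of a filesystem from the `node_filesystem_avail_bytes` and `node_filesystem_size_bytes` metrics. The function name `free_fraction` and the mount point used in the usage note are hypothetical.

```python
import re

# Matches lines like: metric_name{label="value",...} 1.23e+09
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def free_fraction(metrics_text: str, mountpoint: str) -> float:
    """Return available/total bytes for one mount point.

    `metrics_text` is the plain-text metrics output of Node Exporter.
    Raises KeyError if the mount point has no filesystem samples.
    """
    values = {}
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line)
        if not match:
            continue  # comment lines (# HELP, # TYPE) and unlabeled samples
        labels = dict(_LABEL.findall(match["labels"]))
        if labels.get("mountpoint") == mountpoint:
            values[match["name"]] = float(match["value"])
    return values["node_filesystem_avail_bytes"] / values["node_filesystem_size_bytes"]
```

A daily ETL job could, for example, refuse to start when `free_fraction(text, "/data")` drops below a chosen threshold, where `/data` stands in for wherever the database volume is mounted.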

alexanderkiel added a commit that referenced this issue Jun 25, 2023
This change introduces a new transaction command called "keep" that just
means: "Keep the resource if it is not changed in between." Unlike
"put", "keep" will not introduce a new history entry. The command "keep"
is always conditional because the detection of no changes will be done
outside of the transaction.

This is work in progress, because two things are missing: the detection
of no changes inside the transaction for normal "put" commands and the
retry if "keep" will fail.

Closes: #777
alexanderkiel added a commit that referenced this issue Jun 26, 2023
This change introduces a new transaction command called "keep" that just
means: "Keep the resource if it is not changed in between." Unlike
"put", "keep" will not introduce a new history entry. The command "keep"
is always conditional because the detection of no changes will be done
outside of the transaction.

This is work in progress, because two things are missing: the detection
of no changes inside the transaction for normal "put" commands and the
retry if "keep" will fail.

Closes: #777
alexanderkiel added a commit that referenced this issue Jun 29, 2023
This change introduces a new transaction command called "keep" that just
means: "Keep the resource if it is not changed in between." Unlike
"put", "keep" will not introduce a new history entry. The command "keep"
is always conditional because the detection of no changes will be done
outside of the transaction.

Closes: #777
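
The difference between "put" and "keep" described in this commit message can be illustrated with a toy in-memory model. This is not Blaze's implementation: the names `ToyStore`, `Conflict`, and `content_hash` are hypothetical, and expressing the condition as a content hash computed before the transaction is an assumption made for the sketch.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when the resource changed between the check and the command."""


def content_hash(resource: dict) -> str:
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToyStore:
    def __init__(self):
        # resource id -> list of versions, oldest first
        self.history = {}

    def put(self, resource_id: str, resource: dict) -> None:
        """Always append a new history entry."""
        self.history.setdefault(resource_id, []).append(resource)

    def keep(self, resource_id: str, expected_hash: str) -> None:
        """Append nothing; fail if the current version is not the expected one.

        The caller detected "no change" outside the transaction, so the
        command has to be conditional on that observation still holding.
        """
        versions = self.history.get(resource_id)
        if not versions or content_hash(versions[-1]) != expected_hash:
            raise Conflict(resource_id)
```

If another writer updates the resource between the no-change check and the "keep" command, the command fails instead of silently keeping an outdated version.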