Clean up deleted items from OpenSearch #721
Merged
See #719
This adds a script that iterates through all records in OpenSearch and removes items that do not exist in the SearchIdentifiers table.
To go through more than 10,000 results, it uses cursor-based navigation: the "scroll" command fetches each next page of results.
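For readers unfamiliar with scrolling, here is a minimal Python sketch of the pattern, not the script from this PR. It assumes a client object exposing `search(...)` and `scroll(...)` in the style of the opensearch-py client; the function name `iter_id_batches`, the index name, the page size, and the `2m` keep-alive are illustrative assumptions.

```python
from typing import Any, Iterator, List


def iter_id_batches(client: Any, index: str, page_size: int = 100,
                    keep_alive: str = "2m") -> Iterator[List[str]]:
    """Yield lists of document IDs, one list per scroll page.

    `client` is assumed to behave like an opensearch-py client
    (search/scroll returning dicts with "_scroll_id" and "hits").
    """
    # The initial search opens the scroll context and returns the first page.
    response = client.search(
        index=index,
        scroll=keep_alive,
        size=page_size,
        body={"query": {"match_all": {}}, "_source": False},
    )
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            # An empty page means the scroll is exhausted.
            break
        yield [hit["_id"] for hit in hits]
        # Pass the latest scroll ID back to fetch the next page.
        response = client.scroll(scroll_id=response["_scroll_id"],
                                 scroll=keep_alive)
```

Because each page is yielded as a batch, the caller can check one batch of IDs against the database at a time instead of holding every ID in memory.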
A command to run it with output to a log is
The SQL is a bit tricky because MySQL seems to lack some features that would make this simpler. It uses UNION to put 100 IDs into an in-memory table, LEFT JOINs that table with searchidentifier, and finds rows where the search identifier does not exist (but the OpenSearch ID does). I believe this is one of the most efficient approaches, since MySQL does joins on indexed fields very quickly.
Added some retries to scrolling the cursor: when I ran it for a while, one of the requests failed with a timeout from OpenSearch or a connection error.
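To make the shape of that query concrete, here is a rough Python sketch that builds it for one batch of IDs. It is an illustration, not the query from this PR: the helper name `build_orphan_query` and the column name `identifier` are hypothetical, and it uses `UNION ALL` on the assumption that the IDs in a batch are already unique. The `%s` placeholders follow the parameter style used by common MySQL drivers for Python.

```python
from typing import List, Sequence, Tuple


def build_orphan_query(ids: Sequence[str],
                       table: str = "SearchIdentifiers",
                       column: str = "identifier") -> Tuple[str, List[str]]:
    """Return (sql, params) selecting the IDs that have no row in `table`.

    `column` is a hypothetical name; substitute the real indexed column.
    """
    if not ids:
        raise ValueError("ids must not be empty")
    # One "SELECT %s" per ID, glued together into a derived table.
    first = "SELECT %s AS id"
    rest = " UNION ALL SELECT %s" * (len(ids) - 1)
    sql = (
        f"SELECT batch.id FROM ({first}{rest}) AS batch "
        f"LEFT JOIN {table} AS si ON si.{column} = batch.id "
        # A NULL on the right side of the LEFT JOIN means no matching row.
        f"WHERE si.{column} IS NULL"
    )
    return sql, list(ids)
```

The IDs returned by this query are the ones present in OpenSearch but missing from the table, so they are the candidates for deletion from the index.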
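The retry idea can be sketched as a small Python wrapper around any call that may fail transiently. This is a generic illustration rather than the code in this PR: `with_retries`, the attempt count, and the linear backoff are assumptions, and the built-in `ConnectionError` and `TimeoutError` stand in for whatever exceptions the actual OpenSearch client raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T],
                 attempts: int = 3,
                 delay_seconds: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (
                     ConnectionError, TimeoutError),
                 ) -> T:
    """Run `call`, retrying on the given exception types.

    Waits delay_seconds * attempt_number between tries and re-raises
    the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds * attempt)
    raise AssertionError("unreachable")
```

A scroll request would then be wrapped as `with_retries(lambda: client.scroll(...))`, with `retry_on` set to the client's own timeout and connection exception classes.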