
Clean up deleted items from OpenSearch #721

Merged: sfisher merged 7 commits into develop on Oct 21, 2024


@sfisher (Contributor) commented Aug 30, 2024

See #719

This adds a script to iterate through all records in OpenSearch and to remove items that do not exist in the SearchIdentifiers table.
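
A minimal sketch of the overall flow, assuming the OpenSearch IDs arrive from an iterator and that the database lookup and the delete call are passed in as functions. The names batched, clean_up, find_missing and delete are hypothetical and not taken from the PR; the batch size of 100 mirrors the 100 IDs per SQL query this PR describes.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def clean_up(opensearch_ids: Iterable[str],
             find_missing: Callable[[List[str]], List[str]],
             delete: Callable[[str], None],
             batch_size: int = 100) -> int:
    """Delete every OpenSearch ID that find_missing() reports as absent from the DB.

    Hypothetical helper: find_missing takes a batch of IDs and returns those with
    no SearchIdentifier row; delete removes one document from OpenSearch.
    Returns the number of documents deleted.
    """
    deleted = 0
    for batch in batched(opensearch_ids, batch_size):
        for doc_id in find_missing(batch):
            delete(doc_id)
            deleted += 1
    return deleted
```

Passing the lookup and delete in as functions keeps the batching logic testable without a live OpenSearch or MySQL instance.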

A normal search can only page through the first 10,000 results (OpenSearch's default index.max_result_window), so to go beyond that this uses cursor-based navigation with the "scroll" API to fetch each subsequent batch of results.
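
A minimal sketch of scroll-based iteration, assuming client is an opensearchpy.OpenSearch instance (or anything with the same search, scroll and clear_scroll methods). The function name scroll_ids, the 1000-hit page size and the "2m" keep-alive are illustrative choices, not values from the PR.

```python
from typing import Any, Iterator


def scroll_ids(client: Any, index: str, batch_size: int = 1000,
               keep_alive: str = "2m") -> Iterator[str]:
    """Yield the _id of every document in `index` via the Scroll API."""
    # The initial search opens a scroll context and returns the first page.
    # "_source": False skips document bodies; sorting by _doc is the cheapest order.
    resp = client.search(
        index=index,
        body={"query": {"match_all": {}}, "sort": ["_doc"], "_source": False},
        scroll=keep_alive,
        size=batch_size,
    )
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty page means every document has been seen
            for hit in hits:
                yield hit["_id"]
            # Fetch the next page; this also renews the context's keep-alive.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side context instead of waiting for it to expire.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)
```

Because a scroll reads from the state of the index at the time of the initial search, deleting documents while iterating should not cause hits to be skipped or repeated.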

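To run it with all output (stdout and stderr) captured in a log:

python -u manage.py opensearch-delete > del_opensearch.log 2>&1

(-u turns off Python's output buffering so the log updates while the script runs, and 2>&1 placed after the file redirect sends stderr to the same file.)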

The SQL is a bit tricky because MySQL seems to lack some features that would make this simpler. It uses UNION to put 100 IDs into an in-memory table, LEFT JOINs that table with searchidentifier, and keeps the rows where the searchidentifier match is missing (the ID exists in OpenSearch but not in the database). I believe this is one of the most efficient approaches, since MySQL performs joins on indexed columns very quickly.
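
A rough sketch of the query shape described above, assuming a searchidentifier table with an identifier column (the column name is a guess, not taken from the PR). missing_ids_sql and find_missing are hypothetical helpers. The placeholder is %s for MySQL drivers such as mysqlclient and ? for SQLite, which the demo uses so it runs without a server.

```python
import sqlite3
from typing import Any, List, Sequence


def missing_ids_sql(n: int, placeholder: str = "%s") -> str:
    """Return SQL selecting which of n candidate IDs have no searchidentifier row.

    The IDs become a derived table built with SELECT ... UNION SELECT ...,
    which is LEFT JOINed to searchidentifier; unmatched rows have NULLs on the
    right side, so filtering on IS NULL leaves only the orphaned IDs.
    """
    derived = " UNION ".join(f"SELECT {placeholder} AS id" for _ in range(n))
    return (
        f"SELECT t.id FROM ({derived}) AS t "
        "LEFT JOIN searchidentifier AS s ON s.identifier = t.id "
        "WHERE s.identifier IS NULL"
    )


def find_missing(cursor: Any, ids: Sequence[str], placeholder: str = "%s") -> List[str]:
    """Run the query for one batch and return the IDs absent from the DB."""
    if not ids:
        return []  # an empty derived table would be invalid SQL
    cursor.execute(missing_ids_sql(len(ids), placeholder), list(ids))
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    # Demo against an in-memory SQLite database, which uses "?" placeholders.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE searchidentifier (identifier TEXT PRIMARY KEY)")
    cur.executemany("INSERT INTO searchidentifier VALUES (?)", [("ark:/1",), ("ark:/3",)])
    print(sorted(find_missing(cur, ["ark:/1", "ark:/2", "ark:/3"], placeholder="?")))
```

UNION also removes duplicate IDs within a batch, and the result order is not guaranteed, so callers should not rely on it.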

I added retries around the scroll calls, since during one long test run I got a timeout or a connection error from OpenSearch.
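
A generic retry helper with exponential backoff, sketched under the assumption that each scroll call is wrapped in a zero-argument function. with_retries and its defaults are hypothetical, not the PR's actual code.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(func: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts, and
    re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With opensearch-py one would likely call with_retries(lambda: client.scroll(scroll_id=sid, scroll="2m"), retry_on=(opensearchpy.exceptions.ConnectionError,)); that exception class also covers ConnectionTimeout. The client's own max_retries and retry_on_timeout constructor options are another way to get similar behavior.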

@sfisher changed the base branch from main to develop on August 30, 2024 18:07
@jsjiang (Contributor) left a comment

Thank you, Scott, for explaining the code to me.

Jing

@sfisher merged commit 0d07eb3 into develop on Oct 21, 2024
1 check passed
Labels
None yet
Projects
None yet
2 participants