Add API methods for reporting and cleaning up inconsistencies between the database and Solr Index #7211
Hey @ekraffmiller, I'll add this as a general comment and keep this in code review for others to look at the code itself, but can you please add some docs for the new endpoint (and the updated endpoint, if needed)? Let me know if I can help w/ this -- happy to.
Ok, will update the docs, and look into adding testing for the IndexServiceBean (since it failed the coveralls check), even though we decided not to do an integration test.
I added a few comments but the only thing I think we should definitely change is where the docs go. I think they should be in the Solr page next to related commands.
try {
    contentInSolrButNotDatabase = getContentInSolrButNotDatabase();
} catch (SearchException ex) {
    permissionsInSolrButNotDatabase = getPermissionsInSolrButNotDatabase();
} catch (SearchException ex) {
} catch (SearchException ex) {
} catch (SearchException ex) {
src/main/java/edu/harvard/iq/dataverse/search/SolrIndexServiceBean.java
…to 4225-stale-solr-record # Conflicts: # doc/release-notes/4225-stale-solr-records.md
Looks good as of abcc9d8
I haven't tried the code myself but in addition to testing basic functionality, I'd suggest running a "clear-orphans" on a production database to see how long it takes.
@kcondon heads up that @ekraffmiller and I have been talking about the test failures on build 16 of this branch. Since build 15 passed and build 16 was simply merging the latest from develop, I don't think this branch adds any breakage. Here are the build 15 test results: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-7211/15/testReport/ I did note that one of the breakages in build 16, InReviewWorkflowIT.testCuratorSendsCommentsToAuthor, also appears in develop build 579: https://jenkins.dataverse.org/job/IQSS-dataverse-develop/579/testReport/
@ekraffmiller Passing back to review status for feedback on longer-running jobs.
efficient. Have the API methods return immediately, and report progress and method result in server.log.
…to 4225-stale-solr-record
The async code is in place so we're ready for more QA. I also made a tiny tweak to the docs to make the curl command consistent with other commands on the page.
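For anyone reading along, here is a minimal sketch of the "return immediately, report in the log" pattern discussed above. It is not the code in this PR: the class name `AsyncIndexJob`, the `start` helper, and the job name are all hypothetical, and it uses plain `java.util.concurrent` and `java.util.logging` rather than whatever mechanism the application server provides.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical illustration only; names are not taken from the PR.
class AsyncIndexJob {
    private static final Logger logger = Logger.getLogger(AsyncIndexJob.class.getName());

    // Single daemon worker so a long-running job never blocks the caller or JVM shutdown.
    private static final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "index-job");
        t.setDaemon(true);
        return t;
    });

    /** Starts the job in the background and returns right away; the outcome goes to the log. */
    static CompletableFuture<Integer> start(String jobName, Supplier<Integer> job) {
        logger.info(jobName + " started");
        return CompletableFuture.supplyAsync(job, worker)
                .whenComplete((count, error) -> {
                    if (error != null) {
                        logger.severe(jobName + " failed: " + error);
                    } else {
                        logger.info(jobName + " finished, processed " + count + " objects");
                    }
                });
    }

    public static void main(String[] args) {
        // The supplier stands in for the real cleanup work and returns a made-up count.
        CompletableFuture<Integer> result = start("clear-orphans", () -> 3);
        System.out.println("request accepted");
        result.join(); // only the demo waits; an API endpoint would return without joining
    }
}
```

The caller gets control back as soon as the job is queued, so an HTTP request would not time out on a large index; the trade-off is that the result is only visible in the log.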
Updated existing API method for detecting orphan objects in the database, and in Solr. Added a new API method for clearing all orphans.
(#4225)
There was an existing Index API method, called status, which reported all the inconsistencies between the database and Solr Index. I updated this method to report the object ids, rather than the count of objects, and also implemented the "exist in Solr but not in the database" type of orphan.
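To illustrate the two orphan types, here is a rough sketch that treats each side as a set of object ids and reports the difference in each direction. The class `OrphanReport`, the method `missingFrom`, and the sample ids are assumptions for illustration; the actual implementation queries the database and Solr rather than comparing in-memory sets.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not the code in this PR.
class OrphanReport {
    /** Returns the ids present in {@code left} but missing from {@code right}, sorted. */
    static Set<Long> missingFrom(Set<Long> left, Set<Long> right) {
        Set<Long> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    public static void main(String[] args) {
        Set<Long> databaseIds = Set.of(1L, 2L, 3L); // made-up sample ids
        Set<Long> solrIds = Set.of(2L, 3L, 4L);

        // Reporting the ids themselves, rather than a count, tells an admin what to fix.
        System.out.println("in database but not Solr: " + missingFrom(databaseIds, solrIds));
        System.out.println("in Solr but not database: " + missingFrom(solrIds, databaseIds));
    }
}
```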
Updated the code to use a Solr query cursor, to improve performance & memory management.
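The cursor loop looks roughly like the sketch below. It deliberately avoids SolrJ so it runs standalone: `PageSource`, `Page`, and `fake` are hypothetical stand-ins, not Solr or Dataverse types. The two details that do come from Solr's cursor protocol are the starting cursor value `*` and the end condition, where the server returns the same cursor it was given.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of cursor paging; not the code in this PR.
class CursorPaging {
    /** One page of ids plus the cursor to send with the next request. */
    static final class Page {
        final List<String> ids;
        final String nextCursor;

        Page(List<String> ids, String nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    /** Stand-in for a search client call that accepts a cursor and a page size. */
    interface PageSource {
        Page fetch(String cursor, int rows);
    }

    /** Visits every id while holding only one page in memory at a time. */
    static void forEachId(PageSource source, int rows, Consumer<String> consumer) {
        String cursor = "*"; // starting cursor value in Solr's cursor protocol
        while (true) {
            Page page = source.fetch(cursor, rows);
            page.ids.forEach(consumer);
            // An unchanged cursor means there are no more results.
            if (page.nextCursor.equals(cursor)) {
                break;
            }
            cursor = page.nextCursor;
        }
    }

    /** In-memory fake that uses a list offset as its cursor, for demonstration only. */
    static PageSource fake(List<String> allIds) {
        return (cursor, rows) -> {
            int from = cursor.equals("*") ? 0 : Integer.parseInt(cursor);
            int to = Math.min(from + rows, allIds.size());
            String next = (from == to) ? cursor : String.valueOf(to);
            return new Page(allIds.subList(from, to), next);
        };
    }

    public static void main(String[] args) {
        forEachId(fake(List.of("a", "b", "c", "d", "e")), 2, System.out::println);
    }
}
```

Compared with requesting increasing start offsets, a cursor keeps each request cheap on deep result sets and means the caller never materializes the full id list.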
Closes #4225
I tested this in my local environment by creating a dataverse and dataset, shutting down Solr, then deleting the data while Solr was shut down.