Add API methods for reporting and cleaning up inconsistencies between the database and Solr Index #7211

Merged: kcondon merged 16 commits into develop from 4225-stale-solr-record on Sep 23, 2020

@ekraffmiller ekraffmiller commented Aug 19, 2020

Updated the existing API method for detecting orphan objects in the database and in Solr. Added a new API method for clearing all orphans.
(#4225)

There was an existing Index API method, called status, which reported all the inconsistencies between the database and the Solr index. I updated this method to report the object ids rather than the count of objects, and also implemented detection of the "exist in Solr but not in the database" type of orphan.
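
To illustrate the two orphan types, here is a minimal sketch of the comparison, assuming the ids from each side are already loaded into memory. The class and method names (`OrphanReport`, `missingFrom`) are hypothetical and are not the code in this PR, which works against the database and Solr directly.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not the actual IndexServiceBean code. */
public class OrphanReport {

    /** Returns the ids that appear in {@code left} but not in {@code right}, in ascending order. */
    public static Set<Long> missingFrom(Iterable<Long> left, Set<Long> right) {
        Set<Long> result = new TreeSet<>();
        for (Long id : left) {
            if (!right.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up ids standing in for real database rows and Solr documents.
        Set<Long> databaseIds = Set.of(1L, 2L, 3L);
        Set<Long> solrIds = Set.of(2L, 3L, 4L);

        // Objects that still need to be indexed.
        System.out.println("In database but not in Solr: " + missingFrom(databaseIds, solrIds));
        // Stale Solr documents that a cleanup would remove.
        System.out.println("In Solr but not in database: " + missingFrom(solrIds, databaseIds));
    }
}
```

Reporting the ids themselves, rather than only a count, is what lets an admin see exactly which objects are affected.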

Updated the code to use a Solr query cursor, to improve performance & memory management.
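
For readers unfamiliar with cursor paging, the loop below is a rough sketch of the pattern, not the code in this PR. It assumes Solr's documented cursor semantics: the first request sends the mark `*`, each response returns a next mark, and paging is finished when the returned mark equals the one that was sent. `CursorPagingSketch`, `Page`, `forEachPage`, and `inMemoryIndex` are hypothetical names, and the in-memory fetcher only stands in for a real Solr request (which would also need a sort that includes the unique key field).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical illustration of cursor-style paging; it does not call Solr. */
public class CursorPagingSketch {

    public static final String CURSOR_START = "*";

    /** One page of ids plus the mark to send with the next request. */
    public static final class Page {
        public final List<Long> ids;
        public final String nextCursorMark;

        public Page(List<Long> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Hands each non-empty page to {@code handler}, so only one page is held at a time. Returns the number of ids seen. */
    public static long forEachPage(Function<String, Page> fetchPage, Consumer<List<Long>> handler) {
        long total = 0;
        String cursorMark = CURSOR_START;
        while (true) {
            Page page = fetchPage.apply(cursorMark);
            if (!page.ids.isEmpty()) {
                handler.accept(page.ids);
                total += page.ids.size();
            }
            // Paging is done when the server returns the same mark we sent.
            if (cursorMark.equals(page.nextCursorMark)) {
                return total;
            }
            cursorMark = page.nextCursorMark;
        }
    }

    /** Stand-in for a Solr request: pages through an ascending list, using the last id returned as the mark. */
    public static Function<String, Page> inMemoryIndex(List<Long> sortedIds, int pageSize) {
        return mark -> {
            long after = CURSOR_START.equals(mark) ? Long.MIN_VALUE : Long.parseLong(mark);
            List<Long> ids = new ArrayList<>();
            for (Long id : sortedIds) {
                if (id > after && ids.size() < pageSize) {
                    ids.add(id);
                }
            }
            String next = ids.isEmpty() ? mark : String.valueOf(ids.get(ids.size() - 1));
            return new Page(ids, next);
        };
    }

    public static void main(String[] args) {
        long total = forEachPage(inMemoryIndex(List.of(1L, 2L, 3L), 2),
                ids -> System.out.println("page: " + ids));
        System.out.println("total ids: " + total);
    }
}
```

Because each page is handled and then discarded, memory use stays bounded by the page size rather than by the size of the index.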

Closes #4225

I tested this in my local environment by creating a dataverse and dataset, shutting down Solr, then deleting the data while Solr was shut down.

coveralls commented Aug 19, 2020

Coverage decreased (-0.03%) to 19.456% when pulling d01ca83 on 4225-stale-solr-record into c7d63c1 on develop.

@djbrooke djbrooke left a comment

Hey @ekraffmiller, I'll add this as a general comment and keep this in code review for others to look at the code itself, but can you please add some docs for the new endpoint (and the updated endpoint, if needed)? Let me know if I can help w/ this -- happy to.

@ekraffmiller

Ok, will update the docs, and look into adding testing for the IndexServiceBean (since it failed the coveralls check), even though we decided not to do an integration test.

@pdurbin pdurbin self-assigned this Sep 9, 2020

@pdurbin pdurbin left a comment

I added a few comments but the only thing I think we should definitely change is where the docs go. I think they should be in the Solr page next to related commands.

doc/sphinx-guides/source/api/native-api.rst (outdated, resolved)
doc/release-notes/4225-stale-solr-records.md (outdated, resolved)
try {
contentInSolrButNotDatabase = getContentInSolrButNotDatabase();
} catch (SearchException ex) {
permissionsInSolrButNotDatabase = getPermissionsInSolrButNotDatabase();
} catch (SearchException ex) {

Suggested change
} catch (SearchException ex) {
} catch (SearchException ex) {

src/test/java/edu/harvard/iq/dataverse/api/IndexIT.java (outdated, resolved)
@pdurbin pdurbin removed their assignment Sep 9, 2020

@pdurbin pdurbin left a comment

Looks good as of abcc9d8

I haven't tried the code myself but in addition to testing basic functionality, I'd suggest running a "clear-orphans" on a production database to see how long it takes.

@kcondon kcondon self-assigned this Sep 11, 2020

pdurbin commented Sep 11, 2020

@kcondon heads up that @ekraffmiller and I have been talking about the test failures on build 16 of this branch. Since build 15 passed and build 16 was simply merging the latest from develop, I don't think this branch adds any breakage. Here are the build 15 test results: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-7211/15/testReport/

I did note that one of the breakages in build 16, InReviewWorkflowIT.testCuratorSendsCommentsToAuthor, also appears in develop build 579: https://jenkins.dataverse.org/job/IQSS-dataverse-develop/579/testReport/

kcondon commented Sep 11, 2020

@ekraffmiller Passing back to review status for feedback on longer-running jobs.

@kcondon kcondon assigned ekraffmiller and unassigned kcondon Sep 11, 2020
@pdurbin pdurbin self-assigned this Sep 22, 2020

@pdurbin pdurbin left a comment

The async code is in place so we're ready for more QA. I also made a tiny tweak to the docs to make the curl command consistent with other commands on the page.

@pdurbin pdurbin removed their assignment Sep 22, 2020
@kcondon kcondon self-assigned this Sep 22, 2020
@kcondon kcondon merged commit ae57981 into develop Sep 23, 2020
@kcondon kcondon deleted the 4225-stale-solr-record branch September 23, 2020 14:35
@djbrooke djbrooke added this to the 5.1 milestone Sep 23, 2020
Successfully merging this pull request may close these issues.

Indexing: Create API endpoint(s) that allows an admin to detect/delete stale solr record
6 participants