Solr: Try Soft Commit on Indexing #10547
Conversation
One issue with this is the tests - I see several failing. I'm guessing that this might be because the changes add a small delay. That might be acceptable in practice, but might require adding delays in the tests. (Could also be a real problem/bug, not sure yet.)
If I were to experiment with/benchmark these code changes, should I be using this branch, or solr-soft-commit2?
This branch has just the autoSoftCommit-related changes. The other branch currently has that plus changes to avoid deleting file docs unless needed. I'm pretty sure the changes here are all good. I haven't done much testing of the delete changes yet.
Note that this branch doesn't yet have the timing additions that make the tests pass, so it's OK to benchmark but not ready to merge.
Could you please sync the branch with develop - thanks.
@qqmyers is going to take a look at the failing tests (thanks).
Hi @qqmyers 👋🏼! I did some more testing, and adding a wait of 5 seconds before:
worked for me. No matter how long the wait is, putting it before or after:
produces the same error. I can dig a bit more into what is happening in the test, but adding a wait here worked for me - it just needs to be done before/with:
One change that would make this test pass would be simply swapping these 2 lines, 117-118, since the sleep is done on 118 - but from what I gathered from yesterday's comments, the super user shouldn't have to wait?
I think this is OK to go forward. For some reason, the testDeepLinks test appears to be significantly slower with this PR, although the overall effect appears to be more than a factor of 2 speed improvement when reindexing ~225 datasets / 4400 files. @jp-tosca reports some sensitivity to the order of two searches that exists in the current develop branch, which might be part of the increase (since I've swapped the order to do the search that requires all indexing to complete before the one that would succeed without everything being complete). (Has work been done to avoid repeat indexing of collections like @ErykKul did for datasets?) In any case, tests are now passing (except for the confirmEmail one fixed in a separate PR), so I think review/QA can proceed and I'll remove myself as an assignee.
This reverts commit 317038a.
👋🏼 @qqmyers @pdurbin @landreev I was doing some reading on soft/hard commits, and in particular I was wondering if we need to do something additional. Here is some of the documentation:
line 293 in the solrconfig file should be doing a hard commit at a maximum of 30 seconds after a given addition.
But if something happens between now and those 30 seconds, the document would be lost, no? I was looking at this SO post where they mention:
lines 266-269 in the solrconfig file turn on transaction logging, which is what is required to not lose info during a crash (see the comments starting at line 253). At startup, the transaction log is replayed to update the indexes - basically doing a hard commit. Nominally, hard commits can be less frequent than 30 seconds, but the longer the interval / the bigger the log, the more time required to restart after a crash.
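For reference, here is a minimal sketch of the kind of solrconfig.xml sections being discussed (the transaction log and the hard commit policy); the exact line numbers, property names, and values in the shipped file may differ:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log: replayed at startup, so updates that were not yet
       hard-committed survive a crash. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit at most every 30 seconds. openSearcher=false means the commit
       only flushes data to disk; it does not by itself make documents searchable. -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```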
Thanks! That makes sense. FWIW for whoever tests this, I killed my instance before the indexing and the item was indexed just as the instance went back up! 👍🏼 |
👍🏼 I reviewed the changes and left some questions, but everything seems to be fine. All tests are passing after the changes and the latest update from develop.
Something else to keep in mind is that the Solr data is just a copy of the data in PostgreSQL. So while it would be sad to lose some Solr transactions, we can always reindex, if necessary.
I'm not sure if this is good or not. I'm running a test multiple times against this branch and 2 others, including develop. Time from dataset create to seeing it in a search:
The intent of the PR is not to decrease the latency of any given indexing operation - it makes changes to reduce the overall cost of indexing while keeping latency 'acceptable'. Right now, the autoSoftCommit in solrconfig.xml is set to 1 second, so the maximum latency should be ~1 second - consistent with what your test shows. If that's not acceptable, the autoSoftCommit time can be reduced. (The same goes for the autoHardCommit - it can be changed from 30 seconds if the time to restart, during which the server replays the n seconds of updates since the last hard commit, is not acceptable.) The key test is probably the full reindex time on a significantly sized database. On a QDR test box with only thousands of files, the two parts of this PR made the full reindex more than twice as fast.
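For context, a 1-second soft commit like the one described above corresponds to a solrconfig.xml setting along these lines (the default value shown here is illustrative):

```xml
<!-- Soft commit: makes newly added documents visible to searches within ~1 second,
     without the I/O cost of a hard commit. -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>
```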
Happy to see this PR merged! |
* try soft commit
* keep softcommit short to avoid delays in visibility
* add test delay for autosoft, make hardcommit 30s like auto setting
* add 1-2 second delays in tests for softAutocomplete at 1s
* more sleeps
* more delays
* remove commented out deletes
* more commented out code to remove
* add 1 sec on failing tests
* add missing perm reindex
* change waiting
* fix index object and add null check for unit test
* remove test-specific null check
* reindex linking dv
* general solr release note
* more fixes
* revert change - was correct
* another sleepforsearch
* test adding explicit reindexing
* avoid other uses of cache in test that looks for exact counts
* Adding longer max sleep, add count param to sleep method
* Revert "add missing perm reindex" - This reverts commit 317038a.
Just want to put it on record that when re-indexing a copy of the IQSS prod. database, the result is essentially the same: the speedup is just a bit under 2X.
If you can differentiate a full "reindexing" (aka "zero day" scenario, aka bulk updates) from regular incremental updates, then you could eschew the auto soft commits in the solrconfig.xml and instead have only the incremental update requests add commitWithin since only those should be expected to be found within a second. |
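A rough sketch of that suggestion, assuming a SolrJ client and hypothetical class/method names (this is not the actual Dataverse IndexServiceBean code): only incremental updates pass commitWithin, while a bulk reindex skips it and issues a single commit at the end.

```java
import java.io.IOException;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

/**
 * Illustrative sketch only. Incremental updates use commitWithin so they become
 * searchable within about a second; a bulk reindex skips commitWithin and relies
 * on one hard commit at the end (or on the autoCommit settings in solrconfig.xml).
 */
public class IndexingSketch {

    /** Hypothetical constant; the PR suggests making COMMIT_WITHIN configurable. */
    private static final int COMMIT_WITHIN_MS = 1000;

    private final SolrClient solrClient;

    public IndexingSketch(SolrClient solrClient) {
        this.solrClient = solrClient;
    }

    /** Incremental update: ask Solr to make the doc visible within ~1 second. */
    public void indexOne(SolrInputDocument doc) throws SolrServerException, IOException {
        solrClient.add(doc, COMMIT_WITHIN_MS);
        // No explicit solrClient.commit() here - visibility comes from commitWithin
        // (or from autoSoftCommit, if enabled).
    }

    /** Bulk reindex: add everything, then issue one hard commit at the end. */
    public void reindexAll(Collection<SolrInputDocument> docs) throws SolrServerException, IOException {
        solrClient.add(docs);
        solrClient.commit();
    }
}
```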
What this PR does / why we need it:
This is a quick test of using soft commits / the commitWithin functionality to see if it improves Solr performance. If it works, we might want COMMIT_WITHIN to be configurable.
Note that the PR contains code updates to use commitWithin and avoid explicit hard commits, and a solrconfig.xml update to use autoSoftCommit and increase the autoHardCommit time. To test both, you have to install DV, swap the solrconfig.xml file in, and restart Solr. Nominally they could be tried separately.
Which issue(s) this PR closes:
Closes #
Special notes for your reviewer:
Suggestions on how to test this: There are two parts to this PR. One is a change to turn on autoSoftCommit and increase the hard commit interval, both in solrconfig.xml; the other is a set of code changes to use add/delete with a COMMIT_WITHIN parameter and to drop the separate commit() call during indexing. These could be tested separately. Nominally the only effect should be on performance, so the main testing should be to verify that each part independently speeds things up. The solrconfig.xml change can be deployed to an existing machine and tested with a Solr restart. The code changes have to be deployed via the war.
There is one ~unrelated test change in this PR - for CacheFactoryBeanTest. While testing, I observed a failure where testAuthenticatedUserGettingRateLimited() failed at line 169 because the final count was 122 instead of 120. I'm guessing this was due to some other test running in parallel and contributing to the count. I added an @ResourceLock(value = "cache") annotation on the test to stop this. I haven't seen a repeat since, but it is clearly rare overall, so I'm not sure how to test for it in QA.
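For anyone unfamiliar with the annotation, this is roughly the pattern (a sketch, not the actual CacheFactoryBeanTest): JUnit 5 serializes any tests that declare the same resource key, so a parallel test can no longer inflate the count.

```java
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.parallel.ResourceLock;

// Declaring the shared "cache" resource tells JUnit 5 not to run this test in
// parallel with any other test that declares the same resource.
@ResourceLock(value = "cache")
class RateLimitCountSketchTest {

    @Test
    void exactCountIsNotAffectedByOtherTests() {
        // ... exercise the rate limiter and assert on the exact count ...
    }
}
```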
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: