
Solr: Try Soft Commit on Indexing #10547

Merged

Conversation

@qqmyers (Member) commented May 8, 2024

What this PR does / why we need it:
This is a quick test of using soft commits and the commitWithin functionality to see whether it improves Solr performance. If it works, we might want to make COMMIT_WITHIN configurable.

Note that the PR contains code updates to use commitWithin and avoid explicit hard commits, and a solrconfig.xml update to enable autoSoftCommit and increase the auto (hard) commit time. To test both, you have to install DV, swap in the solrconfig.xml file, and restart Solr. Nominally they could be tried separately.
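For illustration, here is a minimal SolrJ sketch of the kind of change described above - not the actual Dataverse code; the class name and the 30-second commit-within value are assumptions for the example:

```java
import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitIndexingSketch {

    // Illustrative value; the PR suggests COMMIT_WITHIN might become configurable.
    private static final int COMMIT_WITHIN_MS = 30_000;

    void indexDocs(SolrClient solrClient, List<SolrInputDocument> docs)
            throws SolrServerException, IOException {
        // Before: every indexing call was followed by an explicit hard commit.
        // solrClient.add(docs);
        // solrClient.commit();

        // After: hand the documents to Solr with a commit-within deadline and let
        // autoSoftCommit (solrconfig.xml) control when they become searchable.
        solrClient.add(docs, COMMIT_WITHIN_MS);
    }

    void deleteDoc(SolrClient solrClient, String docId)
            throws SolrServerException, IOException {
        // Deletes get the same treatment: no explicit commit() call.
        solrClient.deleteById(docId, COMMIT_WITHIN_MS);
    }
}
```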

Which issue(s) this PR closes:

Closes #

Special notes for your reviewer:

Suggestions on how to test this: There are two parts to this PR. One is a solrconfig.xml change to turn on autoSoftCommit and increase the hard commit interval; the other is code changes to use add/delete with a COMMIT_WITHIN parameter and to drop the separate commit() call for indexing. These can be tested separately. Nominally the only effect should be on performance, so the main testing should verify that each part independently speeds things up. The solrconfig.xml change can be deployed to an existing machine and tested with a Solr restart; the code changes have to be deployed via the war.

There is one ~unrelated test change in this PR, for CacheFactoryBeanTest. While testing, I observed a failure where testAuthenticatedUserGettingRateLimited() failed at line 169 because the final count was 122 instead of 120. I'm guessing this was due to some other test running in parallel and contributing to the count. I added an @ResourceLock(value = "cache") annotation on the test to prevent this. I haven't seen a repeat since, but it is clearly rare, so I'm not sure how to test it for QA.
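For reference, a minimal sketch of the kind of JUnit 5 annotation described above (the class and helper below are hypothetical stand-ins, not the real CacheFactoryBeanTest):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.parallel.ResourceLock;

class RateLimitSketchTest {

    // Serializes tests that touch the shared "cache" resource so a test running
    // in parallel cannot contribute to the count being asserted here.
    @ResourceLock(value = "cache")
    @Test
    void testAuthenticatedUserGettingRateLimited() {
        int finalCount = driveRateLimitedCalls(); // hypothetical helper standing in for the real test logic
        assertEquals(120, finalCount);
    }

    private int driveRateLimitedCalls() {
        return 120; // placeholder
    }
}
```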

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label May 8, 2024
@qqmyers (Member, Author) commented May 8, 2024

One issue with this is tests - I see several failing. I'm guessing this might be because the changes add a small delay. That might be acceptable in practice, but it might require adding delays in the tests. (It could also be a real problem/bug; not sure yet.)

@landreev (Contributor)

If I were to experiment with/benchmark these code changes, should I be using this branch, or solr-soft-commit2?

@qqmyers (Member, Author) commented May 20, 2024

This branch has just the autoSoftCommit related changes. The other branch currently has that and changes to avoid deleting file docs unless needed. I'm pretty sure the changes here are all good. I haven't done much testing of the delete changes yet.

@qqmyers (Member, Author) commented May 20, 2024

Note, though, that this branch doesn't yet have the timing adjustments in the tests to make them pass, so it's OK to benchmark but not ready to merge.

@landreev (Contributor)

Could you please sync the branch with develop - thanks.

@coveralls commented May 20, 2024

Coverage Status

coverage: 20.585% (+0.005%) from 20.58% when pulling 1c6fd45 on GlobalDataverseCommunityConsortium:solr-soft-commit into e6b5856 on IQSS:develop.

@qqmyers qqmyers marked this pull request as ready for review May 22, 2024 16:44
@qqmyers qqmyers added this to the 6.3 milestone May 22, 2024
@qqmyers qqmyers removed their assignment May 22, 2024
@pdurbin (Member) commented May 22, 2024

@qqmyers is going to take a look at failing tests (thanks).

@jp-tosca (Contributor)

Hi @qqmyers 👋🏼!

I did some more testing, and adding a wait of 5 seconds before:

Response searchPublishedSubtreeSuper = UtilIT.search(searchPart, apiTokenSuper, "&subtree="+parentDataverseAlias);

worked for me. No matter how long a wait is placed before or after:

assertTrue(UtilIT.sleepForSearch(searchPart, apiToken, "&subtree="+parentDataverseAlias, UtilIT.MAXIMUM_INGEST_LOCK_DURATION), "Failed test if search exceeds max duration " + searchPart);

it produces the same error.

I can dig a bit more into what is happening in the test, but adding a wait here worked for me; it needs to be done before/with searchPublishedSubtreeSuper.

@jp-tosca (Contributor)

One change that would make this test pass would be simply swapping lines 117-118, since the sleep is done on line 118 - but from what I got from the comments yesterday, the superuser shouldn't have to wait?

@qqmyers (Member, Author) commented May 30, 2024

I think this is OK to go forward. For some reason, the testDeepLinks test appears to be significantly slower with this PR, although the overall effect is more than a factor of 2 speed improvement when reindexing ~225 datasets / ~4,400 files. @jp-tosca reports some sensitivity to the order of two searches that exists in the current develop branch, which might be part of the increase (since I've swapped the order so the search that requires all indexing to complete runs before the one that would succeed without everything being complete). (Has work been done to avoid repeat indexing of collections, like @ErykKul did for datasets?)

In any case, tests are now passing (except for the confirmEmail one, fixed in a separate PR), so I think review/QA can proceed, and I'll remove myself as an assignee.

@qqmyers qqmyers removed their assignment May 30, 2024
@jp-tosca (Contributor) commented Jun 4, 2024

👋🏼 @qqmyers @pdurbin @landreev

I was doing some reading on soft/hard commits, and in particular I was wondering whether we need to do something additional. Here is some of the documentation:

A common configuration might be to 'hard' auto-commit every 1-10 minutes and 'soft' auto-commit every second.
With this configuration, new documents will show up within about a second of being added, and if the power goes out, you will be certain to have a consistent index up to the last 'hard' commit.

@qqmyers (Member, Author) commented Jun 4, 2024

Line 293 in the solrconfig file should be doing a hard commit at most 30 seconds after a given addition.
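For context, this is roughly what the two auto-commit blocks in a solrconfig.xml look like; the values mirror the ones discussed in this thread (30 s hard, 1 s soft), but the exact contents and line numbers of the PR's file may differ:

```xml
<!-- Inside <updateHandler> in solrconfig.xml -->
<autoCommit>
  <maxTime>30000</maxTime>            <!-- hard commit: flush to stable storage within 30 s -->
  <openSearcher>false</openSearcher>  <!-- durability only; visibility comes from soft commits -->
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>             <!-- soft commit: new documents searchable within ~1 s -->
</autoSoftCommit>
```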

@jp-tosca (Contributor) commented Jun 4, 2024

But if something happens within those 30 seconds, wouldn't the document be lost? I was looking at this SO answer, where they mention:

the commitWithin is a soft-commit by default. Soft-commits are very efficient in terms of making the added documents immediately searchable. But! They are not on the disk yet. That means the documents are being committed into RAM. In this setup you would use updateLog to be solr instance crash tolerant.

@qqmyers (Member, Author) commented Jun 4, 2024

Lines 266-269 in the solrconfig file turn on transaction logging, which is what is required to avoid losing info during a crash (see the comments starting at line 253). At startup, the transaction log is replayed to update the indexes - basically doing a hard commit. Nominally, hard commits can be less frequent than every 30 seconds, but the longer the interval and the bigger the log, the more time is required to restart after a crash.
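For reference, a Solr transaction log is enabled with an updateLog block like the following - a standard Solr example, not necessarily byte-for-byte what the PR's solrconfig.xml contains:

```xml
<!-- Inside <updateHandler>: uncommitted updates are written to the tlog and
     replayed at startup after a crash, up to the last hard commit boundary. -->
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>
```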

@jp-tosca (Contributor) commented Jun 4, 2024

Thanks! That makes sense. FWIW for whoever tests this, I killed my instance before the indexing and the item was indexed just as the instance went back up! 👍🏼

@jp-tosca (Contributor) left a review comment

👍🏼 I reviewed the changes and left some questions, but everything seems to be fine. All tests are passing after the changes and the latest update from develop.

@jp-tosca jp-tosca removed their assignment Jun 4, 2024
@pdurbin (Member) commented Jun 4, 2024

Something else to keep in mind is that the Solr data is just a copy of the data in PostgreSQL. So while it would be sad to lose some Solr transactions, we can always reindex, if necessary.

@stevenwinship stevenwinship self-assigned this Jun 6, 2024
@stevenwinship (Contributor) commented Jun 6, 2024

I'm not sure if this is good or not. I'm running a test multiple times against this branch and 2 others, including develop.

Time from dataset create to seeing it in a search:
develop - 500ms
this branch - 1,160ms

@qqmyers (Member, Author) commented Jun 10, 2024

The intent of the PR is not to decrease the latency of any given indexing operation - it makes changes to reduce the overall cost of indexing while keeping latency 'acceptable'. Right now, autoSoftCommit in solrconfig.xml is set to 1 second, so the maximum latency should be ~1 second, consistent with what your test shows. If that's not acceptable, the autoSoftCommit time can be reduced. (The same goes for the hard commit interval - it can be changed from 30 seconds if the time to restart, which requires replaying everything since the last hard commit, is not acceptable.)

The key test is probably the full reindex time on a significantly sized database. On a QDR test box with only thousands of files, the two parts of this PR made the full reindex more than twice as fast.

@stevenwinship stevenwinship merged commit 5bf6b6d into IQSS:develop Jun 10, 2024
15 checks passed
@stevenwinship stevenwinship removed their assignment Jun 10, 2024
@landreev (Contributor)

Happy to see this PR merged!
🎉

luddaniel pushed a commit to Recherche-Data-Gouv/dataverse that referenced this pull request Jun 12, 2024
* try soft commit

* keep softcommit short to avoid delays in visibility

* add test delay for autosoft, make hardcommit 30s like auto setting

* add 1-2 second delays in tests for softAutocomplete at 1s

* more sleeps

* more delays

* remove commented out deletes

* more commented out code to remove

* add 1 sec on failing tests

* add missing perm reindex

* change waiting

* fix index object and add null check for unit test

* remove test-specific null check

* reindex linking dv

* general solr release note

* more fixes

* revert change - was correct

* another sleepforsearch

* test adding explicit reindexing

* avoid other uses of cache in test that looks for exact counts

* Adding longer max sleep, add count param to sleep method

* Revert "add missing perm reindex"

This reverts commit 317038a.
@landreev (Contributor)

The key test is probably the full reindex time on a significantly sized database. On a QDR test box with only thousands of files, the two parts of this PR made the full reindex more than twice as fast.

Just want to put it on record that, re-indexing a copy of the IQSS prod database, the result is essentially the same: the speedup is just a bit under 2X.

@dsmiley commented Aug 28, 2024

If you can differentiate a full reindex (aka the "zero day" scenario, aka bulk updates) from regular incremental updates, then you could eschew the auto soft commits in solrconfig.xml and instead have only the incremental update requests add commitWithin, since only those should be expected to be searchable within a second.
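A hedged sketch of that suggestion follows; the bulkReindex flag, class name, and 1-second value are hypothetical illustrations, not an existing Dataverse API:

```java
import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

class IncrementalVsBulkIndexingSketch {

    // Hypothetical: only incremental updates need to become searchable quickly.
    private static final int INCREMENTAL_COMMIT_WITHIN_MS = 1_000;

    void index(SolrClient client, List<SolrInputDocument> docs, boolean bulkReindex)
            throws SolrServerException, IOException {
        if (bulkReindex) {
            // Bulk/"zero day" path: no commitWithin; rely on autoCommit for durability
            // and issue a single commit when the whole reindex finishes.
            client.add(docs);
        } else {
            // Incremental path: ask Solr to make this update visible within ~1 s.
            client.add(docs, INCREMENTAL_COMMIT_WITHIN_MS);
        }
    }

    void finishBulkReindex(SolrClient client) throws SolrServerException, IOException {
        client.commit(); // one commit at the end of the bulk run
    }
}
```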
