6633 update solr 772 #6631
From what I can tell all instances of 7.3.1 have been replaced with 7.7.2. But like @poikilotherm said, a release note needs to be added.
How frequent are these warnings? Only on startup? Or are they emitted constantly and there's a risk of filling up a disk? |
I only see them at startup, when config is parsed. |
Looks good! (I haven't tested anything.) Thanks, @poikilotherm!
Quick q: has anyone in Dev reviewed the Solr release notes to see whether there are any obvious points of concern prior to testing? |
@kcondon I haven't. @poikilotherm, have you? |
@pdurbin I guess it was really a request if it hasn't been done. Thanks! |
I have. I couldn't find anything relevant for us, but this should be verified in-depth by someone else. 4 eyes see more than 2. |
I just looked at the "upgrade notes" at https://lucene.apache.org/solr/7_7_2/changes/Changes.html for all the versions (screenshot below) and I don't see anything I'm concerned about. We don't use fancy features in Solr. I don't think any of these upgrade notes pertain to our use of Solr. |
@poikilotherm @pdurbin Thanks for checking! |
I recompiled v4.19 with @poikilotherm's change to pom.xml, deployed it to https://payara5.odum.unc.edu and upgraded Solr to 7.7.2 on that host, FWIW. Only entry in server.log so far:
|
I see the same error message with current SolrJ 7.3.1, so no blocker 😌 |
@poikilotherm Quick q: would it not be simpler to just install the new version rather than upgrade? |
That might depend on how much time it takes to reindex and how much downtime is OK for your installation. I have no experience with big installations with 100,000+ datasets. Maybe @4thikonov can share experience from his 50,000-dataset import to give us an idea? On second thought: as long as we don't touch the schema, a reindex shouldn't be necessary. Also, reading the upgrade docs again, the process is all about installing the new version, moving your collection into place, and starting Solr. |
@poikilotherm Thanks. It takes ~18hrs to fully reindex Harvard. What I was trying to figure out is whether:
|
Ad 1) I have no idea whether SolrJ 7.7 is compatible with Solr 7.3, and I don't even know where to look for such a statement. IMHO it might be better to upgrade, as our old SolrJ version has a CVE. We could also test this, but upgrading is always better IMHO... Ad 2) We are only doing minor upgrades. The index is compatible from what I see in the release docs and upgrade notes. So there is no need to reindex as long as we don't change the schema config and no reindex becomes necessary due to changes in a metadata block like citation.tsv. If you're asking the paranoid me, I'd do an update test with sample data... Ad 3) I'm pretty sure that neither Dataverse nor Solr supports partial reindexes. An in-place reindex should be fine, no need to drop and rebuild. That should be much faster too, shouldn't it? |
@poikilotherm heads up that I tweaked your release note at b1fb21d. @kcondon and I worked on it together. |
Well, you can reindex individual datasets, for example: http://guides.dataverse.org/en/4.19/admin/solr-search-index.html#reindexing-datasets |
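For illustration, a minimal Python sketch of building such a single-dataset reindex request. The endpoint path `/api/admin/index/dataset?persistentId=...`, the `localhost:8080` base URL, and the example DOI are assumptions here and should be checked against the linked guide; the helper name `reindex_dataset_url` is hypothetical.

```python
from urllib.parse import urlencode


def reindex_dataset_url(persistent_id, base_url="http://localhost:8080"):
    """Build the URL that asks Dataverse to reindex one dataset.

    Assumption: the admin API exposes /api/admin/index/dataset and takes
    the dataset's persistent identifier as a query parameter.
    """
    # Keep ":" and "/" readable, since DOIs contain both.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url.rstrip('/')}/api/admin/index/dataset?{query}"


if __name__ == "__main__":
    # Hypothetical DOI, for illustration only.
    print(reindex_dataset_url("doi:10.5072/FK2/EXAMPLE"))
```

The resulting URL could then be fetched with curl or `urllib.request.urlopen` from a host that is allowed to reach the admin API.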
Thank you to everyone involved for the quick solution and for all the effort you put in so eagerly and willingly! Very much appreciated! |
@poikilotherm @pdurbin After some additional testing, post merge, I'm finding some strange behavior: |
You are testing with full-text indexing turned on, right? If so, could you try again with full-text indexing turned off and see if performance is back again? Any chance we can create a representative test data set so this could be part of a load test in CI? If you have some time to spare: any chance to test this with the current dataverse-sample-data to check if this happens with it, too? Reproducible problems are so much easier to debug... 😉 |
@poikilotherm I am not testing with full-text indexing, and I am using the same db for both the current and new Solr test scenarios. OK, using the process of elimination, I deployed your war file using the existing production Solr version that had been performing well and now see the same performance issues. So it appears to be something in the PR? Apologies for not catching it sooner: it passed basic functional testing, and I had some config issues with my prod/volume test but saw this as lower risk, based on the small code change. Last update: I am able to use the production 4.19 war against Solr 7.7.2 and it works as expected in terms of performance and logging. I will try a build from develop just in case I have a bad build for some reason. Still working on the develop approach; I hit a snag with another PR. Will ask @djbrooke for input. We do not have any logging to speak of for this problem, so maybe some logging might help. |
@poikilotherm @pdurbin So it looks like it is not the SolrJ client version, nor the Solr server, but something in the develop branch related to indexing a particular dataverse we have in production, Murray Research Archive. I'm not sure yet what might be the problem, but 1. the dv fails to index with no error, 2. batch index is impacted in a weird way but still partly functions, 3. logging is impacted and does not help with understanding the problem. I think 2 and 3 are preexisting and 1 might be related to dv metadata changes in this branch? Effectively I can reproduce part of the behavior by trying to index the last attempted dv in the logs: I've created a separate issue for this: #6665. At this point I do not think it has to do with this PR, until we learn more. |
I agree. Judging from investigation by @sekmiller the problems mentioned above seem to have been introduced in pull request #6564. #6665 is the issue to watch for updates. |
What this PR does / why we need it:
We should update to the latest supported version of Solr in the 7.x release train to be on an upstream supported release.
According to this list, there have been no minor or major security problems.
Which issue(s) this PR closes:
Closes #6633
Special notes for your reviewer:
2020-02-10: this is a WIP
Basically, I just did a grep and sed to replace any "7.3.1" with "7.7.2":
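A rough Python equivalent of such a grep-and-sed pass, as a sketch only and not the commands actually run. The function name `replace_version` and the file-extension filter are hypothetical; the pattern is deliberately a little stricter than a plain string replace so that "17.3.1" or "7.3.10" are left alone.

```python
import re
from pathlib import Path

# Match "7.3.1" only as a whole version, not inside "17.3.1" or "7.3.10".
OLD_VERSION = re.compile(r"(?<![\d.])7\.3\.1(?!\.?\d)")
NEW_VERSION = "7.7.2"


def replace_version(root, suffixes=(".xml", ".rst", ".sh", ".md", ".txt")):
    """Rewrite the old Solr version to the new one in text files under root.

    Returns the sorted list of files that were changed.
    The suffix filter is an assumption; adjust it to the repository.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        updated, count = OLD_VERSION.subn(NEW_VERSION, text)
        if count:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Running it on a checkout and then reviewing the output of `git diff` shows whether any occurrence was rewritten that should have stayed at the old version.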
Do we need to add more notes, add docs, ...?
We should discuss the deprecation warnings (see #6599) and maybe get rid of them to prepare for the future. If we change that, we will need to check the reindexing advice for the release notes.
Suggestions on how to test this:
We still need to figure out how to test all of this.
Components affected:
Does this PR introduce a user interface change?:
Nope. Nada.
Is there a release notes update needed for this change?:
Definitely. Still needs to be taken care of. A first draft is included in the commits.
Additional documentation:
None so far.