Solr 8.8 upgrade - remaining issues with solrconfig.xml #7662
Comments
Triggering @qqmyers @mheppler @scolapasta @pdurbin @sekmiller here.
Noted, @poikilotherm. Thank you for catching this, opening an issue and providing all the details. I was already coordinating with @qqmyers and @scolapasta on #7378, and I'll add this new issue to my agenda as well. It might be worth scheduling another tech hour discussion tomorrow, if there are any questions.
I also looked around for upstream changes to |
@poikilotherm first of all, thanks for creating this issue.
This sounds good but I'm not sure how it would work technically. As a starting point, it probably makes sense to list "our changes" so that we're all on the same page. We know we want "boosting" for example (see #1928 (comment) ) but I'm sure there are other tweaks we've made that I'm not thinking of. My guess is that we make fewer than half a dozen changes to the Solr config. Perhaps we should start by listing them in the dev guide so that when we do upgrades developers are aware of them.
…tatic solrconfig.xml IQSS#7662
…Dataverse specific changes IQSS#7662
Simple Makefile to download Solr, extract the default configset and create a Dataverse flavored one. - Uses Maven to find the Solr distribution version to download. - Uses xsltproc to apply our XSLT transformations to solrconfig.xml - Replaces the managed-schema with the static one we provide - Zips the configset to make it distributable as artifact
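The last two steps of that Makefile (swapping in the static schema and zipping the configset) can be sketched in Python. This is only an illustration, not the Makefile from the commit: the function name build_configset, the target file name schema.xml, and the directory layout are assumptions, and the xsltproc step is left out because the Python standard library has no XSLT processor.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def build_configset(default_conf: Path, static_schema: Path, out_zip: Path) -> Path:
    """Copy a default configset, swap in a static schema, and zip the result."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "conf"
        shutil.copytree(default_conf, work)

        # Assumption: the static schema is shipped as schema.xml and the
        # managed-schema file of the default configset is dropped.
        (work / "managed-schema").unlink(missing_ok=True)
        shutil.copy(static_schema, work / "schema.xml")

        # Zip everything under conf/ so the archive can be published as an artifact.
        with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(work.rglob("*")):
                if path.is_file():
                    archive.write(path, path.relative_to(work.parent).as_posix())
    return out_zip
```

Working in a temporary copy keeps the extracted default configset untouched, so the build can be repeated against the same download.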
…QSS#7662 Instead of relying on Java-provided exceptions, we want to track line numbers and further details of the parsing process, so we need custom mechanics.
Our custom metadata block TSV files follow a certain order of things. We also do not allow for repetitions or similar. All of this can most easily be depicted with a state machine, so we know where to send a line for parsing. This commit also adds the very basic (empty) POJOs to store the block, fields and vocabularies in, to enable testing the state transition. It also adds constants we rely on, such as the trigger char, the comment intro and the field delimiter.
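The state machine idea can be sketched as follows. This is a Python illustration, not the Java code from the commit. The header keywords metadataBlock, datasetField and controlledVocabulary follow the Dataverse TSV format as I understand it; the comment intro "%%" and all names (State, ParsingError, next_state, scan) are made up for this sketch.

```python
from enum import Enum

TRIGGER = "#"      # marks a header line that opens a new section
COMMENT = "%%"     # hypothetical comment intro, chosen for illustration only
DELIMITER = "\t"   # field delimiter


class State(Enum):
    INIT = "init"
    BLOCK = "metadataBlock"
    FIELDS = "datasetField"
    VOCABS = "controlledVocabulary"


# Each header keyword is only legal in exactly one state, which enforces
# the section order and forbids repeating a section.
TRANSITIONS = {
    "metadataBlock": (State.INIT, State.BLOCK),
    "datasetField": (State.BLOCK, State.FIELDS),
    "controlledVocabulary": (State.FIELDS, State.VOCABS),
}


class ParsingError(Exception):
    """Carries the line number, which a generic exception would not."""

    def __init__(self, line_no, message):
        super().__init__(f"line {line_no}: {message}")
        self.line_no = line_no


def next_state(state, line, line_no):
    """Return the state in which this line has to be parsed."""
    if not line.startswith(TRIGGER):
        if state is State.INIT:
            raise ParsingError(line_no, "data found before the first header")
        return state
    keyword = line[len(TRIGGER):].split(DELIMITER, 1)[0].strip()
    if keyword not in TRANSITIONS:
        raise ParsingError(line_no, f"unknown section '{keyword}'")
    expected, target = TRANSITIONS[keyword]
    if state is not expected:
        raise ParsingError(line_no, f"'{keyword}' is not allowed after {state.name}")
    return target


def scan(lines):
    """Pair every relevant line number with the state responsible for it."""
    state = State.INIT
    routed = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith(COMMENT):
            continue  # blank lines and comments never change the state
        state = next_state(state, line, line_no)
        routed.append((line_no, state))
    return routed
```

Because the trigger is an ordinary constant here, it could equally be passed in as a setting, which is the direction the later commits in this thread describe.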
The TSV parser needs to verify if a certain line is a header line and matching the spec. To avoid duplicated validation code, this validator can be used with an arbitrary list of strings (so it can be reused for blocks, fields and vocabularies). As we will need to validate URLs in certain fields, this validator also offers a helper function to create predicates checking for valid URLs.
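A reusable header check along these lines could look as follows. It is a Python sketch, not the project's Java Validator class; validate_header and url_predicate are hypothetical names, and accepting only http and https URLs is an assumption.

```python
from urllib.parse import urlparse


def validate_header(line, expected, trigger="#", delimiter="\t"):
    """Return a list of problems; an empty list means the header matches."""
    if not line.startswith(trigger):
        return [f"header must start with '{trigger}'"]
    found = [column.strip() for column in line[len(trigger):].split(delimiter)]
    while found and not found[-1]:
        found.pop()  # ignore trailing empty columns
    problems = []
    if len(found) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(found)}")
    for position, (want, got) in enumerate(zip(expected, found), start=1):
        if want != got:
            problems.append(f"column {position}: expected '{want}', found '{got}'")
    return problems


def url_predicate(schemes=("http", "https")):
    """Build a predicate that accepts only absolute URLs with an allowed scheme."""
    def is_valid_url(value):
        if value is None:  # stay null safe
            return False
        parsed = urlparse(value.strip())
        return parsed.scheme in schemes and bool(parsed.netloc)
    return is_valid_url
```

Since the expected columns are passed in as a plain list, the same function can serve the block, field and vocabulary headers.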
The Block POJO now contains the header specification (it uses the Validator class to perform the validation) and allows parsing a line into a List. A later relaxation of the spec allowing for reordering of fields, etc. is possible, while the calling code of the parser can reuse the found header definition. A builder pattern is used to parse and validate the actual definition. As the block may only be used once the definition, all fields and all vocabularies have been parsed (if there is an error within the TSV, the parsing has to fail!), the builder pattern is a natural match.
This simple class makes the parser somewhat configurable, so future changes and command line options can be integrated more easily.
Instead of defining a static trigger, we want to be able to configure the trigger sign. Due to this, we use the keyword only and move the trigger handling into the ParsingState (which is analysing the line for state transition anyway).
- Implement first details of the Block POJO - Change parsing with BlockBuilder to use an internal state with a not-exposed Block object - The BlockBuilder may manipulate the Block, but after calling build() the calling code will have no option to edit the POJO (proper encapsulation and sealing)
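The sealing idea can be illustrated in Python with a frozen dataclass. This is a sketch of the pattern only: the attribute and method names are hypothetical, and unlike the commit described above, the builder here collects plain values instead of holding a hidden Block instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Immutable result; callers cannot edit it after build()."""
    name: str
    display_name: str
    fields: tuple


class BlockBuilder:
    """Collects parsed values and errors; only build() hands out a Block."""

    def __init__(self):
        self._name = None
        self._display_name = None
        self._fields = []
        self._errors = []

    def parse_definition(self, line_no, columns):
        if len(columns) < 2 or not columns[0].strip():
            self._errors.append(f"line {line_no}: block needs a name and a display name")
            return self
        self._name = columns[0].strip()
        self._display_name = columns[1].strip()
        return self

    def add_field(self, line_no, field_name):
        if field_name in self._fields:
            self._errors.append(f"line {line_no}: duplicate field '{field_name}'")
        else:
            self._fields.append(field_name)
        return self

    def build(self):
        # Fail as a whole if anything went wrong while parsing the TSV.
        if self._name is None:
            self._errors.append("no block definition was parsed")
        if self._errors:
            raise ValueError("; ".join(self._errors))
        return Block(self._name, self._display_name, tuple(self._fields))
```

Collecting errors and raising only in build() means a caller never gets hold of a half-parsed block.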
Add field types and make them usable as predicates for fields. Add test.
Predicates are not null safe - need to make validate() check for null
Includes all the predicates according to spec and test for them.
luceneMatchVersion update should be the only real change.
After upgrading from Solr 8.2 to Solr 8.8.2, LTR response has degraded by 35%. Please find LTR specific details. QUERY_DOC_FV Let me know if anything needs to be added.
@gksachin04 in PR #8415 we already upgraded to Solr 8.11. If you're still having a problem with that version, can you please open a fresh issue? Thanks. And yes, more details would be great. 😄
@pdurbin Thanks, do you have an LTR specific configuration for Solr 8.11?
@gksachin04 sorry, I don't. You might want to ask the community about it: https://groups.google.com/g/dataverse-community
Mistake
Since we upgraded from Solr 7.3.0, we made one bad mistake (mea culpa, too): we did not adapt the luceneMatchVersion to the version of the running server.
Other changes
We also did not incorporate upstream changes to solrconfig.xml:
The formerly present JARs have been excluded since 8.0, see apache/lucene-solr@dce36c1
I don't know if we actually use any of those. Remove them and see whether anything breaks.
These are newer changes we should incorporate.
These are definitely changes we made. I don't know why they happened (it's really tricky to find their sources) and I don't know if they are actually used.
More changes by upstream, should be incorporated. (Seems related to the same change in apache/lucene-solr@dce36c1)
🚨 THIS IS CRUCIAL FOR US. Newer versions of Solr default to the managed schema factory that @pkiraly suggested in #5989.
These have been changed by upstream and as they seem to use regexes now, should be OK to incorporate.
Is the removal of these processors still a thing?
We should use the setting to disable this instead of changing the default... 🙈
More upstream due to the libs removed. Looks like we never configured those.
Conclusion
Instead of maintaining a static config, we should rely on using the _default configset and apply our changes to it.
At least this is what I'm going to do in the Dataverse Solr container images.
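Applying changes to the upstream solrconfig.xml instead of maintaining a static copy could look roughly like this. The sketch uses Python's ElementTree as a stand-in for the XSLT approach mentioned in the commits above and covers only two changes: setting luceneMatchVersion and, as an assumption about what Dataverse needs for its static schema, switching the schemaFactory to Solr's ClassicIndexSchemaFactory. The function name adapt_solrconfig is hypothetical.

```python
import xml.etree.ElementTree as ET


def adapt_solrconfig(xml_text, lucene_version):
    """Return the upstream solrconfig.xml text with our changes applied."""
    root = ET.fromstring(xml_text)

    # Keep luceneMatchVersion in line with the running server.
    match_version = root.find("luceneMatchVersion")
    if match_version is None:
        match_version = ET.SubElement(root, "luceneMatchVersion")
    match_version.text = lucene_version

    # Assumption: a static schema file is used, so any existing
    # schemaFactory is replaced by the classic one.
    for old in root.findall("schemaFactory"):
        root.remove(old)
    factory = ET.SubElement(root, "schemaFactory")
    factory.set("class", "ClassicIndexSchemaFactory")

    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops XML comments when parsing, so the output loses the upstream documentation comments; an XSLT identity transform would keep them.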