
Solr 8.8 upgrade - remaining issues with solrconfig.xml #7662

Open
poikilotherm opened this issue Mar 8, 2021 · 8 comments

Comments

@poikilotherm
Contributor

poikilotherm commented Mar 8, 2021

Mistake

When we upgraded from Solr 7.3.0, we made one bad mistake (mea culpa): we did not adapt luceneMatchVersion to the version of the running server.

Other changes

We also did not incorporate upstream changes to solrconfig.xml:

--- solrconfig.xml	2021-03-08 10:29:37.810488567 +0100
+++ solrconfig-881.xml	2021-02-12 19:56:43.000000000 +0100
@@ -35,7 +35,7 @@
        that you fully re-index after changing this setting as it can
        affect both how text is indexed and queried.
   -->
-  <luceneMatchVersion>7.3.0</luceneMatchVersion>
+  <luceneMatchVersion>8.8.1</luceneMatchVersion>
 
   <!-- <lib/> directives can be used to instruct Solr to load any Jars
        identified and use them to resolve any "plugins" specified in
@@ -69,20 +69,11 @@
        If a 'dir' option (with or without a regex) is used and nothing
        is found that matches, a warning will be logged.

The formerly present JARs have been excluded since 8.0; see apache/lucene-solr@dce36c1.

I don't know if we actually use any of those. We should remove them and see if anything breaks.

-       The examples below can be used to load some solr-contribs along
+       The example below can be used to load a solr-contrib along
        with their external dependencies.
     -->
-  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
-  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
+    <!-- <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" /> -->
 
-  <lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/" regex=".*\.jar" />
-  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-clustering-\d.*\.jar" />
-
-  <lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
-  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />
-
-  <lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
-  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />
   <!-- an exact 'path' can be used instead of a 'dir' to specify a
        specific jar file.  This will cause a serious error to be logged
        if it can't be loaded.

These are newer changes we should incorporate.

@@ -161,6 +152,15 @@
     <!-- <ramBufferSizeMB>100</ramBufferSizeMB> -->
     <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
 
+    <!-- Expert: ramPerThreadHardLimitMB sets the maximum amount of RAM that can be consumed
+         per thread before they are flushed. When limit is exceeded, this triggers a forced
+         flush even if ramBufferSizeMB has not been exceeded.
+         This is a safety limit to prevent Lucene's DocumentsWriterPerThread from address space
+         exhaustion due to its internal 32 bit signed integer based memory addressing.
+         The specified value should be greater than 0 and less than 2048MB. When not specified,
+         Solr uses Lucene's default value 1945. -->
+    <!-- <ramPerThreadHardLimitMB>1945</ramPerThreadHardLimitMB> -->
+
     <!-- Expert: Merge Policy
          The Merge Policy in Lucene controls how merging of segments is done.
          The default since Solr/Lucene 3.3 is TieredMergePolicy.
@@ -367,23 +367,32 @@
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
   <query>
 
-    <!-- Maximum number of clauses in each BooleanQuery,  an exception
-         is thrown if exceeded.  It is safe to increase or remove this setting,
-         since it is purely an arbitrary limit to try and catch user errors where
-         large boolean queries may not be the best implementation choice.
+    <!-- Maximum number of clauses allowed when parsing a boolean query string.
+         
+         This limit only impacts boolean queries specified by a user as part of a query string,
+         and provides per-collection controls on how complex user specified boolean queries can
+         be.  Query strings that specify more clauses then this will result in an error.
+         
+         If this per-collection limit is greater then the global `maxBooleanClauses` limit
+         specified in `solr.xml`, it will have no effect, as that setting also limits the size
+         of user specified boolean queries.
       -->
-    <maxBooleanClauses>1024</maxBooleanClauses>
+    <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>
 
     <!-- Solr Internal Query Caches
 
-         There are two implementations of cache available for Solr,
-         LRUCache, based on a synchronized LinkedHashMap, and
-         FastLRUCache, based on a ConcurrentHashMap.
+         There are four implementations of cache available for Solr:
+         LRUCache, based on a synchronized LinkedHashMap, 
+         LFUCache and FastLRUCache, based on a ConcurrentHashMap, and CaffeineCache -
+         a modern and robust cache implementation. Note that in Solr 9.0
+         only CaffeineCache will be available, other implementations are now
+         deprecated.
 
          FastLRUCache has faster gets and slower puts in single
          threaded operation and thus is generally faster than LRUCache
          when the hit ratio of the cache is high (> 75%), and may be
          faster under other scenarios on multi-cpu systems.
+         Starting with Solr 9.0 the default cache implementation used is CaffeineCache.
     -->
 
     <!-- Filter Cache
@@ -403,13 +412,12 @@
            initialSize - the initial capacity (number of entries) of
                the cache.  (see java.util.HashMap)
            autowarmCount - the number of entries to prepopulate from
-               and old cache.
+               an old cache.
            maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
                       to occupy. Note that when this option is specified, the size
                       and initialSize parameters are ignored.
       -->
-    <filterCache class="solr.FastLRUCache"
-                 size="512"
+    <filterCache size="512"
                  initialSize="512"
                  autowarmCount="0"/>
 
@@ -421,8 +429,7 @@
             maxRamMB - the maximum amount of RAM (in MB) that this cache is allowed
                        to occupy
       -->
-    <queryResultCache class="solr.LRUCache"
-                      size="512"
+    <queryResultCache size="512"
                       initialSize="512"
                       autowarmCount="0"/>
 
@@ -432,14 +439,12 @@
          document).  Since Lucene internal document ids are transient,
          this cache will not be autowarmed.
       -->
-    <documentCache class="solr.LRUCache"
-                   size="512"
+    <documentCache size="512"
                    initialSize="512"
                    autowarmCount="0"/>
 
     <!-- custom cache currently used by block join -->
     <cache name="perSegFilter"
-           class="solr.search.LRUCache"
            size="10"
            initialSize="0"
            autowarmCount="10"
@@ -452,8 +457,7 @@
          even if not configured here.
       -->
     <!--
-       <fieldValueCache class="solr.FastLRUCache"
-                        size="512"
+       <fieldValueCache size="512"
                         autowarmCount="128"
                         showItems="32" />
       -->
@@ -469,7 +473,6 @@
       -->
     <!--
        <cache name="myUserCache"
-              class="solr.LRUCache"
               size="4096"
               initialSize="1024"
               autowarmCount="1024"
@@ -521,6 +524,23 @@
       -->
     <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
 
+  <!-- Use Filter For Sorted Query
+
+   A possible optimization that attempts to use a filter to
+   satisfy a search.  If the requested sort does not include
+   score, then the filterCache will be checked for a filter
+   matching the query. If found, the filter will be used as the
+   source of document ids, and then the sort will be applied to
+   that.
+
+   For most situations, this will not be useful unless you
+   frequently get the same search repeatedly with different sort
+   options, and none of them ever use "score"
+-->
+    <!--
+       <useFilterForSortedQuery>true</useFilterForSortedQuery>
+      -->
+
     <!-- Query Related Event Listeners
 
          Various IndexSearcher related events can trigger Listeners to
@@ -569,6 +589,64 @@
 
   </query>
 
+  <!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     Circuit Breaker Section - This section consists of configurations for
+     circuit breakers
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
+
+    <!-- Circuit Breakers
+
+     Circuit breakers are designed to allow stability and predictable query
+     execution. They prevent operations that can take down the node and cause
+     noisy neighbour issues.
+
+     This flag is the uber control switch which controls the activation/deactivation of all circuit
+     breakers. If a circuit breaker wishes to be independently configurable,
+     they are free to add their specific configuration but need to ensure that this flag is always
+     respected - this should have veto over all independent configuration flags.
+    -->
+    <circuitBreakers enabled="true">
+
+    <!-- Memory Circuit Breaker Configuration
+
+     Specific configuration for max JVM heap usage circuit breaker. This configuration defines whether
+     the circuit breaker is enabled and the threshold percentage of maximum heap allocated beyond which queries will be rejected until the
+     current JVM usage goes below the threshold. The valid value range for this value is 50-95.
+
+     Consider a scenario where the max heap allocated is 4 GB and memoryCircuitBreakerThreshold is
+     defined as 75. Threshold JVM usage will be 4 * 0.75 = 3 GB. Its generally a good idea to keep this value between 75 - 80% of maximum heap
+     allocated.
+
+     If, at any point, the current JVM heap usage goes above 3 GB, queries will be rejected until the heap usage goes below 3 GB again.
+     If you see queries getting rejected with 503 error code, check for "Circuit Breakers tripped"
+     in logs and the corresponding error message should tell you what transpired (if the failure
+     was caused by tripped circuit breakers).
+    -->
+    <!--
+   <memBreaker enabled="true" threshold="75"/>
+    -->
+
+      <!-- CPU Circuit Breaker Configuration
+
+     Specific configuration for CPU utilization based circuit breaker. This configuration defines whether the circuit breaker is enabled
+     and the average load over the last minute at which the circuit breaker should start rejecting queries.
+
+     Consider a scenario where the max heap allocated is 4 GB and memoryCircuitBreakerThreshold is
+     defined as 75. Threshold JVM usage will be 4 * 0.75 = 3 GB. Its generally a good idea to keep this value between 75 - 80% of maximum heap
+     allocated.
+    -->
+
+      <!--
+       <cpuBreaker enabled="true" threshold="75"/>
+      -->
+
+  </circuitBreakers>
+
 
   <!-- Request Dispatcher

These are definitely changes we made. I don't know why they happened (it's really tricky to trace their origin), and I don't know if this is actually used.

@@ -693,48 +771,6 @@
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
-      <str name="defType">edismax</str>
-      <float name="tie">0.075</float>
-        <str name="qf">
-            dvName^400
-            authorName^180
-            dvSubject^190
-            dvDescription^180
-            dvAffiliation^170
-            title^130
-            subject^120
-            keyword^110
-            topicClassValue^100
-            dsDescriptionValue^90
-            authorAffiliation^80
-            publicationCitation^60
-            producerName^50
-            fileName^30
-            fileDescription^30
-            variableLabel^20
-            variableName^10
-            _text_^1.0
-        </str>
-        <str name="pf">
-            dvName^200
-            authorName^100
-            dvSubject^100
-            dvDescription^100
-            dvAffiliation^100
-            title^75
-            subject^75
-            keyword^75
-            topicClassValue^75
-            dsDescriptionValue^75
-            authorAffiliation^75
-            publicationCitation^75
-            producerName^75
-        </str>
-        <!-- Even though this number is huge it only seems to apply a boost of ~1.5x to final result -MAD 4.9.3--> 
-        <str name="bq">
-            isHarvested:false^25000
-        </str>
-
       <!-- Default search field
          <str name="df">text</str> 
         -->
@@ -805,43 +841,12 @@
     </lst>
   </requestHandler>

More upstream changes that should be incorporated. (Seems related to the same change in apache/lucene-solr@dce36c1.)

-
-  <!-- A Robust Example
-
-       This example SearchHandler declaration shows off usage of the
-       SearchHandler with many defaults declared
-
-       Note that multiple instances of the same Request Handler
-       (SearchHandler) can be registered multiple times with different
-       names (and different init parameters)
-    -->
-  <requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
-    <lst name="defaults">
-      <str name="echoParams">explicit</str>
-    </lst>
-  </requestHandler>
-
-  <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
+  <initParams path="/update/**,/query,/select,/spell">
     <lst name="defaults">
       <str name="df">_text_</str>
     </lst>
   </initParams>
 
-  <!-- Solr Cell Update Request Handler
-
-       http://wiki.apache.org/solr/ExtractingRequestHandler
-
-    -->
-  <requestHandler name="/update/extract"
-                  startup="lazy"
-                  class="solr.extraction.ExtractingRequestHandler" >
-    <lst name="defaults">
-      <str name="lowernames">true</str>
-      <str name="fmap.meta">ignored_</str>
-      <str name="fmap.content">_text_</str>
-    </lst>
-  </requestHandler>
-
   <!-- Search Components
 
        Search components are registered to SolrCore and used by
@@ -972,30 +977,6 @@
     </arr>
   </requestHandler>
 
-  <!-- Term Vector Component
-
-       http://wiki.apache.org/solr/TermVectorComponent
-    -->
-  <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
-
-  <!-- A request handler for demonstrating the term vector component
-
-       This is purely as an example.
-
-       In reality you will likely want to add the component to your
-       already specified request handlers.
-    -->
-  <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
-    <lst name="defaults">
-      <bool name="tv">true</bool>
-    </lst>
-    <arr name="last-components">
-      <str>tvComponent</str>
-    </arr>
-  </requestHandler>
-
-  <!-- Clustering Component. (Omitted here. See the default Solr example for a typical configuration.) -->
-
   <!-- Terms Component
 
        http://wiki.apache.org/solr/TermsComponent
@@ -1016,30 +997,6 @@
     </arr>
   </requestHandler>
 
-
-  <!-- Query Elevation Component
-
-       http://wiki.apache.org/solr/QueryElevationComponent
-
-       a search component that enables you to configure the top
-       results for a given query regardless of the normal lucene
-       scoring.
-    -->
-  <searchComponent name="elevator" class="solr.QueryElevationComponent" >
-    <!-- pick a fieldType to analyze queries -->
-    <str name="queryFieldType">string</str>
-  </searchComponent>
-
-  <!-- A request handler for demonstrating the elevator component -->
-  <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
-    <lst name="defaults">
-      <str name="echoParams">explicit</str>
-    </lst>
-    <arr name="last-components">
-      <str>elevator</str>
-    </arr>
-  </requestHandler>
-
   <!-- Highlighting Component
 
        http://wiki.apache.org/solr/HighlightingParameters

🚨 THIS IS CRUCIAL FOR US. Newer versions of Solr default to the managed schema factory that @pkiraly suggested in #5989.

@@ -1170,8 +1127,6 @@
 
        See http://wiki.apache.org/solr/GuessingFieldTypes
     -->
-<schemaFactory class="ClassicIndexSchemaFactory"/>
-
   <updateProcessor class="solr.UUIDUpdateProcessorFactory" name="uuid"/>
   <updateProcessor class="solr.RemoveBlankFieldUpdateProcessorFactory" name="remove-blank"/>
   <updateProcessor class="solr.FieldNameMutatingUpdateProcessorFactory" name="field-name-mutating">

These have been changed upstream; as the new format strings use patterns with optional sections (instead of one entry per variant), they should be OK to incorporate.

@@ -1183,28 +1138,16 @@
   <updateProcessor class="solr.ParseDoubleFieldUpdateProcessorFactory" name="parse-double"/>
   <updateProcessor class="solr.ParseDateFieldUpdateProcessorFactory" name="parse-date">
     <arr name="format">
-      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
-      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
-      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
-      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
-      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
-      <str>yyyy-MM-dd'T'HH:mm:ss</str>
-      <str>yyyy-MM-dd'T'HH:mmZ</str>
-      <str>yyyy-MM-dd'T'HH:mm</str>
-      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
-      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
-      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
-      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
-      <str>yyyy-MM-dd HH:mm:ssZ</str>
-      <str>yyyy-MM-dd HH:mm:ss</str>
-      <str>yyyy-MM-dd HH:mmZ</str>
-      <str>yyyy-MM-dd HH:mm</str>
-      <str>yyyy-MM-dd</str>
+      <str>yyyy-MM-dd['T'[HH:mm[:ss[.SSS]][z</str>
+      <str>yyyy-MM-dd['T'[HH:mm[:ss[,SSS]][z</str>
+      <str>yyyy-MM-dd HH:mm[:ss[.SSS]][z</str>
+      <str>yyyy-MM-dd HH:mm[:ss[,SSS]][z</str>
+      <str>[EEE, ]dd MMM yyyy HH:mm[:ss] z</str>
+      <str>EEEE, dd-MMM-yy HH:mm:ss z</str>
+      <str>EEE MMM ppd HH:mm:ss [z ]yyyy</str>
     </arr>
   </updateProcessor>

Is the removal of this processor still a thing?

-
-  <!--Dataverse removed-->
-<!--  <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
+  <updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
     <lst name="typeMapping">
       <str name="valueClass">java.lang.String</str>
       <str name="fieldType">text_general</str>
@@ -1212,7 +1155,7 @@
         <str name="dest">*_str</str>
         <int name="maxChars">256</int>
       </lst>
-
+      <!-- Use as default mapping instead of defaultFieldType -->
       <bool name="default">true</bool>
     </lst>
     <lst name="typeMapping">
@@ -1232,11 +1175,11 @@
       <str name="valueClass">java.lang.Number</str>
       <str name="fieldType">pdoubles</str>
     </lst>
-    </updateProcessor> -->
+  </updateProcessor>

We should use the setting to disable this instead of changing the default... 🙈

   <!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
-  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
-           processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date">
+  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
+           processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
     <processor class="solr.LogUpdateProcessorFactory"/>
     <processor class="solr.DistributedUpdateProcessorFactory"/>
     <processor class="solr.RunUpdateProcessorFactory"/>
@@ -1265,46 +1208,6 @@
      </updateRequestProcessorChain>
     -->

More upstream changes due to the removed libs. It looks like we never configured those.

-  <!-- Language identification
-
-       This example update chain identifies the language of the incoming
-       documents using the langid contrib. The detected language is
-       written to field language_s. No field name mapping is done.
-       The fields used for detection are text, title, subject and description,
-       making this example suitable for detecting languages form full-text
-       rich documents injected via ExtractingRequestHandler.
-       See more about langId at http://wiki.apache.org/solr/LanguageDetection
-    -->
-  <!--
-   <updateRequestProcessorChain name="langid">
-     <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
-       <str name="langid.fl">text,title,subject,description</str>
-       <str name="langid.langField">language_s</str>
-       <str name="langid.fallback">en</str>
-     </processor>
-     <processor class="solr.LogUpdateProcessorFactory" />
-     <processor class="solr.RunUpdateProcessorFactory" />
-   </updateRequestProcessorChain>
-  -->
-
-  <!-- Script update processor
-
-    This example hooks in an update processor implemented using JavaScript.
-
-    See more about the script update processor at http://wiki.apache.org/solr/ScriptUpdateProcessor
-  -->
-  <!--
-    <updateRequestProcessorChain name="script">
-      <processor class="solr.StatelessScriptUpdateProcessorFactory">
-        <str name="script">update-script.js</str>
-        <lst name="params">
-          <str name="config_param">example config parameter</str>
-        </lst>
-      </processor>
-      <processor class="solr.RunUpdateProcessorFactory" />
-    </updateRequestProcessorChain>
-  -->
-
   <!-- Response Writers
 
        http://wiki.apache.org/solr/QueryResponseWriter
@@ -1340,23 +1243,6 @@
     <str name="content-type">text/plain; charset=UTF-8</str>
   </queryResponseWriter>
 
-  <!--
-     Custom response writers can be declared as needed...
-    -->
-  <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy">
-    <str name="template.base.dir">${velocity.template.base.dir:}</str>
-    <str name="solr.resource.loader.enabled">${velocity.solr.resource.loader.enabled:true}</str>
-    <str name="params.resource.loader.enabled">${velocity.params.resource.loader.enabled:false}</str>
-  </queryResponseWriter>
-
-  <!-- XSLT response writer transforms the XML output by any xslt file found
-       in Solr's conf/xslt directory.  Changes to xslt files are checked for
-       every xsltCacheLifetimeSeconds.
-    -->
-  <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
-    <int name="xsltCacheLifetimeSeconds">5</int>
-  </queryResponseWriter>
-
   <!-- Query Parsers
 
        https://lucene.apache.org/solr/guide/query-syntax-and-parsing.html

Conclusion

Instead of maintaining a static config, we should rely on the _default configset and apply our changes to it.
At least this is what I'm going to do in the Dataverse Solr container images.
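
As a sketch of what "applying our changes" could look like mechanically, here is an identity transform with targeted overrides. This is illustrative only, not the final transformation; the two overrides are examples, not our complete set of changes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy the stock _default solrconfig.xml unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- example override: ship with schemaless mode disabled by default -->
  <xsl:template match="updateRequestProcessorChain[@name='add-unknown-fields-to-the-schema']/@default">
    <xsl:attribute name="default">${update.autoCreateFields:false}</xsl:attribute>
  </xsl:template>

  <!-- example addition: bring back the classic schema factory -->
  <xsl:template match="config">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <schemaFactory class="ClassicIndexSchemaFactory"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Run with e.g. xsltproc dataverse.xsl solrconfig.xml > solrconfig-dataverse.xml (file names illustrative).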

poikilotherm changed the title from "Solr 8.8 upgrade - remaining issues" to "Solr 8.8 upgrade - remaining issues with solrconfig.xml" on Mar 8, 2021
@poikilotherm
Contributor Author

Pinging @qqmyers @mheppler @scolapasta @pdurbin @sekmiller here.

@mheppler
Contributor

mheppler commented Mar 8, 2021

Noted, @poikilotherm. Thank you for catching this, opening an issue, and providing all the details. I was already coordinating with @qqmyers and @scolapasta on #7378, and I'll add this new issue to my agenda as well.

Might be worth scheduling another tech hour discussion tomorrow, if there are any questions.

@poikilotherm
Copy link
Contributor Author

poikilotherm commented Mar 8, 2021

I also looked around for upstream changes to schema.xml. There are some, and maybe we should discuss those, too (deprecation of TrieXXXField, some language stuff).
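
For reference, the Trie deprecation will eventually mean swapping Trie-based types for their Points-based equivalents, roughly like this (illustrative field type names, not our actual schema.xml):

  <!-- deprecated since Solr 7, removed in Solr 9: -->
  <fieldType name="plong" class="solr.TrieLongField" precisionStep="0"/>
  <!-- Points-based replacement; docValues enable sorting/faceting: -->
  <fieldType name="plong" class="solr.LongPointField" docValues="true"/>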

@pdurbin
Member

pdurbin commented Mar 8, 2021

@poikilotherm first of all, thanks for creating this issue.

Instead of maintaining a static config, we should rely on the _default configset and apply our changes to it.

This sounds good, but I'm not sure how it would work technically. As a starting point, it probably makes sense to list "our changes" so that we're all on the same page. We know we want "boosting", for example (see #1928 (comment)), but I'm sure there are other tweaks we've made that I'm not thinking of. My guess is that we make fewer than half a dozen changes to the Solr config. Perhaps we should start by listing them in the dev guide so that when we do upgrades developers are aware of them.

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 21, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 21, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Dec 23, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Jan 3, 2022
Simple Makefile to download Solr, extract the default configset
and create a Dataverse-flavored one.

- Uses Maven to find the Solr distribution version to download.
- Uses xsltproc to apply our XSLT transformations to solrconfig.xml
- Replaces the managed-schema with the static one we provide
- Zips the configset to make it distributable as an artifact
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Jan 3, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Feb 3, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Feb 3, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Feb 7, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Feb 7, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Feb 7, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 26, 2022
…QSS#7662

Instead of relying on Java-provided exceptions, we want to track line numbers
and other details of the parsing process, so we need custom mechanics.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 26, 2022
Our custom metadata block TSV files follow a certain order of things.
We also do not allow for repetitions or similar. All of this can most
easily be depicted with a state machine, so we know where to send a
line for parsing.

This commit also adds the very basic (empty) POJOs to store the block,
fields and vocabularies in, to enable testing the state transitions.

It also adds constants we rely on, like the trigger char, the comment
intro and the field delimiter.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 26, 2022
The TSV parser needs to verify whether a certain line is a header line
matching the spec. To avoid duplicated validation code, this validator
can be used with an arbitrary list of strings (so it can be reused for
blocks, fields and vocabularies).

As we will need to validate URLs in certain fields, this validator
also offers a helper function to create predicates checking for valid URLs.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 26, 2022
The Block POJO now contains the header specification (it uses the Validator
class to perform the validation) and allows parsing a line into a List.
A later relaxation of the spec, allowing for reordering of fields etc.,
is possible, while the calling code of the parser can reuse the found
header definition.

A builder pattern is used to parse and validate the actual definition.
As the block may only be used once the definition, all fields and
vocabularies have been parsed (if there is an error within the TSV,
the parsing has to fail!), the builder pattern is a natural match.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 26, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
This simple class will make the parser somewhat configurable, so future
changes and command line options can be integrated more easily.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
Instead of defining a static trigger, we want to be able to configure
the trigger sign. To that end, we use the keyword only and move the
trigger handling into the ParsingState (which is analysing the line for
state transitions anyway).
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
- Implement first details of the Block POJO
- Change parsing with BlockBuilder to use an internal state with a non-exposed Block object
- The BlockBuilder may manipulate the Block, but after calling build() the calling code will
  have no option to edit the POJO (proper encapsulation and sealing)
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
Add field types and make them usable as predicates for fields.
Add test.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
Predicates are not null safe - need to make validate() check for null
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 29, 2022
Includes all the predicates according to spec, and tests for them.
qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 4, 2022
luceneMatchVersion update should be the only real change.
qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 5, 2022
qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this issue May 5, 2022
@gksachin04

gksachin04 commented May 24, 2022

After upgrading from Solr 8.2 to Solr 8.8.2, LTR response has degraded by 35%. Please find the LTR-specific details:

QUERY_DOC_FV

Let me know if there is anything that needs to be added.
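
For context: QUERY_DOC_FV is the feature-value cache from Solr's standard learning-to-rank (LTR) setup; Dataverse's solrconfig.xml does not configure LTR. A typical declaration, as given in the Solr Ref Guide rather than anything Dataverse ships, looks like:

  <cache name="QUERY_DOC_FV"
         class="solr.search.LRUCache"
         size="4096"
         initialSize="2048"
         autowarmCount="4096"
         regenerator="solr.search.NoOpRegenerator"/>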

@pdurbin
Member

pdurbin commented May 24, 2022

@gksachin04 in PR #8415 we already upgraded to Solr 8.11. If you're still having a problem with that version, can you please open a fresh issue? Thanks. And yes, more details would be great. 😄

@gksachin04

@pdurbin Thanks, do you have an LTR-specific configuration in Solr 8.11?

@pdurbin
Member

pdurbin commented May 24, 2022

@gksachin04 sorry, I don't. You might want to ask the community about it: https://groups.google.com/g/dataverse-community
