Add new Solr fields via an API call (investigation) #5989
@pkiraly thanks for opening this issue! Creating fields in Solr programmatically via API would be a huge improvement over what we do now, which is to manually update schema.xml from time to time. As you say, it would be especially useful when custom metadata blocks are created. The documented procedure at http://guides.dataverse.org/en/4.15/admin/metadatacustomization.html#updating-the-solr-schema (screenshot below) is quite manual. Please let me know if I can help at all. Thanks again. |
Hi @pdurbin, Yes, it's on the Roadmap of SSHOC DataverseEU project and we'll try to find a solution together. But probably someone already knows "how to unlock the closed door". |
It would be a great help to find out whether Solr generally does not do what the Solr manual describes, or whether it fails only for me (due to some confounding factor in my system configuration). To run the experiment, follow these steps (on a test machine).
and replace it with this:
|
@pkiraly one observation is that even before I do anything there's a
I made a copy of it with this:
I'm confirming if Solr is up or down with these:
Stopping Solr with this:
Backing up the file before editing:
Start Solr again:
A file named
No change to /usr/local/solr/server/solr/collection1/conf/managed-schema . This diff shows no changes:
|
I asked @erikhatcher, a well-known Lucene/Solr contributor, author and speaker. He suggested running the following procedure if this issue occurs:
I tried it, and it works. |
@joelmarkanderson would benefit from a solution. He recently reported the following at https://groups.google.com/d/msg/dataverse-community/lr26VTP8lhs/5JoZ-IdnBQAJ "I have successfully populated a controlled vocabulary metadata block, and the list of 38 Values correctly shows under the "Add + Edit Metadata" configuration screen. However, selecting and saving a tag results in an webpage error message: "Error – The metadata could not be updated. If you believe this is an error, please contact Support for assistance." ... Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/collection1: ERROR: [doc=dataset_758_draft] unknown field 'tag' " @pkiraly I have not yet tried your latest suggestion above. Mostly I'm just posting the error above so people can find this issue in the future. |
@pkiraly heads up that as a stop gap measure @poikilotherm and I have cooked up a new plan in a new issue: Make Solr schema.xml configuration more flexible, still using Classic Schema Factory #6142 You (and others) are also welcome to read our conversation about it at http://irclog.iq.harvard.edu/dataverse/2019-09-05#i_104531 To be clear, Oliver and I and others still want the solution you are proposing in this issue. We just think the proposal in the other issue will be less effort. It's a short term solution. Your idea (this issue) is the longer term solution. 😄 |
Is there anybody who could reproduce the process I suggested (see my comments #5989 (comment) and before that)? @pdurbin Do you have some label for "help needed"? I do not have the right to add labels. |
@pkiraly I have not tried playing with Managed Schema. I can add some "help wanted" labels. Maybe @poikilotherm can help? I believe that the future pull request will be a doc change and some scripts that add fields to the Solr schema dynamically, based on the fields of the metadata blocks that have been loaded into Dataverse. |
By the way, if anyone wants a real custom metadata block to play with, a new one called "codemeta.tsv" is attached to the "CodeMeta-Metadata for Software and displayFormat for controlledVocabularies" thread at https://groups.google.com/d/msg/dataverse-community/nDMbMv4fKf4/P5YxHJzDBgAJ |
It remains unclear why this data transfer object, used only for file asset facets so far, has ever been treated as a bean. The facet labels are used to render facet query links within the Web UI and are retrieved from a `Map` in the `DatasetPage` UI backing bean.
…eldType IQSS#5989 Moving: while living inside the search package, the functionality is much extended to be used for validation and schema definition. The schema related stuff needs to live on its own. Renaming: make the class more representative of Solr terminology
The SolrFieldProperty class is used as a POJO to wrap Boolean or String properties of Solr <field>, <fieldType> and <dynamicField> definitions.
This is a base class to depict <field>, <dynamicField> and <fieldType> in implementing subclasses.
Introducing SolrFieldType to be kind of an enum class of available types. As Java enums do not allow extending a base class, using the good old pre Java 5 style here. Adding a field "ALL" to make all types available as `List` via reflection.
…ype IQSS#5989 This commit introduces a few changes:
- The type enum in DatasetFieldType is enhanced to map to the SolrFieldType types of Solr fields.
- The retrieval of dataverse.search.SolrField from a DatasetFieldType is much simplified because SolrFieldType is now used in both areas of the code.
- As there might be no mapping for some types (like email), DatasetFieldType.getSolrField() has been refactored to return an Optional<SolrField>. All usages of the method have been updated to align with this change.
This EJB stores the Solr schema we need to follow as a single source of truth. We will rely on this in-memory model to validate, update and manage the schema inside the backing Solr instance. It will be usable via API to reload the schema model from the database (and code) plus will do so automatically during startup of Dataverse. This is necessary to have empty Solr instances bootstrapped by us. Thanks to @pkiraly @pdurbin and @rtreacy most of this code was done during a "hacky friday" code-with-me session.
…d present in the default schema) IQSS#5989
…olrJ field Map<String,Object> IQSS#5989
…elds from Solr IQSS#5989 Some dynamic field definitions in Solr are present by default, but completely unused by us. We ignore those, as we don't want to depict all the types they require in SolrFieldType.
This adds a minimal prototype to work with the schema retrieved from Solr. Does not yet do any comparison, but shows good results in converting the schema present in Solr into the comparable in-memory model.
Hmm. I see some commits from a little over a year ago from @poikilotherm above. I think it's safe to say that at least @pkiraly @poikilotherm and I (and probably others) are still interested in this but no one has had the time to code it up. |
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment. |
When adding a custom metadata block, there is a manual step involved: adding Solr fields to schema.xml. Solr provides an API (the Schema API) that makes this manual step unnecessary, but using it involves some changes in Dataverse.
Note: this ticket is about investigation rather than implementation. First we have to understand every aspect of this change to make sure the existing technologies are reliable and fully support the request.
Solr has a Schema API, which lets you modify the Solr schema (the list of fields and their properties). Solr can handle the schema in two different ways, controlled in the solrconfig.xml file. There is a "classic" way, based on the schema.xml file, and a newer way, called managed schema (it is materialized as the "managed-schema" file, which is editable via the Solr user interface or via the API; editing this file manually is not advised).
In the Dataverse provided solrconfig.xml you have this:
The Schema API doesn't work with the ClassicIndexSchemaFactory. If you try, Solr returns an error message: "schema is not editable". To enable the Schema API, we have to change this setting:
Set ManagedIndexSchemaFactory in solrconfig.xml:
After this you have to restart Solr, and the Schema API will work this way:
The details of the Schema API can be found here:
https://lucene.apache.org/solr/guide/7_3/schema-api.html
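As a rough illustration of what a Schema API call could look like, here is a minimal Python sketch that builds an `add-field` command and posts it to the `/schema` endpoint of a core. The core URL is the one that appears in the error message quoted earlier in this thread; the field name `tag`, the field type `text_en`, and the helper names `build_add_field_command` and `post_schema_command` are assumptions made for the example, not anything Dataverse ships. It would only succeed against a core configured with a mutable managed schema.

```python
import json
import urllib.request

# Assumption: the core URL quoted in the error message earlier in this thread.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def build_add_field_command(name, field_type, stored=True, indexed=True,
                            multi_valued=False):
    """Return a Schema API 'add-field' command as a plain dict."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
            "multiValued": multi_valued,
        }
    }


def post_schema_command(command, core_url=SOLR_CORE_URL):
    """POST one command to <core_url>/schema and return the decoded JSON reply."""
    request = urllib.request.Request(
        core_url + "/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # 'tag' and 'text_en' are illustrative values only.
    command = build_add_field_command("tag", "text_en", multi_valued=True)
    print(json.dumps(command, indent=2))
    # On a test machine with a running Solr, the command could then be sent:
    # print(post_schema_command(command))
```

Keeping the payload builder separate from the HTTP call makes it possible to inspect or log the command before anything is sent to Solr.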
The details of change from classic schema:
https://lucene.apache.org/solr/guide/7_3/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-Switchingfromschema.xmltoManagedSchema
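To check which of the two schema factories a given solrconfig.xml selects, something like the following Python sketch may help. The two XML samples follow the form shown in the Solr reference guide linked above and are not copied from the Dataverse configuration; the helper name `schema_api_is_writable` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample configurations in the form given by the Solr reference guide
# (assumed for illustration, not taken from the Dataverse solrconfig.xml).
CLASSIC = '<config><schemaFactory class="ClassicIndexSchemaFactory"/></config>'
MANAGED = """<config>
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>
</config>"""


def schema_api_is_writable(solrconfig_xml):
    """Return True if the config selects a mutable managed schema."""
    root = ET.fromstring(solrconfig_xml)
    factory = root.find("schemaFactory")
    if factory is None:
        # The Solr 7 guide describes a mutable ManagedIndexSchemaFactory
        # as the implicit default when no <schemaFactory> is declared.
        return True
    if not factory.get("class", "").endswith("ManagedIndexSchemaFactory"):
        return False
    mutable = factory.find("bool[@name='mutable']")
    # A managed schema is mutable unless explicitly switched off.
    return mutable is None or (mutable.text or "").strip().lower() == "true"


if __name__ == "__main__":
    print(schema_api_is_writable(CLASSIC))
    print(schema_api_is_writable(MANAGED))
```

Running such a check before calling the Schema API would explain a "schema is not editable" error without having to read the Solr logs.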
The problems:
The documentation says: "Once Solr is restarted and it detects that a schema.xml file exists, but the managedSchemaResourceName file (i.e., “managed-schema”) does not exist, the existing schema.xml file will be renamed to schema.xml.bak and the contents are re-written to the managed schema file." When I tried it, schema.xml was neither copied nor renamed. However, the same searches, even fielded searches, still work.
When I use the Schema API to retrieve fields, the response contains only the default Solr fields, and not those Dataverse added via schema.xml.
I asked a Solr expert for help.
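The second problem can be checked mechanically. The sketch below, in Python, compares the field names in a `GET /schema/fields` response with a list of field names one expects to find. The sample response is hand-written in the shape the Schema API returns (a `fields` array of objects with a `name` key), and the expected names are hypothetical stand-ins for Dataverse fields.

```python
def missing_fields(schema_fields_response, expected_names):
    """Return the expected field names that the Schema API did not report."""
    present = {field["name"]
               for field in schema_fields_response.get("fields", [])}
    return sorted(set(expected_names) - present)


# Hand-written sample in the shape of a GET <core>/schema/fields reply.
sample_response = {
    "responseHeader": {"status": 0, "QTime": 1},
    "fields": [
        {"name": "_version_", "type": "plong"},
        {"name": "id", "type": "string"},
    ],
}

# Hypothetical field names that a Dataverse schema.xml might define.
expected = ["id", "tag", "title"]

if __name__ == "__main__":
    print(missing_fields(sample_response, expected))  # prints ['tag', 'title']
```

A non-empty result would confirm that the fields from schema.xml never reached the managed schema.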
(I added @4tikhonov as watcher)