search_api_solr support for EDTF #962
Comments
@dannylamb, @seth-shaw-unlv If no one else is working on this feel free to assign it to me and I can take a look at it. |
@ppound, funny you should say that. I just started working on it this morning. The search_api processors are new to me, so we'll see how it goes. I'll ping you if I get stuck. |
So, it looks like my attempt to simply use widgets and formatters on a text field is coming back to bite me. The Drupal Search API wants to parse the values as dates before we even get to the Processor plugins, but DateTimePlus isn't a fan of our string value and throws an error before we can do anything about it. It looks like we will need to create an actual FieldType to make this work... |
The index field's datatype setting is what sets the SOLR schema, so if we want SOLR to view a field as a date, we need to declare the datatype as such there. However, the Search API FieldsHelper will pull the field value and try to parse fields with the date data type using date_parse. To get around this behavior we need to extend the SOLR DateRangeDataType and override getValue so that we can transform the value to an ISO-friendly format first. |
Never mind, providing a new DataType doesn't work either, because search_api_solr has a hard-coded list of data types it supports. So if we want this to work we need to either extend or mimic the Datetime Range FieldType. |
OK, I poked around at this a bit as well before I saw your message. It sounds like we went down the same paths. There is some code here, ppound/controlled_access_terms@31b2f16, that will index the fields into Solr as daterange fields (after enabling the processor and setting the fields to the correct type in the search-api config). Searching within Solr works, but I haven't tried searching from within Drupal yet (which is probably where I'll get stuck too). |
@ppound I tried your branch and it still won't index in SOLR (6.6.5) for me. Also, the Search API Data Types will only use the String fallback. I think we really will need to revamp EDTF to get it to work. |
This doesn't seem to be documented anywhere, so I'm making a note here: using the Date Range data type requires SOLR 7.x. If you select the Date Range type with SOLR 5.x or 6.x it will silently fail to index the field; you have to use the Date data type and index end_value as a separate date field. |
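For anyone taking the two-field route on SOLR 5.x or 6.x, here is a minimal Python sketch of the value transformation involved. It is not code from this project: the function names are hypothetical, it assumes only plain YYYY, YYYY-MM, or YYYY-MM-DD endpoints (no EDTF qualifiers such as ? or ~), and it assumes a partial date should expand to the earliest instant for the start field and the latest instant for the end field.

```python
import calendar


def _bounds(date):
    """Return (earliest, latest) ISO 8601 UTC instants for a plain date.

    Assumption: `date` is YYYY, YYYY-MM, or YYYY-MM-DD. Anything else
    (qualifiers, unspecified digits, negative years) raises ValueError.
    """
    parts = [int(p) for p in date.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("Unsupported date: %r" % date)
    year = parts[0]
    if len(parts) == 1:
        first, last = (year, 1, 1), (year, 12, 31)
    elif len(parts) == 2:
        month = parts[1]
        # monthrange returns (weekday_of_first_day, number_of_days).
        last_day = calendar.monthrange(year, month)[1]
        first, last = (year, month, 1), (year, month, last_day)
    else:
        first = last = (year, parts[1], parts[2])
    return (
        "%04d-%02d-%02dT00:00:00Z" % first,
        "%04d-%02d-%02dT23:59:59Z" % last,
    )


def edtf_to_solr_date_pair(value):
    """Split an EDTF date or 'start/end' interval into (begin, end) dates."""
    endpoints = value.strip().split("/")
    if len(endpoints) > 2:
        raise ValueError("Not a date or interval: %r" % value)
    begin = _bounds(endpoints[0])[0]
    end = _bounds(endpoints[-1])[1]
    return begin, end
```

Under those assumptions, the first element would go into the begin date field and the second into the separate end date field.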
I have a new EDTF FieldType that repurposes the existing widget and formatter. The search api seems to work as single values are successfully indexed in Solr 7.x as date ranges! Multi-values don't work yet nor have I attempted the JSON-LD pieces. Also, don't enable the controlled_access_terms_default_configuration as I haven't updated those configs to use the new field yet. (Also, there is plenty of code cleanup that could be done.) |
Bah, I'm walking away from this. 😒 I've gotten SOLR to take the date ranges but not as single dates. Also, it doesn't appear that the search API wants to query them anyway; the facets module barely supports datetime and doesn't support datetime_range at all. You probably could get it to work by writing several custom plugins, but it doesn't seem worth it just to get a nice slider facet. It looks like string-based EDTF, as suggested during the recent call, is the best way to go. It indexes just fine: I think we may need to stick with that, for now. |
Note: if you want to spin up what I have so far:
That should spin you up a fresh instance with all the various EDTF fields now set to EDTF FieldType instead of string. |
My concern about indexing EDTF dates (and a number of other fields currently set up as strings by default) as strings in Solr is that Solr string data type does not permit partial match. Thus, in your screenshot above, if you searched for Likewise, a search for That's great for when you click on the facet value, but not so great if you let users type in a search. They will always be getting artificially small search sets. Have we done something under the hood to help search work as expected on string fields? (I don't remember details, but on another project I worked on, I think we ended up defining a "string-like" Solr field type that didn't get any of the language-processing (stemming, etc) treatment but got whatever basic edge/ngram processing was necessary to make exact-but-partial string match work) I ask because I believe I'm looking at an out-of-the-box Islandora install that has some custom field types like |
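To illustrate the partial-match point, here is a small Python sketch of the edge n-gram idea. It only models the concept and is not how Solr is implemented or configured; the function names and the gram size defaults are assumptions made for the example.

```python
def edge_ngrams(term, min_gram=1, max_gram=20):
    """Return the leading substrings ("edge n-grams") of a term."""
    term = term.lower()
    upper = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, upper + 1)]


def exact_match(query, indexed_value):
    """Model of a plain string field: only the whole value matches."""
    return query.lower() == indexed_value.lower()


def prefix_match(query, indexed_value):
    """Model of an edge n-gram field: any leading substring matches."""
    return query.lower() in edge_ngrams(indexed_value)
```

With this model, a typed query of 1999 does not match the stored value 1999-05-03 under `exact_match`, but does under `prefix_match`, which is the "exact-but-partial" behaviour described above.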
I admit to being a SOLR novice. I haven't played with any of the other Fulltext variants to see how they impact search results (yet; it is on the list). I should also note, while I'm at it, that there have been a number of conversations on this topic, mostly on Slack, since I last made an update in late 2018. The current thinking is that a Search API processor is the best way forward, instead of trying to extend the DateTime fieldtype. The most progress has been made by @joecorall and @elizoller, who have implemented year-based date facets (omitting months, days, etc.) by using a field processor to index the year of an EDTF date. |
FWIW, here's the processor being used for the EDTF year facet on Open Access Kent State: https://gist.github.com/joecorall/fa914809af3304cdd98194d929d1bad9 |
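For readers who just want the gist of the year-extraction idea, here is a minimal Python sketch. It is not the code from the linked gist: the function name is hypothetical, and the choice to expand an interval into every year it covers is an assumption about what a year facet would want.

```python
import re

# A four-digit year at the start of an endpoint, not followed by another digit.
# Endpoints with unspecified digits (e.g. "199X") deliberately do not match.
_LEADING_YEAR = re.compile(r"^(\d{4})(?!\d)")


def edtf_years(value):
    """Return the list of years to index for an EDTF date or interval."""
    years = []
    for endpoint in value.split("/"):
        match = _LEADING_YEAR.match(endpoint.strip())
        if match:
            years.append(int(match.group(1)))
    # Assumption: a closed interval contributes every year it spans.
    if len(years) == 2 and years[0] <= years[1]:
        return list(range(years[0], years[1] + 1))
    return years
```

Months, days, and trailing qualifiers such as ? or ~ are simply ignored, which matches the "omitting months, days, etc." approach described above.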
meta-issue: #1748 |
Add initial EDTFYear processor part of Islandora/documentation#962
Instead of leaving my EDTF-as-a-FieldType branch lying around cluttering things up, I decided to simply make a patch file and post it here in case anyone wants to come back and reference it. |
Currently EDTF field values do not fit the SOLR syntax for DateField or DateRangeField. (See SOLR "Working with Dates".)
E.g. EDTF uses a "/" to separate the beginning and end of a date range whereas SOLR wraps ranges in square brackets and uses " TO " as a separator. This would mean converting 2000/2018 to [2000 TO 2018].
We can write a Drupal Search API index preprocessor to do the conversion. The simplest example processor to follow as a guide is probably IgnoreCase.
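As a language-neutral illustration of the conversion itself (not a Search API processor, which would be written in PHP), here is a minimal Python sketch. The function name is hypothetical, and it assumes only closed intervals and plain YYYY, YYYY-MM, or YYYY-MM-DD values; anything else is rejected rather than guessed at.

```python
import re

# Plain dates only: YYYY, YYYY-MM, or YYYY-MM-DD (no EDTF qualifiers).
_SIMPLE_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def edtf_interval_to_solr_range(value):
    """Convert 'start/end' to '[start TO end]'; pass single dates through."""
    parts = value.strip().split("/")
    if len(parts) > 2:
        raise ValueError("Not a date or interval: %r" % value)
    for part in parts:
        if not _SIMPLE_DATE.match(part):
            raise ValueError("Unsupported EDTF value: %r" % value)
    if len(parts) == 1:
        # A single date needs no range brackets.
        return parts[0]
    return "[%s TO %s]" % (parts[0], parts[1])
```

Qualified or open-ended EDTF values would need their own mapping rules before they could be handled this way.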