[FEATURE] autosuggest by words containing hyphens and other special chars #4170

Open
dkoether opened this issue Sep 25, 2024 · 1 comment

@dkoether
Contributor

Describe the bug
When the search term contains special characters such as hyphens, autosuggest no longer offers any options.

To Reproduce
I have several pages with several tags like "Digital-Abo" or "Digital-Ausgabe" in a Solr index.

Steps to reproduce the behavior:

  1. I start typing my search word with the first two letters "Di".
  2. I see several autosuggest options such as "Digital", "Digital-Ausgabe" and "Digital-Abo".
  3. I type more characters including a hyphen ("Digital-") and no autosuggest option is offered anymore.

Terms containing whitespace work as expected.

Expected behavior
Special characters should also work in the autosuggest feature by default.

Used versions (please complete the following information):

  • TYPO3 Version: 11.5.39
  • Browser: Chrome
  • EXT:solr Version: 11.5.0
  • Used Apache Solr Version: 8.11
  • PHP Version: 8.2

By default the autosuggest feature uses the field spell. Because the tags are multi-valued, I also tried fields of other types such as stringM, textM and textSpellM, but $results->facet_counts->facet_fields->{$suggestConfig['suggestField']} is always an empty array.

Thank you in advance!
Best regards!

@dkd-kaehm
Collaborator

dkd-kaehm commented Sep 25, 2024

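EXT:solr's suggest service uses a facet-based approach (facet.prefix on the suggest field).
The spell field and other text fields that use <tokenizer class="solr.StandardTokenizerFactory"/> do not fit this case: the Standard Tokenizer splits words at hyphens, so no indexed term starts with "Digital-".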
See: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#standard-tokenizer

What is needed is a field whose index analyzer config combines the following:

  1. <tokenizer class="solr.ClassicTokenizerFactory"/>
    See: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#classic-tokenizer
    1.1. (Optional) Synonym + Stop filters. If applied, the field becomes language dependent.
  2. <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="32" preserveOriginal="true"/>
    See: https://solr.apache.org/guide/solr/latest/indexing-guide/tokenizers.html#edge-n-gram-tokenizer
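
As a rough sketch only, the two steps above could translate into a field type like the following. The names textSuggestHyphen and suggestHyphen are hypothetical and not part of the schema shipped with EXT:solr; the explicit query analyzer is likewise an assumption. One caveat worth checking in Solr's Analysis screen: according to the Solr Reference Guide, the Classic Tokenizer also splits words at hyphens unless the word contains a number, so it may be necessary to confirm that values like "Digital-Abo" really survive as a single token before relying on this.

```xml
<!-- Hypothetical field type; name and attributes are illustrative assumptions. -->
<fieldType name="textSuggestHyphen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Step 1: Classic Tokenizer instead of the Standard Tokenizer -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- Step 1.1 (optional, language dependent): synonym and stop filters would go here -->
    <!-- Step 2: index leading prefixes of each token and keep the full token as well -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="32" preserveOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <!-- Assumption: no n-grams at query time -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical multi-valued field using the type above -->
<field name="suggestHyphen" type="textSuggestHyphen" indexed="true" stored="false" multiValued="true"/>
```

Such a field would still need to be populated (for example via copyField) and then selected through the suggestField setting mentioned in the issue description.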

http://solr-site:8983/solr/core_en/select?omitHeader=true&facet=on&facet.prefix=pre-&facet.field=spell&facet.limit=10&facet.mincount=1&facet.method=enum&wt=json&json.nl=flat&q=&start=0&rows=10&fl=spell&fq=siteHash:"2ddf3ad239669e6c3e3110228186b1a92f9648a8"&fq={!typo3access}-1,0&defType=edismax&q.alt=:


Please add that field via a pull request.
I'll change the tracker from bug to feature.

@dkd-kaehm changed the title from "[BUG] Search words with special characters like hyphens break autosuggest feature" to "[FEATURE] autosuggest by words containing hyphens and other special chars" on Sep 25, 2024