This issue was moved to a discussion.
You can continue the conversation there.
LuceneSearchService - sort order of German 'Umlaute' #426
Comments
This is an interesting question which I can't answer for now. The analyzer is set by the property "lucence.analyzerClass", which defaults to org.apache.lucene.analysis.standard.ClassicAnalyzer. You can override this property in the imixs.properties file. As far as I understand, if you want to use the GermanNormalizationFilter you need to implement your own Analyzer. For example:
Then you can activate the custom analyzer in your project with an imixs.properties entry:
But I am not sure whether this is a workable solution. Can you check this? |
I tried it out but can't find the mentioned imported classes: In my pom I have:
Where can I find these classes? |
The lucene dependencies needed are:
|
Check whether you can build your project with mvn install. It looks like you have a project setup problem in Eclipse. |
So it's an Eclipse problem. You can try the project command "maven -> update project". |
Okay, I tried it despite the compile errors and added the line to the imixs.properties file, but the sort order of the German "Umlaute" hasn't changed. There were no other errors. I have now added a line to the class that prints to the console, but the output never appears, so I think the class is never called. |
Yes, you are right. This is indeed a bug in the LuceneUpdateService. I opened a new issue #429 |
I have now fixed this in release 4.4.1-SNAPSHOT. Please check whether this works for you. |
Yes, the class will be called in 4.4.1-SNAPSHOT.
Okay, now I added the jars manually to the build path. |
Concerning the compilation problems: I think you need to check your project and IDE setup. Concerning the sort order problem: what does the call of the find method look like now? What does the 'lucence.indexFieldListNoAnalyze' entry in your imixs.properties look like? This is an important setting for sorting. Is your sorting field listed there? |
Okay, I will set it up again. The call of the find method: |
OK, do you know the Lucene tool 'Luke'? It is a very cool application that lets you test your Lucene index with different settings. Maybe we can figure out whether the index is written correctly. |
I tested the sort order with 'luke' and the analyzer org.apache.lucene.analysis.de.GermanAnalyzer. |
But as I understand it, the 'Ü' should be replaced with 'Ue', so the fields should not contain a value with 'Ü'. The GermanAnalyzer should replace the tokens. Can you check this with Luke? If you know the uniqueid you can look up the Lucene document in Luke with all its items. |
Yes, of course, that's right: the fields are not stored in Lucene, so we cannot look into that detail. |
Sorting by item names starting with "$" works. For example, in the admin client you can verify this by sorting the result by '$created'. |
Okay, sorting will work if all letters of the key are in lower case. |
Hm, this does not seem to be so easy. Maybe we should focus more on lines 242-248 in LuceneSearchService.search().
Maybe the TopFieldCollector needs to be set up with the correct filter class. If I remember correctly, search and sorting in Lucene are two separate processes. If so, it would not make sense to do this filtering when the index is written. |
Do I have to add the analyzer to imixs-admin as well? That is where I create the Lucene index. Do I really need an analyzer? Or how can I use a replacement of the umlauts like this?
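The snippet that followed this question did not survive in this copy of the thread. As an illustration only, here is a minimal sketch of the kind of replacement being asked about, written in plain Java with no Lucene or Imixs-Workflow dependency. The class `UmlautSortKey` and its methods are hypothetical names and not part of either project. The idea is to derive a lowercase sort key in which 'Ü' becomes 'ue' and so on, and to sort by that key instead of the raw value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class UmlautSortKey {

    // Hypothetical helper: builds a lowercase sort key with the German
    // umlauts and the sharp s expanded to their two-letter forms.
    static String toSortKey(String value) {
        if (value == null) {
            return "";
        }
        StringBuilder key = new StringBuilder(value.length() + 4);
        for (char c : value.toLowerCase(Locale.GERMAN).toCharArray()) {
            switch (c) {
                case '\u00e4': key.append("ae"); break; // a-umlaut
                case '\u00f6': key.append("oe"); break; // o-umlaut
                case '\u00fc': key.append("ue"); break; // u-umlaut
                case '\u00df': key.append("ss"); break; // sharp s
                default: key.append(c);
            }
        }
        return key.toString();
    }

    // Returns a sorted copy; the original values are kept, only the
    // comparison uses the derived key.
    static List<String> sortByKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(UmlautSortKey::toSortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Zebra", "\u00dcbung", "Apfel", "Uhr");

        // Plain string order compares UTF-16 code units.
        List<String> plain = new ArrayList<>(names);
        plain.sort(Comparator.naturalOrder());
        System.out.println(plain);

        // Order by the derived key.
        System.out.println(sortByKey(names));
    }
}
```

With this sample list, plain string ordering puts 'Übung' after 'Zebra' because 'Ü' has a higher code point than 'Z', while sorting by the derived key places it before 'Uhr'. Whether such a key can be fed into the index depends on where Imixs-Workflow builds its sort fields, which this sketch does not address.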
|
I think this is right because the search doesn't work using MyCustomAnalyzer. Sorting numbers is in my opinion also wrong. It looks like this: |
I read about the TopFieldCollector and it does not look like this class is responsible for the sort order. So I was on the wrong path... I am not sure whether Lucene is able to sort numbers. Have you already asked that question in the Lucene forum? |
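The example that showed the wrong number order is missing from this copy of the thread. One plausible explanation, assuming the numbers are indexed and compared as strings, is that they are ordered character by character, so "10" sorts before "2". The sketch below is plain Java with hypothetical names (`NumericSortKey`, `toPaddedKey`) and no Lucene dependency. It shows the effect and one common workaround: padding each number to a fixed width so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class NumericSortKey {

    // Hypothetical helper: left-pads a non-negative number with zeros to
    // 19 digits (the length of Long.MAX_VALUE) so that lexicographic order
    // equals numeric order. Negative values are out of scope for this sketch.
    static String toPaddedKey(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled");
        }
        return String.format(Locale.ROOT, "%019d", value);
    }

    // Sorts the raw strings character by character.
    static List<String> sortAsStrings(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Sorts the same strings by their zero-padded key.
    static List<String> sortByPaddedKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing((String v) -> toPaddedKey(Long.parseLong(v))));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("10", "2", "1", "33");
        System.out.println(sortAsStrings(ids));   // prints [1, 10, 2, 33]
        System.out.println(sortByPaddedKey(ids)); // prints [1, 2, 10, 33]
    }
}
```

Padding keeps the value a string, so it would fit a string-based sort field. A numeric field type is the other route, but whether either fits the Imixs-Workflow index is not something this sketch can confirm.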
Now I got an answer to this topic: Is it possible to use this with imixs workflow? Thank you. |
Concerning the sorting by number, I think the problem for now is the LuceneUpdateService: Line 684 in 22dce06
Here we create the index based on a SortedDocValuesField. Maybe in some cases we can use a SortedNumericDocValuesField instead. We can try the following in this case:
We will have to see whether this works as expected. |
I think I have now found a working solution for this problem. I added a new CDI bean called LuceneItemAdapter. This bean converts the Item values and also creates the SortedDocValuesFields. With this solution your application can simply replace the adapter with a CDI alternative. Hopefully we can later integrate your bean back into the Imixs-Workflow project.
and add your bean into the beans.xml of your application:
|
Oh great. Is there a 4.5.0-SNAPSHOT with the LuceneItemAdapter?
As far as I understood the implementation, it should work like this?
I tried sorting the $taskid which is really represented by a String?! |
I have deployed the snapshot now. Let's concentrate on the German Umlaute problem. I think you only need to override the method adaptSortableItemValue
|
I tried it out with the following method, but the sort order hasn't changed, even after rebuilding the index.
The logger message doesn't appear in the console. |
Hi,
is it possible in the implementation of imixs-workflow to tell lucene that it should order the german Umlaute correctly? Like 'Ü' should be 'Ue' and so on...
I think there are possibilities for lucene but I don't know how to implement it in imixs-workflow:
https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/charfilter/MappingCharFilterFactory.html
Thank you!
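To make the expected order concrete, here is a small sketch that uses only the JDK's `java.text.Collator` with the German locale. It is not Lucene code and says nothing about how Imixs-Workflow sorts; the class name `GermanCollatorDemo` is hypothetical. Note that the collator treats 'Ü' as a variant of 'U' rather than as 'Ue', which for most word lists gives a similar result to the replacement described in the question.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class GermanCollatorDemo {

    // Returns a copy of the list sorted with the JDK's German collation rules.
    static List<String> sortGerman(List<String> values) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Zebra", "\u00dcbung", "Apfel", "Uhr");
        // The word starting with U-umlaut is placed among the U entries
        // instead of after "Zebra".
        System.out.println(sortGerman(names));
    }
}
```

For the sample list the collator yields Apfel, Übung, Uhr, Zebra, whereas a plain code-point comparison would put Übung last.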