Indexing is failing on many structure and meta data elements #4724
@henning-gerhardt @Kathrin-Huber The parameter `index.mapping.nested_objects.limit` seems to have been added as of ElasticSearch version 7.0. The question is whether the default is too low for our purposes. Depending on the server resources available, it could be increased; even with a value of, for example, 30000 it still protects against memory errors in a powerful environment. It is not without reason that this is a parameter! ;) Nevertheless, we have to look at where we can optimize the source code in order to avoid memory errors. As a quick solution, I would recommend adjusting the parameter if there are enough resources, as sketched below. If there are known optimizations, we should create an issue to improve indexing.
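A minimal sketch of raising the limit when the index is created; the index name `kitodo` and the host `localhost:9200` are assumptions, not taken from the project, so adjust both to the actual installation:

```shell
# Assumption: the index is named "kitodo" and ElasticSearch listens on
# localhost:9200 -- adjust both to the actual installation.
# Creates the index with a raised nested-objects limit.
curl -X PUT "localhost:9200/kitodo" \
  -H 'Content-Type: application/json' \
  -d '{
        "settings": {
          "index.mapping.nested_objects.limit": 30000
        }
      }'
```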
From the ElasticSearch documentation on the nested field type:

> The nested type is a specialised version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.

So, if we do not need to query the objects independently of each other, we do not need a nested type. As, to my knowledge, we do not use such information in our context, this is not necessary at the moment. However, changing the mapping would require a full re-indexing.
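A sketch of what the alternative mapping could look like; the index name `kitodo` and the field name `metadata` are hypothetical placeholders, not the actual names used by Kitodo:

```shell
# Hypothetical example: mapping a field as "object" instead of "nested".
# With "object", array elements are flattened into the parent document
# and do not count against index.mapping.nested_objects.limit, at the
# price of not being able to query them independently of each other.
curl -X PUT "localhost:9200/kitodo" \
  -H 'Content-Type: application/json' \
  -d '{
        "mappings": {
          "properties": {
            "metadata": { "type": "object" }
          }
        }
      }'
```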
@markusweigelt As far as I understand this parameter, it influences the behavior on the server side and not on the client side. I would suggest making this parameter configurable through the application's configuration.
As I understand it, the parameter lives in the ElasticSearch configuration. If so, Production could just call a corresponding REST request to set it.
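For example, the currently applied value can be checked against the index settings API (again, `kitodo` and `localhost:9200` are assumed names):

```shell
# Check the current index settings, including the nested-objects limit.
curl -X GET "localhost:9200/kitodo/_settings?pretty"
```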
It's hard to say in general, but as you can see, a separate index entry is created for each structure element × each metadata entry; for the example document, over 10,000 index records are created, which is why the error occurs. I think it is possible to increase the parameter a bit for now, but it points to an improper implementation of the search engine usage.
I think there are many tuning knobs here (RAM, disk space, volume of incoming data) that influence the behavior. If we want to know exactly, we would have to use ElasticSearch in conjunction with Kibana, Grafana, etc., if that is possible in the free version of ElasticSearch. Then we could change the parameter and monitor its influence. I suspect the default is based on the minimum requirements of ElasticSearch; if we theoretically put it in relation to our available resources, we could raise it up to that maximum. I cannot currently find out why this parameter has a default value of 10000 and how this value was determined. It may also be too low in general.
Setting the parameter `index.mapping.nested_objects.limit` to 30000 through a request like the following
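(the original command is not preserved in this issue; this is a reconstruction, assuming an index named `kitodo` on `localhost:9200`):

```shell
# Raise the nested-objects limit on an existing index.
# index.mapping.nested_objects.limit is a dynamic index setting,
# so it can be updated without recreating the index.
curl -X PUT "localhost:9200/kitodo/_settings" \
  -H 'Content-Type: application/json' \
  -d '{
        "index.mapping.nested_objects.limit": 30000
      }'
```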
solved the issue temporarily, until the ElasticSearch index gets destroyed. Setting this value must be done after creating the mapping inside ElasticSearch but before starting the indexing of processes, or you must redo everything again. Setting this parameter should be done inside the application instead of running a curl command on the ElasticSearch server.
Indexing a process with a lot of structure elements (> 450) and metadata elements (> 2960) fails with the error below.
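(The original error output is not preserved here; based on the parameter discussed in the comments, ElasticSearch 7 reports this condition roughly as follows, with exact wording varying by version:)

```
The number of nested documents has exceeded the allowed limit of [10000].
This limit can be set by changing the [index.mapping.nested_objects.limit]
index level setting.
```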
The mentioned process is already available under https://digital.slub-dresden.de/id1685679609, as this issue happened on re-indexing the process data.