Indexing of type TASK is extremely slow #5131
Is the application running behind a proxy such as Apache HTTP Server or nginx? If so, does the proxy forward WebSockets to the running Tomcat instance? The indexing page uses a WebSocket to send the current indexing progress to the browser of the user who started the indexing. If the WebSocket request and its data are not forwarded, you only see indexing progress when you manually refresh the indexing page. Just some hints from a larger installation, which may speed up or slow down the indexing (some of them need an issue to be opened):
This is the common setup with Apache/mod_jk. Yes, I indexed row by row, and the problem only showed up when indexing tasks; all other types were fast, and even processes took only a short time. (At least I didn't notice them being slow, but I can test it again in the next few days.)
I don’t know whether this is still true. I didn’t observe these requests in the HTTP traffic, only roughly one request per second (see the animation above). It looks like this checks whether the index has been created. (It could also be cached, by the way.)
Apache with mod_jk for forwarding requests to Tomcat worked well until WebSockets came into use. mod_jk cannot forward WebSockets, and it looks to me as if development of mod_jk has stalled. I even had problems with mod_jk when using HTTP/2, so I switched to mod_proxy and its submodules for forwarding all requests to Tomcat.
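Whether a proxy actually passes WebSockets through can be read from the handshake response: per RFC 6455, a successful upgrade returns status 101 together with a Sec-WebSocket-Accept header derived from the client's Sec-WebSocket-Key. The following is a minimal sketch of that check, not part of Kitodo; the function names are hypothetical, and it only validates a response you have already captured (for example in the browser's developer tools) rather than talking to Apache or Tomcat itself.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, section 1.3; a compliant server appends it to
# the client's Sec-WebSocket-Key before hashing.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value the server must send back."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_forwarded(status: int, headers: dict, sec_websocket_key: str) -> bool:
    """Hypothetical check: did the upgrade reach a WebSocket endpoint?

    If the proxy does not tunnel the upgrade, the browser will not get a
    valid 101 response with a matching accept header.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    return (
        status == 101
        and normalized.get("upgrade", "").lower() == "websocket"
        and normalized.get("sec-websocket-accept") == expected_accept(sec_websocket_key)
    )
```

A response that fails this check (for example a plain 200 or 404 for the WebSocket URL) would point at the proxy rather than at the indexing itself.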
I only see the number of requests to Elasticsearch because I use a similar proxy setup in front of Elasticsearch as in front of our application server. Maybe you no longer saw these messages because you had refreshed the indexing page before. Or it depends on your Apache/mod_jk settings, which are "blocking" the WebSockets.
The incomplete refresh of the indexing page is definitely also something we should look into, but the primary concern in this issue is that indexing of tasks is so slow, namely: Why is only one indexing request sent per minute for tasks? Who is waiting there for what, and why?
On our system, or rather in its logs, I see only bulk requests, which contain the data to be indexed (in our case 2,500 entries per bulk request). It takes time to retrieve the data from the database (this could be faster, but it is acceptable), a lot of time to convert the database objects into JSON objects (this really needs a speed improvement, if possible) and only a short time to send the data to the Elasticsearch instance (this is done asynchronously). I still hope that we can achieve better indexing performance by switching to Hibernate Search, but I have not heard anything in the last couple of months about how far that project has progressed; as far as I know, it is funded by the development fund.
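The batching step described here (split the entries into chunks, convert each to JSON, send one bulk request per chunk) can be sketched as follows. This is not Kitodo's code: the record shape, the `id` field and the index name are hypothetical assumptions; only the newline-delimited layout of one action line plus one source line per document follows the Elasticsearch `_bulk` API.

```python
import json
from typing import Iterable, Iterator, List

BATCH_SIZE = 2500  # entries per bulk request, as reported for this installation


def batches(records: List[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Split the records to be indexed into consecutive chunks of `size`."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def bulk_body(batch: Iterable[dict], index: str) -> str:
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each document needs an action line followed by its source line, and the
    body must end with a newline. The "id" key is a hypothetical record field.
    """
    lines = []
    for record in batch:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(record["id"])}}))
        lines.append(json.dumps(record))  # the JSON conversion step
    return "\n".join(lines) + "\n"
```

With 29,930 tasks and 2,500 entries per request this yields 12 bulk requests, the last one holding 2,430 entries; timing `json.dumps` separately from the database load would show how much of each batch is spent on conversion.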
On a Linux server, indexing tasks takes roughly 1 minute for every 280 tasks (1 h 47 min for 29,930 tasks). Meanwhile, the system is about 60% idle.
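The reported figures are consistent with each other, and they correspond to fewer than five tasks per second:

```latex
\frac{29\,930\ \text{tasks}}{280\ \text{tasks/min}} \approx 106.9\ \text{min} \approx 1\ \text{h}\ 47\ \text{min},
\qquad
\frac{280\ \text{tasks}}{60\ \text{s}} \approx 4.7\ \text{tasks/s}
```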
Although the log level for org.kitodo is set to TRACE, there is no information in the log file. Monitoring the HTTP traffic shows roughly one indexing request per minute. It looks as if a timeout is involved that shouldn’t be there …
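Since TRACE logging produced nothing, one way to find out who is waiting is to time each phase named in this thread (database load, JSON conversion, bulk request) and log the totals. A minimal sketch, assuming the three steps can be wrapped individually; `index_batch`, `load`, `convert` and `send` are hypothetical placeholders, not Kitodo methods.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per indexing phase.
phase_seconds = defaultdict(float)


@contextmanager
def timed(phase: str):
    """Add the elapsed time of the enclosed block to phase_seconds[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[phase] += time.perf_counter() - start


def index_batch(load, convert, send):
    """Hypothetical indexing step; load/convert/send stand in for the real calls."""
    with timed("load from database"):
        rows = load()
    with timed("convert to JSON"):
        payload = convert(rows)
    with timed("send bulk request"):
        send(payload)
```

Comparing the sum of the three phases with the total time per batch would show whether the minute per request is spent working or waiting; a large gap would support the timeout suspicion.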
Indexing was started with the button “Start indexing for all” in the Tasks row.
Additional observations: the number of indexed entries and the progress bar for “Whole index” did not update, and the indexing buttons did not turn blue again after indexing of tasks was complete.
Goal: Speed up indexing of TASK.