pipeline breaks with longer html #93
Comments
The first pipeline step detects 1052 named entities; the second creates 1052 SPARQL queries and sends them to DBpedia. This takes a long time. A timeout of 60 seconds is configured. I transferred the pipeline to freme-dev and raised the timeouts to 600 seconds in three places on freme-dev.
Now the requests fail after 10 minutes. I am not sure how to deal with this. These timeouts make sense, but on the other hand it should still be possible to process large files. The problematic service here is e-Link with the slow DBpedia endpoint. A client-side solution that I have not explored yet is to download the entities via freme-ner and then send them to e-Link in smaller batches. A server-side solution would be to set the timeouts to 1 hour, or to load DBpedia into our own triple store and hope that this improves response times.
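The client-side batching idea could look roughly like the sketch below. It is only an illustration under assumptions: `send_batch` is a hypothetical callable standing in for whatever request the client would make to e-Link for one group of entities, and the batch size of 50 is an arbitrary placeholder, not a measured value.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enrich_in_batches(
    entities: Sequence[T],
    send_batch: Callable[[Sequence[T]], List[R]],  # hypothetical: one request per batch
    batch_size: int = 50,  # assumed value, to be tuned against the timeout
) -> List[R]:
    """Send entities in small groups so that no single request runs too long."""
    results: List[R] = []
    for batch in batched(entities, batch_size):
        results.extend(send_batch(batch))
    return results
```

With 1052 entities and a batch size of 100, this would issue eleven requests (ten full batches and one of 52) instead of one long-running request.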
In the last developers call, @m1ci said he would check whether the implementation of e-Link can be sped up somehow. Possibilities to explore, from what I recall from the discussion:
Any update here?
Pipeline 37 does not exist anymore, but one can reproduce the problem using this curl request. The problem still occurs.
I just made an optimization update to e-Link so that enrichment is performed only on unique entities. In other words, if the same entity occurs multiple times, the enrichment is performed only once. @jnehring, can you please test now?
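As a rough illustration of that optimization (not the actual e-Link code), the idea of enriching each distinct entity only once can be sketched like this. `enrich` is a hypothetical callable that performs one lookup for one entity, and entities are assumed to be hashable values such as URIs.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

E = TypeVar("E", bound=Hashable)
R = TypeVar("R")


def enrich_unique(mentions: Sequence[E], enrich: Callable[[E], R]) -> List[R]:
    """Call `enrich` once per distinct entity and reuse the result for repeats."""
    cache: Dict[E, R] = {}
    for entity in mentions:
        if entity not in cache:
            cache[entity] = enrich(entity)  # hypothetical single lookup
    # Return one result per mention, in the original order.
    return [cache[entity] for entity in mentions]
```

The number of lookups then depends on the number of distinct entities rather than on the number of mentions in the document.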
I had issues executing the curl request I posted earlier, so I created pipeline 56 on freme-dev, which runs freme-ner first and then e-Link. It still fails on the long document:
curl -X POST -H "Content-Type: text/html" -d '@long-document.html' "http://api-dev.freme-project.eu/current/pipelining/chain/56"
@m1ci, could you process the long document successfully? Your update makes sense even if it cannot process the long document. Since the HTTP requests time out after a while, there has to be a maximum text length or a maximum number of entities that the service can process.
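That upper bound can be estimated with a back-of-the-envelope calculation. The sketch below assumes the queries run one after another and that an average per-query latency is known; both are assumptions, and the latency passed in is something one would have to measure, not a figure from this thread.

```python
import math


def per_query_budget(timeout_s: float, n_queries: int) -> float:
    """Average time each query may take if all must finish before the timeout."""
    if n_queries < 1:
        raise ValueError("n_queries must be at least 1")
    return timeout_s / n_queries


def max_entities(timeout_s: float, avg_query_s: float) -> int:
    """Largest number of sequential queries that fits into the timeout."""
    if avg_query_s <= 0:
        raise ValueError("avg_query_s must be positive")
    return math.floor(timeout_s / avg_query_s)


# Figures from this thread: 1052 queries, timeouts of 60 s and 600 s.
budget_60 = per_query_budget(60, 1052)    # roughly 0.057 s per query
budget_600 = per_query_budget(600, 1052)  # roughly 0.57 s per query
```

Under these assumptions, the 600-second timeout only holds if each query averages under about 0.57 seconds.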
See http://api.freme-project.eu/current/pipelining/chain/37 and the two attached requests. The only difference between them is the length of the files processed. The longer file is less than 100K; is this still an issue?
request-with-short-html-doc.txt
request-with-long-html-doc.txt