pipeline breaks with longer html #93

fsasaki · 2016-08-26T14:46:52Z

See
http://api.freme-project.eu/current/pipelining/chain/37
and the two attached requests. The only difference between them is the length of the files processed. The longer file is less than 100K. Is this still an issue?

request-with-short-html-doc.txt
request-with-long-html-doc.txt

jnehring · 2016-08-29T15:17:31Z

The first pipeline step detects 1052 named entities; the second creates 1052 SPARQL queries and sends them to DBpedia. This takes a long time. There is a timeout of 60 seconds configured.

I transferred the pipeline to freme-dev and changed the timeouts to 600 seconds in three places on freme-dev:

apache mod_proxy
timeout of the rest controller in application.properties
timeout of requests in pipelines in the source code of PipelineService.java

Now the requests fail after 10 minutes. I am not sure how to deal with this. These timeouts make sense, but on the other hand it should still be possible to process large files.

The problematic service here is e-Link with the slow DBpedia endpoint. A client-side solution that I have not explored yet is to download the entities via freme-ner and then send them in smaller batches to e-Link. A server-side solution would be to set the timeouts to 1 hour, or to load DBpedia into our own triple store and hope that this improves response times.
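The client-side batching idea could look roughly like the sketch below. It only shows how to split the entity list; the `batches` helper, the batch size of 100 and the placeholder entity URIs are assumptions for illustration, and the actual call to e-Link for each batch is left out.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder URIs standing in for the 1052 entities found by freme-ner.
entities = [f"http://dbpedia.org/resource/Entity_{i}" for i in range(1052)]

# Hypothetical batch size; each chunk would go to e-Link in its own request.
chunks = list(batches(entities, 100))
```

Each request then stays well below the entity count that triggers the timeout, at the cost of the client having to merge the responses itself.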

jnehring · 2016-09-05T08:09:59Z

In the last developers' call @m1ci said he will check whether the implementation of e-Link can be sped up somehow. Possibilities to explore, from what I recall of the discussion:

reduce the number of SPARQL queries by fetching information about multiple links in one go
implement caching / avoid redundant calls about the same link
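A minimal sketch of both ideas, not the e-Link implementation: one SPARQL 1.1 query can cover several entities through a `VALUES` clause, and a small in-memory cache can skip links that were already fetched. The query shape, `build_batch_query`, `EnrichmentCache` and the `fetch` callable are all assumptions; e-Link works with its own query templates, which are not shown in this thread.

```python
from typing import Callable, Dict, Iterable, List


def build_batch_query(uris: Iterable[str]) -> str:
    """Build one SPARQL query covering several entities via a VALUES clause."""
    # Assumption: URIs are already validated; a '>' inside one would break the query.
    values = " ".join(f"<{uri}>" for uri in uris)
    return f"SELECT ?entity ?p ?o WHERE {{ VALUES ?entity {{ {values} }} ?entity ?p ?o }}"


class EnrichmentCache:
    """Remember results per URI so repeated links are fetched only once."""

    def __init__(self, fetch: Callable[[List[str]], Dict[str, str]]) -> None:
        self._fetch = fetch  # hypothetical callable: list of URIs -> {uri: result}
        self._store: Dict[str, str] = {}

    def get_many(self, uris: Iterable[str]) -> Dict[str, str]:
        wanted = list(dict.fromkeys(uris))  # drop duplicates, keep order
        missing = [uri for uri in wanted if uri not in self._store]
        if missing:
            self._store.update(self._fetch(missing))  # one call for all misses
        return {uri: self._store[uri] for uri in wanted if uri in self._store}


# Stand-in for the real endpoint call; it only records what was requested.
calls: List[List[str]] = []


def fake_fetch(uris: List[str]) -> Dict[str, str]:
    calls.append(list(uris))
    return {uri: f"data for {uri}" for uri in uris}


cache = EnrichmentCache(fake_fetch)
cache.get_many(["http://dbpedia.org/resource/Berlin", "http://dbpedia.org/resource/Paris"])
cache.get_many(["http://dbpedia.org/resource/Berlin", "http://dbpedia.org/resource/Rome"])
```

In this run the second call only asks the stand-in endpoint for Rome, because Berlin is already cached.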

jnehring · 2016-09-28T08:49:09Z

any update here?

jnehring · 2016-12-05T14:54:29Z

Pipeline 37 does not exist anymore. But one can reproduce the problem using this curl request.

The problem still occurs.

m1ci · 2016-12-06T10:56:20Z

Just did an optimization update to e-Link so that enrichment is performed only on unique entities. In other words, if there are multiple occurrences of the same entity, the enrichment will be performed only once. @jnehring can you please test now?
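To illustrate the idea of enriching each entity only once (this is a sketch, not the code from the e-services commit), the function below deduplicates the mentions, calls a hypothetical `enrich` function once per unique URI, and maps the results back onto every occurrence.

```python
from typing import Callable, Dict, List


def enrich_unique(mentions: List[str], enrich: Callable[[str], str]) -> List[str]:
    """Call `enrich` once per distinct entity and reuse the result for repeats."""
    unique = list(dict.fromkeys(mentions))  # distinct URIs in first-seen order
    results: Dict[str, str] = {uri: enrich(uri) for uri in unique}
    return [results[uri] for uri in mentions]  # one result per original mention


# Stand-in for the real enrichment; it only counts how often it is called.
enrich_calls: List[str] = []


def fake_enrich(uri: str) -> str:
    enrich_calls.append(uri)
    return uri.rsplit("/", 1)[-1].upper()


mentions = [
    "http://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/resource/Paris",
    "http://dbpedia.org/resource/Berlin",
]
enriched = enrich_unique(mentions, fake_enrich)
```

The saving depends entirely on how often entities repeat in the document; a text with mostly distinct entities gains little.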

jnehring · 2016-12-07T13:34:45Z

I had issues executing the curl request I posted earlier. Therefore I created pipeline 56 on freme-dev, which executes freme-ner first and then e-Link.

It still fails on the long document:

curl -X POST -H "Content-Type: text/html" -d '@long-document.html' "http://api-dev.freme-project.eu/current/pipelining/chain/56"

@m1ci could you process the long document successfully?

Your update makes sense, even if it cannot process the long document. Since the HTTP requests time out after a while, there has to be a maximum text length / maximum number of entities that the service can process.
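A rough way to estimate that maximum, assuming the queries run one after another and each takes about the same time: divide the timeout by the average time per query. The 600 ms per query used below is a made-up figure, not a measurement from this thread.

```python
def max_entities(timeout_ms: int, per_query_ms: int) -> int:
    """Upper bound on sequential queries that fit into one request timeout."""
    if per_query_ms < 1:
        raise ValueError("per_query_ms must be at least 1")
    return timeout_ms // per_query_ms


# Hypothetical average latency of one DBpedia query, in milliseconds.
PER_QUERY_MS = 600

limit_default = max_entities(60_000, PER_QUERY_MS)    # 60 second timeout
limit_extended = max_entities(600_000, PER_QUERY_MS)  # 600 second timeout
```

Under that assumed latency, the 60 second timeout allows about 100 entities and the 600 second timeout about 1000, which would still be below the 1052 entities detected in the long document.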

fsasaki added the question label Aug 26, 2016

fsasaki assigned jnehring Aug 26, 2016

jnehring added a commit that referenced this issue Aug 29, 2016

#93 increase pipeline timeouts

2486efc

jnehring assigned fsasaki and m1ci and unassigned fsasaki Aug 29, 2016

jnehring removed their assignment Sep 14, 2016

jnehring added bug and removed question labels Dec 5, 2016

m1ci added a commit to freme-project/e-services that referenced this issue Dec 6, 2016

fix for freme-project/basic-services#93

7fdc85c

jnehring self-assigned this Dec 6, 2016

jnehring unassigned m1ci and jnehring Jan 2, 2017
