High cpu usage #49

diadistis · 2015-08-10T23:33:59Z

Setup

Latest stream2es (20150720170522978252e) on a server (6 cores / 64GB RAM) separate from the ES cluster
A big (~65GB) file containing one large JSON object per line. There are about 15 million lines/documents, and the average line size is ~4.3k characters

Problem

I'm running:

cat bigfile | stream2es stdin --target http://server:9200/index/type --log debug -w 12

I have tried several different values for --bulk-bytes, -w, -d and -q, but the result is always the same: a constant indexing speed of ~5MB/s, which translates to about 4 hours to import the file. While indexing, the elasticsearch cluster is heavily under-utilized and the stream2es server has a single core at 100%. I have tested extensively to rule out network or elasticsearch performance issues.
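As a quick sanity check on that estimate, the arithmetic can be reproduced from the reported figures. This is only a back-of-the-envelope sketch: it assumes binary units (1 GB = 1024 MB) and a perfectly constant rate, and the variable names are purely illustrative.

```shell
# Figures come from the report; treating 1 GB as 1024 MB is an assumption.
file_mb=$((65 * 1024))            # ~65GB input file, expressed in MB
rate_mb_s=5                       # observed indexing rate in MB/s

seconds=$((file_mb / rate_mb_s))  # integer division is fine for an estimate
minutes=$((seconds / 60))

echo "estimated import time: ${seconds}s (~${minutes} minutes)"
```

That works out to roughly 3.7 hours, which is consistent with the ~4 hours observed.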

Workaround

My final workaround was to run several stream2es processes in parallel with GNU parallel (rather than relying on -w) to see if that would help.

cat bigfile | parallel -j12 -L5000 --pipe "stream2es stdin --target http://server:9200/index/type"

That helped a lot. Now all 6 cores (12 threads) are at 100% and the indexing time fell from 4 hours to 35 minutes, but the elasticsearch cluster is still pretty much idle. It seems to me that something in stream2es uses far more CPU than it should.
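For anyone without GNU parallel, the same idea (independent processes, each fed a chunk of lines) can be sketched with split and xargs. This is a toy illustration under stated assumptions, not what was run above: it writes chunks to a temporary directory instead of streaming them, uses 12 generated lines in chunks of 5 rather than the real file in chunks of 5000, and `wc -l` is a stand-in for the real `stream2es stdin --target ...` consumer.

```shell
# Toy demo of "split the input by lines, run one process per chunk".
# Assumes split, seq, mktemp and an xargs with -P (GNU or BSD) are available.
tmp=$(mktemp -d)
seq 1 12 > "$tmp/input"           # stand-in for bigfile

# Cut the input into chunks of 5 lines: chunk.aa, chunk.ab, chunk.ac
split -l 5 "$tmp/input" "$tmp/chunk."
set -- "$tmp"/chunk.*
chunks=$#

# Run up to 3 workers at once; each worker reads one chunk on stdin.
# `wc -l` is a placeholder for the real consumer process.
printf '%s\n' "$@" |
  xargs -P 3 -I{} sh -c 'wc -l < "$1" > "$1.count"' _ {}

# Every input line should have been handled by exactly one worker.
total=$(cat "$tmp"/chunk.*.count | awk '{ s += $1 } END { print s }')
echo "chunks=$chunks total_lines=$total"
rm -rf "$tmp"
```

Materializing chunks on disk is impractical for a 65GB input, so a streaming splitter such as the parallel --pipe invocation above is the better fit for the real workload. The sketch only shows why the approach scales: each worker is a separate process with its own CPU budget.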


drewr · 2016-05-09T20:43:46Z

Thanks for reporting this @diadistis, and sorry for the terrible response time. I've noticed similar behavior, and I've used similar workarounds. I haven't had a chance to profile the internal design to isolate the bottleneck, but I suspect at the very least the single LinkedBlockingQueue that feeds the pipeline is part of it.

I did just push a fix for some extraneous string copying, but it won't speed anything up 8x. If you still have this environment available I'd love to know its effect.
