Reduce default max in flight requests to 5 #108

Merged on May 13, 2019 (3 commits)

orangejulius (Member) commented on May 12, 2019

This package has historically been very aggressive regarding how many requests it will allow to be in flight to Elasticsearch.

We lowered the maximum number of in-flight requests to 10 recently (see #76), but I think this is still too high. Recently we have seen some Elasticsearch timeouts when running highly parallel imports.

I suspect that a high number of in-flight bulk index requests is very unlikely to be the best way to achieve high performance. For geocode.earth, we run planet builds on a 36-core machine, with a total of 6 importer processes running at once at the start (2 OA, OSM, polylines, geonames, WOF).

Since the bulk import endpoint already allows importing many records in parallel (500 by default in this package), 6 importers at the current limit of 10 could lead to up to 60 bulk requests in flight at once, totaling 30,000 records. My guess is that even 2-3 bulk requests are enough to keep Elasticsearch busy, so the current default does nothing but fill up the Elasticsearch bulk thread pool and queue, pushing the cluster closer to the point where it is overloaded and has to drop requests.
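For reference, the worst-case arithmetic can be sketched as below. The function name and parameters are purely illustrative and not part of this package; the inputs (6 importers, 500 records per bulk request, limits of 10 and 5) are the figures from this description.

```python
def worst_case_load(importers: int, max_in_flight: int, batch_size: int) -> tuple[int, int]:
    """Return (bulk requests in flight, records in flight) when every importer is at its cap."""
    requests = importers * max_in_flight
    return requests, requests * batch_size


# Current default: 10 in-flight requests per importer
print(worst_case_load(6, 10, 500))  # (60, 30000)

# Default proposed in this PR: 5 in-flight requests per importer
print(worst_case_load(6, 5, 500))   # (30, 15000)
```

Even at 5, the worst case is still well above the 2-3 requests guessed to be enough, so this is a conservative step rather than a floor.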

Eventually I'd like to make this option easy to configure across all importers, but for now let's test this value.
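As a rough illustration of what a configurable cap on in-flight requests could look like, here is a minimal sketch using a semaphore. It is not this package's code: `send_bulk`, `index_all`, and `DEFAULT_MAX_IN_FLIGHT` are hypothetical names, and the bulk request is simulated with a short sleep rather than a real Elasticsearch call.

```python
import asyncio

DEFAULT_MAX_IN_FLIGHT = 5  # the default proposed in this PR


async def send_bulk(batch):
    """Hypothetical stand-in for one bulk index request; returns the record count."""
    await asyncio.sleep(0.01)  # simulate network and indexing latency
    return len(batch)


async def index_all(batches, max_in_flight=DEFAULT_MAX_IN_FLIGHT):
    """Send every batch, never allowing more than max_in_flight concurrent requests.

    Returns (total records sent, peak number of requests observed in flight).
    """
    limiter = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def worker(batch):
        nonlocal in_flight, peak
        async with limiter:  # waits here while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await send_bulk(batch)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(b) for b in batches))
    return sum(results), peak


# 20 batches of 500 records each, sent with the default cap
batches = [list(range(500)) for _ in range(20)]
total, peak = asyncio.run(index_all(batches))
print(total, peak)
```

Passing `max_in_flight` as a parameter with a shared default is one way the value could later be set from a common config for all importers.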

Connects #76
Connects #83
