
Fix ES bulk processor commit timeout #3696

Merged: 5 commits merged on Dec 20, 2022

Conversation

@alexshtin (Member) commented Dec 8, 2022

What changed?

  1. Fix the ES bulk processor commit timeout. The effective value was 3s, but it was supposed to be 60s.
  2. Decrease the ES bulk processor commit timeout from 60s to 30s.
  3. The visibility task execution timeout is still 3s, and it is not respected by the bulk processor, which now has a 30s commit timeout and may also spend some time waiting for the bulk processor to accept the request.
  4. Increase the default number of bulk processor workers from 1 to 2. Rationale: this should help with commit latency during traffic spikes and with "wait add" latency. This change was moved to #3738 (Increase default number of Elasticsearch bulk processor workers to 2) instead.
  5. Change how the elasticsearch_bulk_processor_queued_requests metric is emitted. It now shows the number of pending requests after the current bulk is processed.

Why?
Fixing bugs, improving overall ES write latency.

How did you test it?
Existing tests.

Potential risks
No risks.

Is hotfix candidate?
Yes.

@alexshtin alexshtin force-pushed the feature/tune-es-bulk-processor branch from d0da87b to a5aa26e on December 8, 2022 18:15
@alexshtin alexshtin changed the title Fix ES bulk processor timeout Fix ES bulk processor commit timeout Dec 8, 2022
@alexshtin alexshtin marked this pull request as ready for review December 8, 2022 19:06
@alexshtin alexshtin requested a review from a team as a code owner December 8, 2022 19:06
@alexshtin alexshtin force-pushed the feature/tune-es-bulk-processor branch 2 times, most recently from 261cabf to 51fcc24 on December 9, 2022 20:29
@rodrigozhou (Collaborator) left a comment


LGTM, just a couple of nit comments.

Comment on lines 202 to 203
```diff
 p.metricsHandler.Histogram(metrics.ElasticsearchBulkProcessorQueuedRequests.GetMetricName(), metrics.ElasticsearchBulkProcessorBulkSize.GetMetricUnit()).
-	Record(int64(p.mapToAckFuture.Len()))
+	Record(int64(p.mapToAckFuture.Len() - len(requests)))
```
@rodrigozhou (Collaborator):

What is this metric supposed to mean? Number of queued requests?
Wouldn't it make more sense to call this after the queue size changes? Like, in bulkAfterAction?

@alexshtin (Member, Author):

Like how many documents are still waiting to be flushed. I like the idea of moving this to bulkAfterAction.
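The idea agreed on above, emitting the gauge from the after-bulk callback once the committed requests have been acked, can be sketched as follows. This is a simplified model with hypothetical names (plain map and an int64 stand-in for the histogram), not Temporal's actual processor type.

```go
package main

import (
	"fmt"
	"sync"
)

// processor models the relevant state: a map of pending visibility
// requests keyed by document ID, and the last emitted gauge value.
type processor struct {
	mu             sync.Mutex
	mapToAckFuture map[string]struct{} // requests waiting to be flushed
	lastQueued     int64               // stand-in for the emitted histogram value
}

// bulkAfterAction is called after a bulk commit completes. It acks the
// committed requests and only then records the queue size, so the metric
// reflects how many documents are still waiting to be flushed.
func (p *processor) bulkAfterAction(committedIDs []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, id := range committedIDs {
		delete(p.mapToAckFuture, id)
	}
	p.lastQueued = int64(len(p.mapToAckFuture))
}

func main() {
	p := &processor{mapToAckFuture: map[string]struct{}{"a": {}, "b": {}, "c": {}}}
	p.bulkAfterAction([]string{"a", "b"})
	fmt.Println(p.lastQueued) // 1
}
```

Recording after removal avoids the subtraction done at the add site, since the completed bulk has already been taken out of the map.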

service/history/configs/config.go (thread outdated, resolved)
@alexshtin alexshtin force-pushed the feature/tune-es-bulk-processor branch 3 times, most recently from 51659ca to 59250e2 on December 10, 2022 06:20
@alexshtin alexshtin force-pushed the feature/tune-es-bulk-processor branch from 66dde25 to 20fc1b9 on December 20, 2022 09:11
@alexshtin alexshtin added the release/1.19.1 Patches for v1.19.1 label Dec 20, 2022
@alexshtin alexshtin merged commit a7240b4 into temporalio:master Dec 20, 2022
@alexshtin alexshtin deleted the feature/tune-es-bulk-processor branch December 20, 2022 16:18
yycptt pushed a commit that referenced this pull request Jan 12, 2023