Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] add throttling based on configuration parameter #54862

Closed
hendrikmuhs opened this issue Apr 7, 2020 · 5 comments
Closed

[Transform] add throttling based on configuration parameter #54862

hendrikmuhs opened this issue Apr 7, 2020 · 5 comments

Comments

@hendrikmuhs
Copy link

hendrikmuhs commented Apr 7, 2020

Running transform can increase load on the cluster. Especially in batch mode and for the first checkpoint of a continuous transform this can have bad side effects on the cluster.

Transform should support throttling similar to reindex. Throttling is configured in reindex with the setting requests_per_second.

Because transform works as a reducer (reindex is conceptually a mapper), transform should throttle differently:

  • we be base the calculation on the number of input documents
  • most runtime is spend for search/aggregations
  • the number of output documents can differ, even within one transform due to data and configuration

For transform we decided to go for docs_per_second.

Setting docs_per_second to null disables throttling (instead of null you can also use -1), setting docs_per_second to 0 is disallowed.

Configuration:

{
  "source": {...},
  "dest": {...}
  "pivot": {
      "group_by": {...},
      "aggregations"" {...}
  },
  "settings": {
      "max_page_search_size": XX,
      "docs_per_second": XX
  }
}

docs_per_second is located under a new object: settings.

The setting max_page_search_size which is part of pivot will be moved there, too (the old place is deprecated but kept for BWC). If both are specified, the one under settings win, because it could be a leftover after updating an old configuration.

The whole settings object will be update-able with the update API.

Update will be

  • immediate for settings
  • delayed until the next checkpoint for the rest
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

hendrikmuhs pushed a commit that referenced this issue Apr 30, 2020
implement throttling in async-indexer used by rollup and transform. The added
docs_per_second parameter is used to calculate a delay before the next
search request is send. With re-throttle its possible to change the parameter
at runtime. When stopping a running job, its ensured that despite throttling
the indexer stops in reasonable time. This change contains the groundwork, but
does not expose the new functionality.

relates #54862
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue Apr 30, 2020
implement throttling in async-indexer used by rollup and transform. The added
docs_per_second parameter is used to calculate a delay before the next
search request is send. With re-throttle its possible to change the parameter
at runtime. When stopping a running job, its ensured that despite throttling
the indexer stops in reasonable time. This change contains the groundwork, but
does not expose the new functionality.

relates elastic#54862
# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupJobTask.java
#	x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/job/RollupIndexerIndexingTests.java
#	x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/job/RollupIndexerStateTests.java
hendrikmuhs pushed a commit that referenced this issue Apr 30, 2020
implement throttling in async-indexer used by rollup and transform. The added
docs_per_second parameter is used to calculate a delay before the next
search request is send. With re-throttle its possible to change the parameter
at runtime. When stopping a running job, its ensured that despite throttling
the indexer stops in reasonable time. This change contains the groundwork, but
does not expose the new functionality.

relates #54862
backport: #55011
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this issue May 5, 2020
add throttling to transform, throttling will slow down search requests by delaying the execution based on a documents per second metric.

fixes elastic#54862
hendrikmuhs pushed a commit that referenced this issue May 5, 2020
add throttling to transform, throttling will slow down search requests by
delaying the execution based on a documents per second metric.

fixes #54862
@russcam
Copy link
Contributor

russcam commented Jun 2, 2020

Setting docs_per_second to null disables throttling

@hendrikmuhs In other APIs where behaviour can be "disabled", -1 has typically been used as a special value to signal this e.g.

Would there be scope to disable throttling on docs_per_second by setting it to -1?

I can't speak for other language clients, but for the .NET high level client, it typically omits sending null values in a request because a null value for a property on a request type is usually one that hasn't been set by the user. We would need to write additional behaviour to be able to send an explicitly set null value in the request, to be able to distinguish it from a nullable int default value in the docs_per_second case.

Looks like the Java client sends a -1 in the case of null?

/**
* Sets the docs per second that transform can use when pulling the data from the source index.
*
* This setting throttles transform by issuing queries less often, however processing still happens in
* batches. A value of 0 disables throttling (default).
*
* @param docsPerSecond Integer value
* @return the {@link Builder} with requestsPerSecond set.
*/
public Builder setRequestsPerSecond(Float docsPerSecond) {
this.docsPerSecond = docsPerSecond == null ? DEFAULT_DOCS_PER_SECOND : docsPerSecond;
return this;
}

(/cc @elastic/es-clients)

@karmi
Copy link
Contributor

karmi commented Jun 2, 2020

Agreed that null values are hard to implement (and semantically somewhat empty), and that -1, especially for numerical values, makes more sense.

russcam added a commit to elastic/elasticsearch-net that referenced this issue Jun 2, 2020
Relates: #4718, elastic/elasticsearch#54862

Add transform settings to Transform APIs
@hendrikmuhs
Copy link
Author

hendrikmuhs commented Jun 3, 2020

@russcam

Both is possible null or -1. I will update the description above.

In the stack we have both versions, null defaults are used e.g. for cluster settings.

@russcam
Copy link
Contributor

russcam commented Jun 3, 2020

Thanks for clarifying 👍 I'll document to send -1 for the client to disable

russcam added a commit to elastic/elasticsearch-net that referenced this issue Jun 10, 2020
Relates: #4718, elastic/elasticsearch#54862

Add transform settings to Transform APIs
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this issue Jun 10, 2020
Relates: #4718, elastic/elasticsearch#54862

Add transform settings to Transform APIs
github-actions bot pushed a commit to elastic/elasticsearch-net that referenced this issue Jun 10, 2020
Relates: #4718, elastic/elasticsearch#54862

Add transform settings to Transform APIs
russcam added a commit to elastic/elasticsearch-net that referenced this issue Jun 10, 2020
Relates: #4718, elastic/elasticsearch#54862

Add transform settings to Transform APIs

Co-authored-by: Russ Cam <russ.cam@elastic.co>
russcam added a commit to elastic/elasticsearch-net that referenced this issue Jun 10, 2020
Relates: #4718, elastic/elasticsearch#54862

Add transform settings to Transform APIs

Co-authored-by: Russ Cam <russ.cam@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants