Fix put inference API docs (elastic#110025)
* Fix put inference API docs

* Update docs/changelog/110025.yaml

* Delete docs/changelog/110025.yaml
jan-elastic authored Jun 21, 2024
1 parent 47bc0ba commit 13478b2
Showing 1 changed file with 4 additions and 4 deletions: docs/reference/inference/put-inference.asciidoc
@@ -257,11 +257,11 @@ It can be the ID of either a built-in model (for example, `.multilingual-e5-smal
 `num_allocations`:::
 (Required, integer)
-The number of model allocations to create. `num_allocations` must not exceed the number of available processors per node divided by the `num_threads`.
+The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
 `num_threads`:::
 (Required, integer)
-The number of threads to use by each model allocation. `num_threads` must not exceed the number of available processors per node divided by the number of allocations.
+Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocations` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 =====
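The hunk above rewrites the descriptions of the `num_allocations` and `num_threads` service settings. For context, a minimal sketch of a put inference request that sets both, assuming the `elasticsearch` service; the inference ID and the specific values are illustrative, not part of this commit:

```console
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 2
  }
}
```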
@@ -272,11 +272,11 @@ Must be a power of 2. Max allowed value is 32.
 `num_allocations`:::
 (Required, integer)
-The number of model allocations to create. `num_allocations` must not exceed the number of available processors per node divided by the `num_threads`.
+The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
 `num_threads`:::
 (Required, integer)
-The number of threads to use by each model allocation. `num_threads` must not exceed the number of available processors per node divided by the number of allocations.
+Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocations` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 =====
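The constraints repeated in both hunks (threads must be a power of 2, at most 32, and must not exceed the allocated processors per node) can be checked client-side before sending the request. A minimal sketch; the function names and the `processors_per_node` argument are assumptions for illustration, not part of the API:

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def validate_service_settings(num_allocations: int, num_threads: int,
                              processors_per_node: int) -> None:
    """Raise ValueError if the settings violate the documented constraints."""
    if num_allocations < 1:
        raise ValueError("num_allocations must be a positive integer")
    if not is_power_of_two(num_threads) or num_threads > 32:
        raise ValueError("num_threads must be a power of 2, max 32")
    if num_threads > processors_per_node:
        raise ValueError(
            "num_threads must not exceed available allocated processors per node")
```

For example, `validate_service_settings(2, 4, 8)` passes, while `num_threads=3` is rejected because it is not a power of two.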

