Fix put inference API docs (elastic#110025)
* Fix put inference API docs

* Update docs/changelog/110025.yaml

* Delete docs/changelog/110025.yaml
jan-elastic authored Jun 21, 2024
1 parent 47bc0ba commit 13478b2
Showing 1 changed file with 4 additions and 4 deletions: docs/reference/inference/put-inference.asciidoc
@@ -257,11 +257,11 @@ It can be the ID of either a built-in model (for example, `.multilingual-e5-smal
 `num_allocations`:::
 (Required, integer)
-The number of model allocations to create. `num_allocations` must not exceed the number of available processors per node divided by the `num_threads`.
+The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
 `num_threads`:::
 (Required, integer)
-The number of threads to use by each model allocation. `num_threads` must not exceed the number of available processors per node divided by the number of allocations.
+Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocations` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 =====
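The hunk above rewrites the descriptions of the `num_allocations` and `num_threads` service settings. For context, a minimal sketch of a put inference request that sets both, assuming the `elasticsearch` service; the inference ID and the specific values are illustrative, not part of this commit:

```console
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 2
  }
}
```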
@@ -272,11 +272,11 @@ Must be a power of 2. Max allowed value is 32.
 `num_allocations`:::
 (Required, integer)
-The number of model allocations to create. `num_allocations` must not exceed the number of available processors per node divided by the `num_threads`.
+The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
 `num_threads`:::
 (Required, integer)
-The number of threads to use by each model allocation. `num_threads` must not exceed the number of available processors per node divided by the number of allocations.
+Sets the number of threads used by each model allocation during inference. This generally increases the speed per inference request. The inference process is a compute-bound process; `threads_per_allocations` must not exceed the number of available allocated processors per node.
 Must be a power of 2. Max allowed value is 32.
 =====
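The constraints repeated in both hunks (threads must be a power of 2, at most 32, and must not exceed the allocated processors per node) can be checked client-side before sending the request. A minimal sketch; the function names and the `processors_per_node` argument are assumptions for illustration, not part of the API:

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def validate_service_settings(num_allocations: int, num_threads: int,
                              processors_per_node: int) -> None:
    """Raise ValueError if the settings violate the documented constraints."""
    if num_allocations < 1:
        raise ValueError("num_allocations must be a positive integer")
    if not is_power_of_two(num_threads) or num_threads > 32:
        raise ValueError("num_threads must be a power of 2, max 32")
    if num_threads > processors_per_node:
        raise ValueError(
            "num_threads must not exceed available allocated processors per node")
```

For example, `validate_service_settings(2, 4, 8)` passes, while `num_threads=3` is rejected because it is not a power of two.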

