Multi-threading #62
-
Is there a way to configure the code to serve multiple users at the same time? I have a server that can support the load, but when multiple people try to access the endpoint, requests are handled sequentially rather than concurrently.
Replies: 1 comment 1 reply
-
Hey, @KeenanFernandes2000! Yes, it is possible in v0.1.33 of Ollama. You need to set some environment variables. Check out this link, specifically under "Experimental concurrency features":

- `OLLAMA_NUM_PARALLEL`: Handle multiple requests simultaneously for a single model
- `OLLAMA_MAX_LOADED_MODELS`: Load multiple models simultaneously

Example:

```
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4
```
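As a concrete illustration, these variables can be set inline when starting the server. This is a sketch, not an official recipe: it assumes you launch the server yourself with `ollama serve` (if Ollama runs as a systemd service, you would instead put the variables in the service's environment), and the values `4`/`4` are just examples to tune for your hardware.

```shell
# Start the Ollama server with experimental concurrency enabled:
# up to 4 parallel requests per loaded model, and up to 4 models
# resident in memory at once. Values are illustrative.
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
```

With this in place, simultaneous requests to the endpoint should be processed in parallel (up to the configured limit) instead of queuing one after another.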