feat(llm): Ollama timeout setting (zylon-ai#1773)
* added request_timeout to ollama, default set to 30.0 in settings.yaml and settings-ollama.yaml

* Update settings-ollama.yaml

* Update settings.yaml

* updated settings.py and tidied up settings-ollama-yaml

* feat(UI): Faster startup and document listing (zylon-ai#1763)

* fix(ingest): update script label (zylon-ai#1770)

huggingface -> Hugging Face

* Fix lint errors

---------

Co-authored-by: Stephen Gresham <steve@gresham.id.au>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
3 people authored Mar 20, 2024
1 parent c2d6948 commit 6f6c785
Showing 4 changed files with 12 additions and 5 deletions.
1 change: 1 addition & 0 deletions private_gpt/components/llm/llm_component.py
@@ -131,6 +131,7 @@ def __init__(self, settings: Settings) -> None:
                     temperature=settings.llm.temperature,
                     context_window=settings.llm.context_window,
                     additional_kwargs=settings_kwargs,
+                    request_timeout=ollama_settings.request_timeout,
                 )
             case "azopenai":
                 try:
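
The hunk above forwards the configured timeout to the Ollama client constructor. As a minimal sketch of that hand-off, the following uses only the standard library. `OllamaSettingsSketch` and `build_ollama_kwargs` are hypothetical names, not part of the repository, and the `temperature` and `context_window` values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class OllamaSettingsSketch:
    """Hypothetical stand-in for the settings objects used in the hunk."""

    temperature: float = 0.1       # illustrative value, not taken from the commit
    context_window: int = 3900     # illustrative value, not taken from the commit
    request_timeout: float = 120.0  # seconds; the default this commit introduces


def build_ollama_kwargs(settings: OllamaSettingsSketch) -> dict[str, Any]:
    """Collect the keyword arguments a client constructor would receive."""
    return {
        "temperature": settings.temperature,
        "context_window": settings.context_window,
        # The line this commit adds: pass the configured timeout through
        # explicitly rather than relying on the client library's own default.
        "request_timeout": settings.request_timeout,
    }


if __name__ == "__main__":
    # A slow local model might need a longer timeout than the default.
    print(build_ollama_kwargs(OllamaSettingsSketch(request_timeout=300.0)))
```

Keeping the timeout in the settings object means a slow machine can raise it in YAML without touching the component code.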
4 changes: 4 additions & 0 deletions private_gpt/settings/settings.py
@@ -241,6 +241,10 @@ class OllamaSettings(BaseModel):
         1.1,
         description="Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)",
     )
+    request_timeout: float = Field(
+        120.0,
+        description="Time elapsed until ollama times out the request. Default is 120s. Format is float. ",
+    )


 class AzureOpenAISettings(BaseModel):
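
The new field declares a float with a default of 120.0 seconds. The sketch below imitates that behaviour with a plain dataclass, so it runs without Pydantic. `TimeoutFieldSketch` and `load_timeout` are hypothetical names, and the check for non-positive values is an added assumption: the field in the commit declares only a type and a default.

```python
from dataclasses import dataclass

DEFAULT_REQUEST_TIMEOUT = 120.0  # seconds; the default introduced by this commit


@dataclass
class TimeoutFieldSketch:
    """Hypothetical stand-in for the `request_timeout` field."""

    request_timeout: float = DEFAULT_REQUEST_TIMEOUT

    def __post_init__(self) -> None:
        value = self.request_timeout
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("request_timeout must be a number of seconds")
        # YAML may yield an int (e.g. `request_timeout: 60`); normalise to float.
        self.request_timeout = float(value)
        # Assumption: reject non-positive values. The original field does not.
        if self.request_timeout <= 0:
            raise ValueError("request_timeout must be positive")


def load_timeout(ollama_section: dict) -> float:
    """Read the timeout from a parsed `ollama:` mapping, using the default if absent."""
    if "request_timeout" in ollama_section:
        return TimeoutFieldSketch(ollama_section["request_timeout"]).request_timeout
    return TimeoutFieldSketch().request_timeout
```

With this sketch, `load_timeout({})` falls back to the default, and an integer from the config file is returned as a float.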
11 changes: 6 additions & 5 deletions settings-ollama.yaml
@@ -14,11 +14,12 @@ ollama:
   llm_model: mistral
   embedding_model: nomic-embed-text
   api_base: http://localhost:11434
-  tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
-  top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
-  top_p: 0.9 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
-  repeat_last_n: 64 # Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
-  repeat_penalty: 1.2 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
+  tfs_z: 1.0 # Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting.
+  top_k: 40 # Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)
+  top_p: 0.9 # Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.5) will generate more focused and conservative text. (Default: 0.9)
+  repeat_last_n: 64 # Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
+  repeat_penalty: 1.2 # Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1)
+  request_timeout: 120.0 # Time elapsed until ollama times out the request. Default is 120s. Format is float.

 vectorstore:
   database: qdrant
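
The `request_timeout` comment describes a limit on how long the client waits for Ollama to answer. As a rough analogy only, the sketch below shows the same idea at socket level with the standard library: a read that gives up after a fixed number of seconds. The real client goes through an HTTP library, and `read_with_timeout` is a hypothetical helper, not part of the repository.

```python
import socket
from typing import Optional


def read_with_timeout(sock: socket.socket, timeout_s: float) -> Optional[bytes]:
    """Return up to 1024 bytes, or None if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(1024)
    except socket.timeout:
        # No data arrived in time; a real client would surface this as a
        # request timeout error instead of waiting indefinitely.
        return None


if __name__ == "__main__":
    client, server = socket.socketpair()
    try:
        # The "server" has sent nothing yet, so the read times out.
        print(read_with_timeout(client, 0.05))
        # Once a reply is available, the same call returns it.
        server.sendall(b"ok")
        print(read_with_timeout(client, 0.05))
    finally:
        client.close()
        server.close()
```

A larger value trades responsiveness for tolerance: long generations on slow hardware finish, but a hung server is also noticed later.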
1 change: 1 addition & 0 deletions settings.yaml
@@ -89,6 +89,7 @@ ollama:
   llm_model: llama2
   embedding_model: nomic-embed-text
   api_base: http://localhost:11434
+  request_timeout: 120.0

 azopenai:
   api_key: ${AZ_OPENAI_API_KEY:}
