
Recreated settings changes - Adds several options for llamacpp and ollama #1703

Merged
merged 5 commits into from
Mar 11, 2024

Conversation

icsy7867
Contributor

Original PR here:
#1677

llama-cpp https://llama-cpp-python.readthedocs.io/en/latest/api-reference/
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html#

ollama - https://github.com/run-llama/llama_index/blob/eeb2a60387b8ae1994005ad0eebb672ee02074ff/llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py

openailike - no configurable changes: https://docs.llamaindex.ai/en/stable/examples/llm/localai.html#localai

Not sure about the model_kwargs. The value is referenced for openai, but I could not find documentation on what values are allowed.
openai - https://github.com/run-llama/llama_index/blob/eeb2a60387b8ae1994005ad0eebb672ee02074ff/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py
https://docs.llamaindex.ai/en/stable/examples/llm/openai.html

For the text/description I used the values found here:
https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

Where LlamaCPP uses the same keys, it had the same values. However, my setup currently uses ollama, so LlamaCPP needs some testing.

I also added the temperature under the main llm settings. This should allow the value to be edited/changed for models that support it.
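As a rough sketch, the new options might look like this in settings.yaml (key names and values below are illustrative assumptions, not necessarily the exact ones merged in this PR):

```yaml
# Hypothetical settings.yaml excerpt; key names are assumptions,
# not necessarily the exact ones introduced by this PR.
llm:
  mode: ollama
  temperature: 0.1        # shared knob, used by backends that support it

ollama:
  llm_model: mistral
  num_predict: -1         # -1 = infinite generation, -2 = fill context
  top_k: 40
  top_p: 0.9
  repeat_penalty: 1.1
```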

@icsy7867
Contributor Author

Hmm, small bug... num_predict: 128 doesn't do what I thought it does. It tells llamaindex the maximum size of the response, so this should probably be set to -1 or -2 by default.

It is odd, though, that the default says "128", but if you don't set that kwarg, you get a larger response.

@icsy7867
Contributor Author

Looking at the ollama code:
https://github.com/ollama/ollama/blob/f878e91070af750709f1b3195eeb9fbdcaad2bef/openai/openai.go#L174

	if r.MaxTokens != nil {
		options["num_predict"] = *r.MaxTokens
	}

It looks like the default is 128 unless you have max tokens set; then it just makes the value the same as the max tokens. Alternatively, setting this to "Max new tokens" might make more sense.
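The Go snippet above can be mirrored in a small Python sketch to make the behavior concrete. This is only an illustration of the option-merging logic being discussed, not Ollama's actual code; the function and constant names are made up:

```python
# Sketch of the option handling shown in the Go snippet above:
# num_predict keeps its default unless the request carries max_tokens.

NUM_PREDICT_DEFAULT = 128  # the default response-length cap discussed above


def build_options(max_tokens=None):
    """Return the options dict a request would end up with."""
    options = {"num_predict": NUM_PREDICT_DEFAULT}
    if max_tokens is not None:
        # Mirrors: options["num_predict"] = *r.MaxTokens
        options["num_predict"] = max_tokens
    return options


# With no max_tokens, the 128-token default applies.
print(build_options())      # {'num_predict': 128}
# With max_tokens set, num_predict tracks it.
print(build_options(512))   # {'num_predict': 512}
```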

@imartinez imartinez merged commit 02dc83e into zylon-ai:main Mar 11, 2024
6 checks passed