Local model error #100

Closed
lzw-lzw opened this issue Nov 12, 2023 · 12 comments

Comments

lzw-lzw commented Nov 12, 2023

Hello, thank you for the excellent framework. When I tried to run the local model following the tutorial, I encountered the following problem: ValueError: llama-2-7b-chat-hf is not registered. Please register with the .register("llama-2-7b-chat-hf") method provided in LLMRegistry registry. What could be the reason for this? Thanks.

@chenweize1998 (Collaborator)

Are you using the latest code? Did you set llm_type: local for all the agents in your config?
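
For reference, a minimal sketch of the relevant config excerpt. Only the llm_type and model fields are confirmed in this thread; the surrounding structure is an assumption, so check the example configs in the repo:

agents:
  - agent_type: solver            # hypothetical field; copy from an existing example config
    llm:
      llm_type: local             # must be set for every agent
      model: llama-2-7b-chat-hf   # must match the MODEL_NAME of the local server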

lzw-lzw (Author) commented Nov 12, 2023

I am using the latest code, and after setting llm_type to local, a new error appears: KeyError: 'Could not automatically map llama-2-7b-chat-hf to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.'

@chenweize1998 (Collaborator)

I think this problem was fixed in a previous commit. Are you using the latest code from the GitHub repo? And did you install AgentVerse with pip install -e .?

lzw-lzw (Author) commented Nov 12, 2023

Thanks for your patient reply. I am using the latest code from the GitHub repo, and I installed AgentVerse with pip install -e . After that, I changed MODEL_PATH and MODEL_NAME to the path of my llama-2-7b-chat-hf and "llama-2-7b-chat-hf", and ran run_local_model_server.sh. Then I created a directory under brainstorming containing a config.yaml with llm_type: local and model: llama-2-7b-chat-hf, and ran python3 agentverse_command/main_tasksolving_cli.py --task tasksolving/brainstorming/llama-2-7b-chat-hf, at which point this error occurred.

chenweize1998 added a commit that referenced this issue Nov 12, 2023
@chenweize1998 (Collaborator)

Just made some updates to the code. Please check if it's working correctly now. At the moment I don't have access to a machine with a GPU, so I'm unable to fully run the process with a local LLM. If the issue persists, I'll try to find a GPU machine for further debugging.

lzw-lzw (Author) commented Nov 12, 2023

I'm sorry, it still doesn't work properly. I'll wait for your next try. Thanks!

xymou commented Nov 13, 2023

I can get it running locally, but the output seems incorrect: the first prompt keeps repeating, and no response is generated.

@chenweize1998 (Collaborator)

Pull the latest code and try again. It works fine on my GPU machine now. After launching the FastChat service, check that it's running correctly with the following command:
curl http://127.0.0.1:5000/v1/models

It should return something like:

{"object":"list","data":[{"id":"llama-2-7b-chat-hf","object":"model","created":1699856748,"owned_by":"fastchat","root":"llama-2-7b-chat-hf","parent":null,"permission":[{"id":"modelperm-7bcKCjaRGuVKoeajAXkSgP","object":"model_permission","created":1699856748,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":true,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

After confirming the service is running, run the benchmark script with the following command:
python agentverse_command/benchmark.py --task tasksolving/commongen/llama-2-7b-chat-hf --dataset_path data/commongen/commongen_hard.jsonl

This should execute successfully. However, please note that while the script should run, we cannot guarantee its performance, as open-source LLMs generally lag behind OpenAI's GPT models.

chenweize1998 added a commit that referenced this issue Nov 13, 2023
chenweize1998 (Collaborator) commented Nov 13, 2023

@xymou The issue you're encountering might be due to the local model not adhering to the specific response format we've set. In the NLP classroom example, we enforce a strict format where the model's output should be structured as follows:

Action: [specific action]
Action Input: [related input]

OpenAI's GPT models usually comply with this format quite reliably. However, local LLMs might have difficulty consistently generating responses in this precise structure. Our system is designed to automatically retry if it doesn't detect the required pattern in the model's response. This automatic retry mechanism could explain why you're noticing the prompt being repeated.
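
To illustrate the mechanism, here is a minimal sketch of such a parse-and-retry loop. This is not AgentVerse's actual implementation; the endpoint and model name are the ones used earlier in this thread:

import re
import requests

# The agent's reply must follow the exact two-line structure shown above.
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\nAction Input:\s*(.+)", re.DOTALL)

def query_with_retry(prompt, max_retries=3):
    """Ask the local model, retrying until the reply matches the format."""
    for _ in range(max_retries):
        resp = requests.post(
            "http://127.0.0.1:5000/v1/chat/completions",  # FastChat's OpenAI-compatible API
            json={
                "model": "llama-2-7b-chat-hf",
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=120,
        )
        text = resp.json()["choices"][0]["message"]["content"]
        match = ACTION_PATTERN.search(text)
        if match:
            return match.group(1).strip(), match.group(2).strip()
        # No match: the same prompt is sent again, which is why it looks
        # like the prompt keeps repeating in the logs.
    raise ValueError("Model never produced the required Action format")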

xymou commented Nov 13, 2023

Thank you for your reply! I've noticed that the open-source LLMs do not follow the instructions to generate responses in the required structure. Do you have any suggestions to solve this, e.g., giving the models an in-context example? But I guess the input length may be a limitation. 😰

@chenweize1998 (Collaborator)

A workaround may be to use constrained generation, e.g., with outlines. But we don't support it yet, so you may need to investigate and make some code edits yourself.
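
For anyone who wants to try this route, here is a rough sketch of what constrained generation with outlines could look like. It is an assumption-heavy illustration: the outlines API has changed across versions (this follows the 0.1.x-style interface), and it is not integrated with AgentVerse:

import outlines

# Load the local model through outlines' transformers wrapper.
model = outlines.models.transformers("meta-llama/Llama-2-7b-chat-hf")

# Constrain decoding so the output can only match the required pattern.
pattern = r"Action: [^\n]+\nAction Input: [^\n]+"
generator = outlines.generate.regex(model, pattern)

result = generator("What should the agent do next?")
print(result)  # output is guaranteed to match the Action / Action Input format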

lzw-lzw (Author) commented Nov 14, 2023

It's working fine now, thanks for your patient reply.
