
Adding /get_tokenizer to api_server to ease lm-evaluation-harness integration #2643

Open · wants to merge 5 commits into main
Conversation

AguirreNicolas (Contributor)
OpenAI already supports logprobs for chat models in most cases.
In parallel, lm-evaluation-harness is working on adding support for this feature.

To run an evaluation against an endpoint backed by vLLM, the tokenizer is needed, just as tiktoken is used in the native OpenAI case.

The simple approach would be to return the HF <organization>/<repo> string and then rely on AutoTokenizer.from_pretrained.
But given use cases where:

  • the model/tokenizer can be private, or
  • lm-evaluation-harness and vLLM can be on the same network but without internet access,

I propose adding a get_tokenizer endpoint to the api_server so it can be used like this:

import json
import os
import shutil
from pathlib import Path

import requests
from transformers import AutoTokenizer

TOKENIZER_EPHIMERAL_PATH = Path("tmp_tokenizer")

# Get tokenizer artifacts from the vLLM server
response = requests.get("http://0.0.0.0:8000/get_tokenizer")
tokenizer_objects = json.loads(response.content)

# Save each artifact into its own JSON file
if not os.path.exists(TOKENIZER_EPHIMERAL_PATH):
    os.mkdir(TOKENIZER_EPHIMERAL_PATH)
for key, value in tokenizer_objects.items():
    with open(os.path.join(TOKENIZER_EPHIMERAL_PATH, key + ".json"), "w") as f:
        json.dump(value, f)

# Load the tokenizer from the saved files, then remove the ephemeral directory
vllm_tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_EPHIMERAL_PATH)
try:
    shutil.rmtree(TOKENIZER_EPHIMERAL_PATH)
except OSError as e:
    raise RuntimeError(f"Error removing '{TOKENIZER_EPHIMERAL_PATH.name}' directory: {e}")
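
For context, here is a minimal sketch of what the server side could look like (hypothetical: the route body, payload shape, and tokenizer handle are assumptions, not the actual PR diff), assuming vLLM's FastAPI-based api_server and a Hugging Face tokenizer:

import json
import tempfile
from pathlib import Path

from fastapi import FastAPI

app = FastAPI()
tokenizer = ...  # assumed: the tokenizer instance the engine already holds

@app.get("/get_tokenizer")
async def get_tokenizer():
    # Serialize the tokenizer into a temporary directory and return every
    # JSON artifact (tokenizer.json, tokenizer_config.json, ...) keyed by
    # file stem, matching the payload shape the client snippet above expects.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tokenizer.save_pretrained(tmp_dir)
        return {
            p.stem: json.loads(p.read_text())
            for p in Path(tmp_dir).glob("*.json")
        }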

The next step would be to support logprobs for chat cases in vLLM's OpenAI entrypoint, and then open a PR on lm-evaluation-harness to pick up the vLLM tokenizer feature (e.g. using the code above).

@AguirreNicolas mentioned this pull request on Feb 1, 2024
@simon-mo (Collaborator)
In these cases:

  • the model/tokenizer can be private, or
  • lm-evaluation-harness and vLLM can be on the same network but without internet access,

wouldn't the user of lm-evaluation-harness be able to figure out the tokenizer ahead of time?

@simon-mo (Collaborator)
I would like to wait a bit on this PR to see how popular this request is. Thank you for posting! @AguirreNicolas

@AguirreNicolas (Contributor, Author) commented on Apr 30, 2024

Hi @simon-mo,

My team and I want to add support to lm-eval-harness so that tests can be run against a vLLM endpoint without knowing the model a priori.

That is, to use lm-eval-harness as a client that evaluates the capability of a model served through vLLM.

The issue we are encountering is in OpenaiCompletionsLM.

Under the current lm-eval-harness design, the tokenizer is obtained either from Hugging Face or from the OpenAI tokenizer.

For this reason, we believe it would be valuable to move forward with the current PR. Moreover, I was thinking of updating it to add a boolean option on the vLLM side to enable or disable this endpoint.

Finally, we would open another PR in lm-eval-harness updating OpenaiCompletionsLM by adding:

        elif self.tokenizer_backend == "vllm":
            import transformers  # noqa: E401

            # Fetch tokenizer artifacts from the vLLM server
            response = requests.get(base_url + "/get_tokenizer")
            tokenizer_objects = json.loads(response.content)

            # Save each artifact into an ephemeral directory
            if not os.path.exists(self.tokenizer_ephimeral_path):
                os.mkdir(self.tokenizer_ephimeral_path)
            for key, value in tokenizer_objects.items():
                with open(
                    os.path.join(self.tokenizer_ephimeral_path, key + ".json"), "w"
                ) as f:
                    json.dump(value, f)

            # Load the tokenizer from the saved files, then clean up
            self.tokenizer = transformers.AutoTokenizer.from_pretrained(
                self.tokenizer_ephimeral_path
            )
            try:
                shutil.rmtree(self.tokenizer_ephimeral_path)
                eval_logger.debug(
                    f"Ephemeral '{self.tokenizer_ephimeral_path.name}' directory removed successfully."
                )
                eval_logger.debug(
                    f"Tokenizer objects available: {str(tokenizer_objects.keys())}"
                )
            except OSError as e:
                raise RuntimeError(
                    f"Error removing '{self.tokenizer_ephimeral_path.name}' directory: {e}"
                )

            self.vocab_size = self.tokenizer.vocab_size
            self.end_of_text_token_id = self.tokenizer.eos_token_id
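
If that lands, selecting the new backend could look like this (a hypothetical sketch: the constructor call follows the existing tokenizer_backend convention in lm-eval-harness, and the model name and URL are placeholders):

from lm_eval.models.openai_completions import OpenaiCompletionsLM

# Hypothetical usage once a "vllm" tokenizer_backend is accepted upstream;
# the model name and base_url are placeholders.
lm = OpenaiCompletionsLM(
    model="my-served-model",
    base_url="http://0.0.0.0:8000",
    tokenizer_backend="vllm",  # fetch the tokenizer via /get_tokenizer
)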

What do you think? Would it be valuable for us to invest our time in vLLM on this issue?


This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions bot added the stale label on Oct 30, 2024

mergify bot commented Oct 30, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @AguirreNicolas please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
