Add support for fine-tuned models in encoding_for_model #135

thespino · 2023-05-23T18:48:41Z

Issue

When calling encoding_for_model with a fine-tuned model name as input, the following error occurs:

KeyError: 'Could not automatically map davinci:ft-personal:finetunedmodel-2023-05-23-20-00-00 to a tokeniser. Please use `tiktok.get_encoding` to explicitly get the tokeniser you expect.'

Analysis

See https://platform.openai.com/docs/models/model-endpoint-compatibility
See https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

The following models can be fine-tuned:

davinci
curie
babbage
ada

All of them use the r50k_base encoding.

Fine-tuned model names always follow this format:
model:ft-personal:name-date
where

model is the base model from which the fine-tuned one has been created
ft-personal is a fixed string indicating that the model is fine-tuned
name is a custom name that the user can give to the new model
date is the date of fine-tuning in the format yyyy-MM-dd-hh-mm-ss
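To make the format concrete, here is a sketch of a hypothetical parse_fine_tuned_name helper (it is not part of tiktoken). It assumes the shape of the example name in the error message above, where the custom name and the date are joined by a hyphen, and, as a design choice, it accepts any ft-... marker rather than only ft-personal.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: <base>:ft-<tag>:<name>-<yyyy-MM-dd-hh-mm-ss>,
# inferred from the example fine-tuned model name in the error message.
_FINE_TUNED_RE = re.compile(
    r"^(?P<base>[^:]+):(?P<tag>ft-[^:]+):(?P<name>.+)-"
    r"(?P<date>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})$"
)


class FineTunedModelName(NamedTuple):
    base: str  # base model the fine-tune was created from, e.g. "davinci"
    tag: str   # fine-tuning marker, e.g. "ft-personal"
    name: str  # custom name chosen by the user (may contain hyphens)
    date: str  # creation timestamp, yyyy-MM-dd-hh-mm-ss


def parse_fine_tuned_name(model_name: str) -> Optional[FineTunedModelName]:
    """Split a fine-tuned model name into its parts, or return None if it doesn't match."""
    match = _FINE_TUNED_RE.match(model_name)
    if match is None:
        return None
    return FineTunedModelName(**match.groupdict())
```

For example, parse_fine_tuned_name("davinci:ft-personal:finetunedmodel-2023-05-23-20-00-00") yields base "davinci", which is the part that determines the encoding, while a regular name such as "gpt-3.5-turbo" returns None.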

Solutions

Map the model prefixes in MODEL_PREFIX_TO_ENCODING so that, when encoding_for_model checks model_name.startswith(...), it also matches every model whose name starts with "davinci", "curie", "babbage", or "ada", and therefore recognizes fine-tuned models.
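A minimal sketch of the proposed lookup order, assuming trimmed-down, hypothetical copies of tiktoken's MODEL_TO_ENCODING and MODEL_PREFIX_TO_ENCODING tables (only a few entries shown) and a hypothetical encoding_name_for_model helper that returns the encoding name rather than an Encoding object, so it runs without tiktoken installed. The four base-model prefixes are the proposed addition; this is not the merged implementation.

```python
# Hypothetical, trimmed-down copies of tiktoken's lookup tables.
MODEL_TO_ENCODING: dict[str, str] = {
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "davinci": "r50k_base",
    "curie": "r50k_base",
    "babbage": "r50k_base",
    "ada": "r50k_base",
}

MODEL_PREFIX_TO_ENCODING: dict[str, str] = {
    "gpt-3.5-turbo-": "cl100k_base",
    # Proposed addition: base models that can be fine-tuned, so names such as
    # "davinci:ft-personal:..." resolve through the prefix fallback.
    "davinci": "r50k_base",
    "curie": "r50k_base",
    "babbage": "r50k_base",
    "ada": "r50k_base",
}


def encoding_name_for_model(model_name: str) -> str:
    """Return the encoding name for model_name (sketch of the lookup order)."""
    # 1. An exact model-name match always wins.
    if model_name in MODEL_TO_ENCODING:
        return MODEL_TO_ENCODING[model_name]
    # 2. Otherwise fall back to the first prefix the name starts with.
    for prefix, encoding_name in MODEL_PREFIX_TO_ENCODING.items():
        if model_name.startswith(prefix):
            return encoding_name
    raise KeyError(f"Could not automatically map {model_name} to a tokeniser.")
```

In tiktoken itself the resulting name would be passed to tiktoken.get_encoding. Because exact matches are checked first, adding the bare base-model prefixes does not change how already-listed names such as text-davinci-003 resolve.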

byrnehollander · 2023-05-25T17:53:59Z

Thanks for opening this PR @thespino – I've also been running into this issue and am eager to have this released

cc @hauntsaninja

Identify models that can be fine-tuned in encoding_for_model.
- See https://platform.openai.com/docs/models/model-endpoint-compatibility
- See https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

thespino · 2023-06-01T21:07:21Z

Rebased & synced with main branch

thespino added 2 commits June 1, 2023 23:06

Tests for fine-tuned models in encoding_for_model

96b2158

thespino force-pushed the main branch from feb1ac0 to 96b2158 June 1, 2023 21:06
