
Check model and model parser ids strings for spaces, strip them #1114

Open
wants to merge 1 commit into main
Conversation

rossdanlm
Contributor

@rossdanlm rossdanlm commented Feb 2, 2024

Check model and model parser ids strings for spaces, strip them

Tanya noticed a bug when we accidentally added "field " with a trailing space instead of "field", so we now trim the value.

I originally tried to fix this in the editor itself in SettingsPropertyRenderer, but that was complicated: it would have reset the string value state, and you couldn't keep typing spaces at the beginning or end of a string because they would keep getting trimmed. Therefore, I decided to make this change in all places where we set the model names or fields in the main SDK. This may seem a bit controversial (e.g. you can no longer have two separate models called "  my_model  " and "my_model"), but I feel this is fine.

Note: I changed both the getter and setter methods, so this will be a breaking change for existing AIConfigs that have leading or trailing whitespace in the model/model parser names

Test Plan

aiconfig_path=./cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=./cookbooks/Gradio/aiconfig_model_registry.py
aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path

It now works with leading or trailing whitespace

098c4571-0b4f-455d-bc73-7f9dd1cae676.mp4

Stack created with Sapling. Best reviewed with ReviewStack.

@rholinshead
Contributor

I originally tried to fix this in the editor itself in SettingsPropertyRenderer, but that was complicated: it would have reset the string value state, and you couldn't keep typing spaces at the beginning or end of a string because they would keep getting trimmed.

Couldn't it just be done in onUpdatePromptModel in AIConfigEditor? Just trim the newModel before dispatch/request

This may seem a bit controversial (ex: you now can't have two separate models called " my_model " vs. "my_model" but I feel this is fine

I do agree, I don't think leading/trailing whitespace should be part of the model name since that's asking for trouble in any other place that trims things.

@@ -884,6 +906,7 @@ def _update_model_for_aiconfig(
`update_model` function with a specified `prompt_name` argument.
"""
warnings.warn(warning_message)
model_name = model_name.strip()
Contributor

If a config already has model name with leading or trailing space here, it can no longer have settings set by this?

Contributor Author

Since aiconfig-level model settings can have more than a single model, if we change the model name, we have no idea what the "previous" model name used to be so will error if we're only trying to pass in settings. Therefore, if you change the model name and don't pass in settings argument, it will just create a new model with empty settings. I wrote a comment about this in

2) Update the settings at the AIConfig-level \
Fix: You must pass in a `name` for the model you wish \
to update. AIConfig-level can have multiple models, \
so without a model name, we don't know which model \
to set the settings for."
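The branching described above can be sketched as follows. This is a simplified, hypothetical model of the AIConfig-level settings update (the `update_model_settings` helper and its `models` dict are stand-ins for illustration, not the real aiconfig implementation):

```python
def update_model_settings(models, model_name, settings=None):
    """Hypothetical sketch of the AIConfig-level model settings update."""
    model_name = model_name.strip()  # sanitize before any lookup
    if model_name not in models:
        # We cannot know what the "previous" model name was, so a new or
        # renamed model starts fresh; without a settings argument it gets
        # empty settings.
        models[model_name] = settings if settings is not None else {}
    elif settings is not None:
        models[model_name] = settings
    return models
```

Under this sketch, renaming to "  my_model  " without settings yields a fresh "my_model" entry with empty settings.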

Contributor

@rholinshead rholinshead left a comment

I think the reasoning is right, but this is a lot of places where we're stripping the whitespace. Is there some minimum that we can do without needing to do it everywhere?

@@ -61,7 +61,7 @@ def get_model_parser(model_id: str) -> ModelParser:
Returns:
ModelParser: The retrieved model parser
"""
return ModelParserRegistry._parsers[model_id]
return ModelParserRegistry._parsers[model_id.strip()]
Contributor

For example, here is a "get" instead of set, so we probably don't need to change this?

Contributor Author

The problem is that this function get_model_parser() is an external public call, so it's possible for someone to call into it with unsanitized trailing and leading whitespace
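A minimal sketch of why stripping on the read path matters. The class below is a hypothetical stand-in that mirrors the ModelParserRegistry pattern, not the actual aiconfig class:

```python
class FakeModelParserRegistry:
    """Hypothetical stand-in for ModelParserRegistry, for illustration only."""

    _parsers: dict = {}

    @classmethod
    def register_model_parser(cls, model_id: str, parser) -> None:
        # Setter sanitizes, so stored keys never carry whitespace.
        cls._parsers[model_id.strip()] = parser

    @classmethod
    def get_model_parser(cls, model_id: str):
        # Public entry point: external callers may pass unsanitized ids,
        # so strip on read too, making " my_model " and "my_model" match.
        return cls._parsers[model_id.strip()]
```

Without the strip in the getter, an external call with "  my_model  " would raise KeyError even though the model is registered.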

Contributor

Doesn't that contradict the PR summary, though:

Because we could have existing configs that have model names already, I only changed the setter methods, not the getter ones like get_model(), so this is not a breaking change

Contributor Author

Yea you're right, let me re-word the summary.

@rossdanlm rossdanlm marked this pull request as draft February 3, 2024 00:16
@rossdanlm
Contributor Author

Couldn't it just be done in onUpdatePromptModel in AIConfigEditor? Just trim the newModel before dispatch/request

It's a bit more complicated than this, because we get the model_name from the prompt when calling config.run() in

```python
model_provider = AIConfigRuntime.get_model_parser(model_name)
# Clear previous run outputs if they exist
self.delete_output(prompt_name)
response = await model_provider.run(
    prompt_data,
    self,
    options,
    params,
    **kwargs,  # TODO: We should remove and make argument explicit
)
```

So in order to prevent breaking changes to existing AIConfigs which may have spaces already defined in that config and not trimmed, we need to continue using those ones.

Ok nm, I realize that since we already strip the model name in register_model_parser(), which we call every time we do config.create() or config.load(), we're fine.

However, I still feel it's good to do this in the SDK so that it's centralized and not reliant on the editor client logic. Ex: if someone manually calls config.update_model(" my_model ", None, "prompt_name") it will still work

@rossdanlm
Contributor Author

Is there some minimum that we can do without needing to do it everywhere?

As a future improvement, we can call sanitize_model_name before every centralized and public SDK method involving model_name. Created a task for it in #1126
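One way that follow-up could look. sanitize_model_name is the proposed helper from #1126 and the decorator is purely hypothetical, not an existing aiconfig API:

```python
from functools import wraps


def sanitize_model_name(model_name: str) -> str:
    # Proposed helper (see #1126): a single place that defines what
    # "sanitized" means for model / model parser ids.
    return model_name.strip()


def sanitizes_model_name(func):
    # Hypothetical decorator that sanitizes the first positional argument
    # before a public SDK method runs.
    @wraps(func)
    def wrapper(model_name: str, *args, **kwargs):
        return func(sanitize_model_name(model_name), *args, **kwargs)
    return wrapper


@sanitizes_model_name
def get_model_parser(model_name: str) -> str:
    # Stand-in for a public SDK method; returns the id it would look up.
    return model_name
```

Applying the decorator to each public entry point would avoid sprinkling .strip() calls throughout the codebase.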

@rossdanlm rossdanlm force-pushed the pr1114 branch 2 times, most recently from d986556 to 2e96c38 Compare February 3, 2024 00:45
@rossdanlm rossdanlm marked this pull request as ready for review February 3, 2024 00:45
rossdanlm added a commit that referenced this pull request Feb 3, 2024
[ez] Export missing default model parsers

Now we can do `from aiconfig import ClaudeBedrockModelParser` for
example. I also deleted the `hf.py` file since we now have
`aiconfig_extension_hugging_face` for this

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/lastmile-ai/aiconfig/pull/1112).
* #1114
* __->__ #1112
@@ -74,7 +74,7 @@ def get_model_parser_for_prompt(prompt: Prompt, config: "AIConfigRuntime"):
Returns:
ModelParser: The retrieved model parser
"""
model_name = config.get_model_name(prompt)
model_name = config.get_model_name(prompt).strip()
Contributor

I think we should just leave this as-is. If it's in the config already then we should have either sanitized it or it was before we started sanitizing it and stripping is going to make it inaccessible.

Because we could have existing configs that have model names already, I only changed the setter methods, not the getter ones like get_model(), so this is not a breaking change

Contributor Author

@rossdanlm rossdanlm Feb 5, 2024

If we do something like this:

```python
model_name = "  my_model  "
config.update_model(model_name, None)
# does some stuff...

config.get_model_parser_for_prompt(model_name)
```

This will fail if we don't do the strip, so I think it's important to keep it, even though it'll be a breaking change. Making the breaking change up front is better than breaking within the same session. I'll re-word the summary
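The failure mode being described, as a standalone sketch. The two functions below are hypothetical stand-ins mimicking the setter/getter pair, with a flag to toggle the read-path strip for illustration:

```python
_parsers: dict = {}


def update_model(model_name: str, settings=None) -> None:
    # Setter strips (as in this PR), so the stored key is "my_model".
    _parsers[model_name.strip()] = settings or {}


def get_model_parser_for_prompt(model_name: str, strip: bool = True):
    # Without stripping on the read path, the raw "  my_model  " key
    # misses the sanitized entry written by the setter.
    key = model_name.strip() if strip else model_name
    return _parsers[key]
```

With strip=False, a lookup using the same unsanitized name that was passed to the setter raises KeyError in the same session, which is the breakage described above.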

Contributor Author

Made an issue to surface unfound model/model parser names in #1136 similar to what we have for prompt error message

Contributor

Can you explain why this will fail? With the other changes in this PR, config.get_model_name(prompt) should not have any extra whitespace already, so no need to strip it, right?

Comment on lines +682 to +685
if model.settings is not None and "model" in model.settings:
# TODO: Support cases where "model" field under model settings isn't just a str, since it technically can be anything depending on the model parser
if isinstance(model.settings["model"], str):
model.settings["model"] = model.settings["model"].strip()
Contributor

Was thinking on this some more. Let's not do this specific change. My reasoning is:

  • the parser/schema level has no idea what this 'model' means. It's an implementation detail of the parser
  • similarly, it's the parser's responsibility to use the 'model' from the settings as the 'model' for the inference run (that logic is in the parser, not the schema/core SDK)

So, if we want to strip the settings model, it should be done when pulling it from the settings in the relevant parsers

@@ -856,13 +879,16 @@ def _update_model_settings_for_prompt(
name=model_name, settings=settings
)
else:
if "model" in settings:
# TODO: Support cases where "model" field under model settings isn't just a str, since it technically can be anything depending on the model parser
Contributor

Same here -- this is parser-specific logic

Contributor

@rholinshead rholinshead left a comment

Ok, I think most places look good. Requesting changes because I don't think we should be touching the 'model' value in the inference settings from the core/SDK code (it's a parser implementation detail).
Also, I'm still not sure the strip is needed in get_model_parser_for_prompt, but I might just not be understanding your example
