Implement task-specific validation and setting default prompts if none provided #936
base: main
Conversation
Thinking here, I don't think I have an opinion yet, but should this code be in the jobs.py schema file instead of a new file? The structure of lumigator_schemas thus far is a schema matched to a route. A new file would be a departure from that design
Actually I started with the same logic 😄 i.e. defining TaskType within the jobs.py schema, but then ran into a circular import issue. So I created the new file, but I'm not sure that was the best solution.
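For reference, a hypothetical sketch of the layout that avoids the cycle (module and import paths are assumptions, not necessarily the exact ones in this PR): keeping TaskType in its own schema file lets both the jobs and experiments schemas import it without importing each other.

```python
# Hypothetical module layout, just to illustrate how the new file breaks the cycle:
#
#   lumigator_schemas/tasks.py        # defines TaskType (and the task validators)
#   lumigator_schemas/jobs.py         # imports TaskType from tasks
#   lumigator_schemas/experiments.py  # imports TaskType from tasks

# In jobs.py / experiments.py (import path assumed):
from lumigator_schemas.tasks import TaskType
```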
Nice re-organization! I have a few questions mostly about where to move the logic.
class SummarizationValidator(TaskValidator):
    DEFAULT_PROMPT: str = "You are a helpful assistant, expert in text summarization. For every prompt you receive, provide a summary of its contents in at most two sentences."  # noqa: E501
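For context while reviewing, a rough sketch of how the validator classes could fit together; TranslationValidator, its fields, and the validate method below are assumptions for illustration, not the exact code in this PR.

```python
from abc import ABC, abstractmethod


class TaskValidator(ABC):
    @abstractmethod
    def validate(self, config) -> None:
        """Raise early, at request time, if task-specific fields are missing."""

    @abstractmethod
    def set_default_prompt(self, config) -> None:
        """Fill config.system_prompt only when the user did not provide one."""


class TranslationValidator(TaskValidator):
    # Assumption: translation needs a language pair to build its default prompt.
    def validate(self, config) -> None:
        if not getattr(config, "source_language", None) or not getattr(config, "target_language", None):
            raise ValueError("Translation tasks require source_language and target_language")

    def set_default_prompt(self, config) -> None:
        if config.system_prompt is None:
            config.system_prompt = (
                "You are a helpful assistant, expert in translation. "
                f"Translate the following text from {config.source_language} "
                f"to {config.target_language}."
            )
```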
Idk, if we're moving this someplace, should this validation go into the inference job schema instead of lumigator_schema? Might make more sense to move this closer to the logic that is running the model.
Yes, that would be closer to the logic running the model. However, it might be good to catch and throw the error early on, rather than having to wait until we reach the jobs.
I kinda agree with both here :-)
On the one hand, a job which does not validate its own inputs opens itself to possible issues.
On the other hand, our inference job is kinda agnostic of its purpose and having task-specific validation here means bringing much more logic than is needed to "just run inference".
Also, I like the principle of breaking early rather than within the job: this allows us to get higher-quality errors directly in the client via the API.
For the above reasons I'd lean more towards keeping stuff where it is now rather than with jobs. WDYT?
yep, fine with me! No strong opinion from me here :)
from enum import Enum


class TaskType(str, Enum):
I think it's great that we have a schema specifically for tasks with ad-hoc validators, thank you!
WDYT about being explicit here (e.g. with a comment) saying that the task names match the HF ones, so that if an HF model is called directly we'll use the right pipeline? Just as a reference for people who might update this code later.
Good idea, will add a comment.
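Something along these lines, as a sketch of what the comment could say (exact wording and enum members are up to you):

```python
class TaskType(str, Enum):
    # Note: the values deliberately match the Hugging Face task names
    # (https://huggingface.co/tasks), so that when an HF model is called
    # directly the matching pipeline is selected.
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"
```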
The less translation across dataclasses, the better ^_^
def set_default_prompt(self, config):
    # We set the default prompt only if the user has not provided one
    if config.system_prompt is None:
        config.system_prompt = self.DEFAULT_PROMPT
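And one possible way this gets wired in at request time, purely illustrative (the registry name and call site are assumptions), building on the classes sketched above:

```python
# Hypothetical dispatch from task type to its validator.
TASK_VALIDATORS: dict[TaskType, TaskValidator] = {
    TaskType.SUMMARIZATION: SummarizationValidator(),
    TaskType.TRANSLATION: TranslationValidator(),
}


def validate_task_config(config) -> None:
    """Validate task-specific fields and fill the default prompt before the job starts."""
    validator = TASK_VALIDATORS[config.task]
    validator.validate(config)
    validator.set_default_prompt(config)
```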
What happens with HF seq2seq models which do not require a prompt?
You are right, currently it isn't being used. Also, this iteration won't support HF seq2seq models, only causal LLM API models.
But in the future, we will need the language pair to be used as a "prefix" to every input sequence for seq2seq models. So in a later iteration, we could construct a prefix out of it for seq2seq models and use it as a prompt for causal models.
From https://huggingface.co/docs/transformers/en/tasks/translation:
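The relevant pattern from that page, roughly: T5-style seq2seq models take a task prefix prepended to every input rather than a chat-style system prompt (the dataset keys below are illustrative):

```python
# Pattern from the HF translation tutorial: seq2seq models such as T5 expect a
# task prefix on every input instead of a system prompt.
prefix = "translate English to French: "


def preprocess_function(examples, tokenizer):
    # "en"/"fr" keys are illustrative; the tutorial uses an opus_books-style dataset.
    inputs = [prefix + example["en"] for example in examples["translation"]]
    targets = [example["fr"] for example in examples["translation"]]
    return tokenizer(inputs, text_target=targets, max_length=128, truncation=True)
```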
Created an issue to track it: #976
Cool, I think as long as this does not break current s2s models doing summarization this is ok (I guess they will just be assigned a prompt which is not used by the summarization pipeline)
@@ -11,15 +12,15 @@ class ExperimentCreate(BaseModel):
     description: str = ""
     dataset: UUID
     max_samples: int = -1  # set to all samples by default
-    task: str | None = "summarization"
+    task: TaskType = Field(default=TaskType.SUMMARIZATION)
If the default task type is summarization, shouldn't the default prompt also be the one for summarization?
Also, maybe we can use per-task defaults instead of a single default (we can take a look at a way to encode that in pydantic)
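For the pydantic idea, a rough sketch of how per-task defaults could be encoded with a model validator; the model name, import path, and prompt table here are assumptions, not code from this PR.

```python
from pydantic import BaseModel, Field, model_validator

from lumigator_schemas.tasks import TaskType  # import path assumed

# Hypothetical per-task defaults; translation's prompt depends on the language
# pair, so it cannot be a single static string like summarization's.
DEFAULT_PROMPTS: dict[TaskType, str] = {
    TaskType.SUMMARIZATION: "You are a helpful assistant, expert in text summarization. ...",
}


class JobCreate(BaseModel):
    task: TaskType = Field(default=TaskType.SUMMARIZATION)
    system_prompt: str | None = None

    @model_validator(mode="after")
    def fill_default_prompt(self) -> "JobCreate":
        # Only fill a default when the user did not provide a prompt.
        if self.system_prompt is None and self.task in DEFAULT_PROMPTS:
            self.system_prompt = DEFAULT_PROMPTS[self.task]
        return self
```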
Yes, the per-task default prompt is set in the jobs schema here, since the prompt is available at the job level and not at the experiment level.
What's changing
Refs #921
How to test it
(Currently, HF models are not fully supported; you have to test with LiteLLM models, i.e. causal LLM API models such as OpenAI.)
Steps to test the changes:
For example with summarization:
This would run an inference job with the default prompt:
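Something like the following, illustration only (field names are assumptions, not the exact request body used here):

```python
# Illustrative payload: no system_prompt, so the summarization default is applied.
summarization_request = {
    "name": "summarization-default-prompt-test",
    "dataset": "<dataset-uuid>",
    "max_samples": 10,
    "task": "summarization",
}
```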
For example with translation:
This would run an inference job with the default prompt with source and target languages:
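Again illustration only (field names are assumptions): translation needs a language pair, from which the default prompt is built.

```python
# Illustrative payload: no system_prompt, so the default translation prompt is
# built from the (assumed) source/target language fields.
translation_request = {
    "name": "translation-default-prompt-test",
    "dataset": "<dataset-uuid>",
    "max_samples": 10,
    "task": "translation",
    "source_language": "en",
    "target_language": "de",
}
```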
Additional notes for reviewers
Anything you'd like to add to help the reviewer understand the changes you're proposing.
I already...