
Implement task-specific validation and setting default prompts if none provided #936

Open

HareeshBahuleyan wants to merge 17 commits into main
Conversation

@HareeshBahuleyan (Contributor) commented Feb 19, 2025

What's changing

  • Treat task as an Enum TaskType instead of a str
  • Validate task against source_language and target_language: no language pair may be specified for summarization, while both source_language and target_language are required for translation (see the sketch after this list)
  • Set the default prompt in JobInferenceConfig: for summarization use DEFAULT_SUMMARIZER_PROMPT; for translation, build the default prompt from the language pair
  • ^ Fix the currently hardcoded system prompt
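A minimal sketch of the intended validation rules (illustrative names; the PR's actual classes and error messages may differ):

from enum import Enum


class TaskType(str, Enum):
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"


def validate_task_languages(task: TaskType, source_language: str | None, target_language: str | None) -> None:
    # Summarization must not carry a language pair; translation requires both.
    if task == TaskType.SUMMARIZATION and (source_language or target_language):
        raise ValueError("no language pair may be specified for summarization")
    if task == TaskType.TRANSLATION and not (source_language and target_language):
        raise ValueError("translation requires both source_language and target_language")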


Refs #921

How to test it

(Currently, HF models are not fully supported; you have to test with LiteLLM models, i.e., causal LLM API models such as OpenAI.)

Steps to test the changes:

  1. make local-up
  2. Upload a sample dataset: use this for translation (en-de) and dialog_sum for summarization
  3. Run an inference job (try different combinations of task, source_language, and target_language).

For example with summarization:

curl -X 'POST' \
  'http://localhost:8000/api/v1/jobs/inference/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "dataset": "e05ecaae-0840-48f2-afef-9417a02aacdd",
  "max_samples": -1,
  "job_config": {
    "job_type": "inference",
    "model": "gpt-4o-mini",
    "provider": "openai",
    "task": "summarization"
  }
}'

This would run an inference job with the default prompt:

python inference.py --config '{"name":"string/1f7a2caa-ddf7-4172-8002-34dfb84707f2","dataset":{"path":"s3://lumigator-storage/datasets/e05ecaae-0840-48f2-afef-9417a02aacdd/dialogsum_exc.csv"},"job":{"max_samples":-1,"storage_path":"s3://lumigator-storage/jobs/results/","output_field":"predictions"},"inference_server":{"model":"gpt-4o-mini","provider":"openai","system_prompt":"You are a helpful assistant, expert in text summarization. For every prompt you receive, provide a summary of its contents in at most two sentences.","max_retries":3},"params":{"max_tokens":1024,"frequency_penalty":0.0,"temperature":1.0,"top_p":1.0}}'

For example with translation:

curl -X 'POST' \
  'http://localhost:8000/api/v1/jobs/inference/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "dataset": "8ffa0cc3-736f-4071-8139-88cab30229ae",
  "max_samples": -1,
  "job_config": {
    "job_type": "inference",
    "model": "gpt-4o-mini",
    "provider": "openai",
    "task": "translation",
    "source_language": "en",
    "target_language": "de"
  }
}'

This would run an inference job with a default prompt built from the source and target languages:

python inference.py --config '{"name":"string/96906703-a273-4435-afce-73e5a8d3312c","dataset":{"path":"s3://lumigator-storage/datasets/8ffa0cc3-736f-4071-8139-88cab30229ae/sample_translation_en_de.csv"},"job":{"max_samples":-1,"storage_path":"s3://lumigator-storage/jobs/results/","output_field":"predictions"},"inference_server":{"model":"gpt-4o-mini","provider":"openai","system_prompt":"translate en to de:","max_retries":3},"params":{"max_tokens":1024,"frequency_penalty":0.0,"temperature":1.0,"top_p":1.0}}'...{'pip': '/mzai/lumigator/jobs/inference/requirements.txt', 'working_dir': '/mzai/lumigator/jobs/inference', 'env_vars': {'MZAI_JOB_ID': '96906703-a273-4435-afce-73e5a8d3312c'}}

Additional notes for reviewers


I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@github-actions bot added the backend and schemas (Changes to schemas, which may be public facing) labels on Feb 19, 2025
@HareeshBahuleyan HareeshBahuleyan changed the title Create experiment and run workflow with task as translation Implement task-specific validation and setting default prompts for translation/summarization if none provided Feb 21, 2025
@HareeshBahuleyan HareeshBahuleyan changed the title Implement task-specific validation and setting default prompts for translation/summarization if none provided Implement task-specific validation and setting default prompts if none provided Feb 21, 2025
@HareeshBahuleyan HareeshBahuleyan marked this pull request as ready for review February 21, 2025 10:19
@HareeshBahuleyan HareeshBahuleyan self-assigned this Feb 21, 2025
Contributor:
Thinking out loud here: I don't have an opinion yet, but should this code be in the jobs.py schema file instead of a new file? The structure of lumigator_schemas so far is one schema matched to one route; a new file would be a departure from that design.

Contributor Author:

Actually, I started with the same logic 😄 i.e., defining TaskType within the jobs.py schema, but then I ran into a circular-import issue.

So I created the new file, though I'm not sure that was the best solution.

@njbrake (Contributor) left a comment:

Nice reorganization! I have a few questions, mostly about where to move the logic.



class SummarizationValidator(TaskValidator):
    DEFAULT_PROMPT: str = "You are a helpful assistant, expert in text summarization. For every prompt you receive, provide a summary of its contents in at most two sentences."  # noqa: E501
Contributor:

Idk, if we're moving this someplace, should this validation go into the inference job schema instead of lumigator_schema? Might make more sense to move this closer to the logic that is running the model.

Contributor Author:

Yes, that would be closer to the logic running the model. However, it might be better to catch and raise the error early on, rather than having to wait until the request reaches the jobs.

Member:

I kinda agree with both here :-)
On the one hand, a job that does not validate its own inputs opens itself up to possible issues.
On the other hand, our inference job is largely agnostic of its purpose, and adding task-specific validation there means bringing in much more logic than is needed to "just run inference".
Also, I like the principle of failing early rather than within the job: it gives us higher-quality errors directly in the client via the API.
For the above reasons I'd lean towards keeping things where they are now rather than moving them into the jobs. WDYT?

Contributor:

yep, fine with me! No strong opinion from me here :)

@javiermtorres (Contributor) commented Feb 21, 2025:

Re: "rather than having to wait until the request reaches the jobs"

The jobs schemas should be reachable from the backend. Please wait for #888.
@aittalam the generate_config is implemented by each job, so the backend is kept agnostic while allowing per-job checks at the same time 👍 (a rough sketch follows below)
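A very rough sketch of that shape (hypothetical names; the actual design in #888 may differ):

from abc import ABC, abstractmethod


class JobDefinition(ABC):
    """Each job type generates and validates its own config,
    so the backend stays task-agnostic."""

    @abstractmethod
    def generate_config(self, request: dict) -> dict: ...


class InferenceJobDefinition(JobDefinition):
    def generate_config(self, request: dict) -> dict:
        # Task-specific checks live with the job, not in the backend routes
        if request.get("task") == "translation" and not (
            request.get("source_language") and request.get("target_language")
        ):
            raise ValueError("translation requires source_language and target_language")
        return request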

from enum import Enum


class TaskType(str, Enum):
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"
Member:

I think it's great that we have a schema specifically for tasks with ad-hoc validators, thank you!
WDYT about being explicit here (e.g. with a comment) saying that the task names match the HF ones, so that if an HF model is called directly we'll use the right pipeline? Just as a reference for people who might update this code later.

Contributor Author:

Good idea, will add a comment.

Contributor:

The less translation across dataclasses, the better ^_^

def set_default_prompt(self, config):
    # We set the default prompt only if the user has not provided one
    if config.system_prompt is None:
        config.system_prompt = self.DEFAULT_PROMPT
Member:

What happens with HF seq2seq models which do not require a prompt?

Contributor Author:

You are right, currently it isn't being used. Also, this iteration won't support HF seq2seq models, only causal LLM API models.
But in the future, we will need the language pair to be used as a "prefix" to every input sequence for seq2seq models. So in a later iteration, we could construct a prefix out of it for seq2seq models and use it as the prompt for causal models.

From https://huggingface.co/docs/transformers/en/tasks/translation:
[screenshot of the docs' prefix-based preprocessing example]
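A rough sketch of that prefixing idea (illustrative variable names, not the PR's code):

# For seq2seq models, the language pair would become a prefix prepended
# to every input sequence; for causal models it would become the prompt.
source_lang = "en"
target_lang = "de"
prefix = f"translate {source_lang} to {target_lang}: "


def build_inputs(texts: list[str]) -> list[str]:
    # Each source sentence gets the task prefix before tokenization
    return [prefix + text for text in texts]


print(build_inputs(["Hello, world!"]))  # ['translate en to de: Hello, world!']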

Contributor Author:

Created an issue to track it: #976

@aittalam (Member) commented Feb 21, 2025:

Cool, I think as long as this does not break current s2s models doing summarization this is ok (I guess they will just be assigned a prompt which is not used by the summarization pipeline)

@@ -11,15 +12,15 @@ class ExperimentCreate(BaseModel):
     description: str = ""
     dataset: UUID
     max_samples: int = -1  # set to all samples by default
-    task: str | None = "summarization"
+    task: TaskType = Field(default=TaskType.SUMMARIZATION)
Contributor:

If the default task type is summarization, shouldn't the default prompt also be the one for summarization?
Also, maybe we can use per-task defaults instead of a single default (we can take a look at a way to encode that in pydantic; one possible encoding is sketched below).
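One hypothetical way to encode per-task defaults in pydantic (field names assumed for illustration, not the PR's actual schema):

from pydantic import BaseModel, model_validator

TASK_DEFAULT_PROMPTS = {
    "summarization": "You are a helpful assistant, expert in text summarization. "
    "For every prompt you receive, provide a summary of its contents in at most two sentences.",
    "translation": "translate {source} to {target}:",
}


class InferenceConfigSketch(BaseModel):
    task: str = "summarization"
    source_language: str | None = None
    target_language: str | None = None
    system_prompt: str | None = None

    @model_validator(mode="after")
    def fill_default_prompt(self):
        # Fill in the per-task default only when the user left system_prompt unset
        if self.system_prompt is None:
            template = TASK_DEFAULT_PROMPTS[self.task]
            self.system_prompt = template.format(
                source=self.source_language, target=self.target_language
            )
        return self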

Contributor Author:

Yes, the per-task default prompt is set in jobs (here), since the prompt is available at the job level and not at the experiment level.

Labels
backend, schemas (Changes to schemas, which may be public facing)
Development

Successfully merging this pull request may close these issues.

Users can specify the task as ‘translation’ when creating an experiment
4 participants