
Implement task-specific validation and setting default prompts if none provided #936

Open

HareeshBahuleyan wants to merge 17 commits into main
Conversation

@HareeshBahuleyan (Contributor) commented Feb 19, 2025

What's changing

  • Treat task as an Enum TaskType instead of a str
  • Validate task against source_language and target_language: no language pair may be specified for summarization, while both source_language and target_language are required for translation (see the sketch after this list)
  • Set the default prompt in JobInferenceConfig: for summarization use DEFAULT_SUMMARIZER_PROMPT; for translation, build the default prompt from the language pair
  • ^ Fix the currently hardcoded system prompt
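A minimal sketch of the intended validation rules (illustrative names; the PR's actual classes and error messages may differ):

from enum import Enum


class TaskType(str, Enum):
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"


def validate_task_languages(task: TaskType, source_language: str | None, target_language: str | None) -> None:
    # Summarization must not carry a language pair; translation requires both.
    if task == TaskType.SUMMARIZATION and (source_language or target_language):
        raise ValueError("no language pair may be specified for summarization")
    if task == TaskType.TRANSLATION and not (source_language and target_language):
        raise ValueError("translation requires both source_language and target_language")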


Refs #921

How to test it

(Currently, HF models are not fully supported; you have to test with LiteLLM models, i.e., causal LLM API models such as OpenAI.)

Steps to test the changes:

  1. make local-up
  2. Upload a sample dataset: use this for translation (en-de) and dialog_sum for summarization
  3. Run an inference job (try different combinations of task, source_language, and target_language).

For example with summarization:

curl -X 'POST' \
  'http://localhost:8000/api/v1/jobs/inference/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "dataset": "e05ecaae-0840-48f2-afef-9417a02aacdd",
  "max_samples": -1,
  "job_config": {
    "job_type": "inference",
    "model": "gpt-4o-mini",
    "provider": "openai",
    "task": "summarization"
  }
}'

This would run an inference job with the default prompt:

python inference.py --config '{"name":"string/1f7a2caa-ddf7-4172-8002-34dfb84707f2","dataset":{"path":"s3://lumigator-storage/datasets/e05ecaae-0840-48f2-afef-9417a02aacdd/dialogsum_exc.csv"},"job":{"max_samples":-1,"storage_path":"s3://lumigator-storage/jobs/results/","output_field":"predictions"},"inference_server":{"model":"gpt-4o-mini","provider":"openai","system_prompt":"You are a helpful assistant, expert in text summarization. For every prompt you receive, provide a summary of its contents in at most two sentences.","max_retries":3},"params":{"max_tokens":1024,"frequency_penalty":0.0,"temperature":1.0,"top_p":1.0}}'

For example with translation:

curl -X 'POST' \
  'http://localhost:8000/api/v1/jobs/inference/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "name": "string",
  "dataset": "8ffa0cc3-736f-4071-8139-88cab30229ae",
  "max_samples": -1,
  "job_config": {
    "job_type": "inference",
    "model": "gpt-4o-mini",
    "provider": "openai",
    "task": "translation",
    "source_language": "en",
    "target_language": "de"
  }
}'

This would run an inference job with a default prompt built from the source and target languages:

python inference.py --config '{"name":"string/96906703-a273-4435-afce-73e5a8d3312c","dataset":{"path":"s3://lumigator-storage/datasets/8ffa0cc3-736f-4071-8139-88cab30229ae/sample_translation_en_de.csv"},"job":{"max_samples":-1,"storage_path":"s3://lumigator-storage/jobs/results/","output_field":"predictions"},"inference_server":{"model":"gpt-4o-mini","provider":"openai","system_prompt":"translate en to de:","max_retries":3},"params":{"max_tokens":1024,"frequency_penalty":0.0,"temperature":1.0,"top_p":1.0}}'...{'pip': '/mzai/lumigator/jobs/inference/requirements.txt', 'working_dir': '/mzai/lumigator/jobs/inference', 'env_vars': {'MZAI_JOB_ID': '96906703-a273-4435-afce-73e5a8d3312c'}}

Additional notes for reviewers


I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@github-actions bot added the backend and schemas (Changes to schemas, which may be public facing) labels on Feb 19, 2025
@HareeshBahuleyan HareeshBahuleyan changed the title Create experiment and run workflow with task as translation Implement task-specific validation and setting default prompts for translation/summarization if none provided Feb 21, 2025
@HareeshBahuleyan HareeshBahuleyan changed the title Implement task-specific validation and setting default prompts for translation/summarization if none provided Implement task-specific validation and setting default prompts if none provided Feb 21, 2025
@HareeshBahuleyan HareeshBahuleyan marked this pull request as ready for review February 21, 2025 10:19
@HareeshBahuleyan HareeshBahuleyan self-assigned this Feb 21, 2025
Contributor:
Thinking out loud here: I don't have an opinion yet, but should this code be in the jobs.py schema file instead of a new file? The structure of lumigator_schemas so far is one schema matched to one route; a new file would be a departure from that design.

Contributor Author:

Actually, I started with the same logic 😄 i.e., defining TaskType within the jobs.py schema, but then I ran into a circular-import issue.

So I created the new file, though I'm not sure that was the best solution.

@njbrake (Contributor) left a comment:

Nice reorganization! I have a few questions, mostly about where to move the logic.



class SummarizationValidator(TaskValidator):
    DEFAULT_PROMPT: str = "You are a helpful assistant, expert in text summarization. For every prompt you receive, provide a summary of its contents in at most two sentences."  # noqa: E501
Contributor:

Idk, if we're moving this someplace, should this validation go into the inference job schema instead of lumigator_schema? Might make more sense to move this closer to the logic that is running the model.

Contributor Author:

Yes, that would be closer to the logic running the model. However, it might be better to catch and raise the error early on, rather than having to wait until the request reaches the jobs.

Member:

I kinda agree with both here :-)
On the one hand, a job that does not validate its own inputs opens itself up to possible issues.
On the other hand, our inference job is largely agnostic of its purpose, and adding task-specific validation there means bringing in much more logic than is needed to "just run inference".
Also, I like the principle of failing early rather than within the job: it gives us higher-quality errors directly in the client via the API.
For the above reasons I'd lean towards keeping things where they are now rather than moving them into the jobs. WDYT?

Contributor:

yep, fine with me! No strong opinion from me here :)

@javiermtorres (Contributor) commented Feb 21, 2025:

Re: "rather than having to wait until the request reaches the jobs"

The jobs schemas should be reachable from the backend. Please wait for #888.
@aittalam the generate_config is implemented by each job, so the backend is kept agnostic while allowing per-job checks at the same time 👍 (a rough sketch follows below)
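A very rough sketch of that shape (hypothetical names; the actual design in #888 may differ):

from abc import ABC, abstractmethod


class JobDefinition(ABC):
    """Each job type generates and validates its own config,
    so the backend stays task-agnostic."""

    @abstractmethod
    def generate_config(self, request: dict) -> dict: ...


class InferenceJobDefinition(JobDefinition):
    def generate_config(self, request: dict) -> dict:
        # Task-specific checks live with the job, not in the backend routes
        if request.get("task") == "translation" and not (
            request.get("source_language") and request.get("target_language")
        ):
            raise ValueError("translation requires source_language and target_language")
        return request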

from enum import Enum


class TaskType(str, Enum):
    SUMMARIZATION = "summarization"
    TRANSLATION = "translation"
Member:

I think it's great that we have a schema specifically for tasks with ad-hoc validators, thank you!
WDYT about being explicit here (e.g. with a comment) saying that the task names match the HF ones, so that if an HF model is called directly we'll use the right pipeline? Just as a reference for people who might update this code later.

Contributor Author:

Good idea, will add a comment.

Contributor:

The less translation across dataclasses, the better ^_^

def set_default_prompt(self, config):
    # We set the default prompt only if the user has not provided one
    if config.system_prompt is None:
        config.system_prompt = self.DEFAULT_PROMPT
Member:

What happens with HF seq2seq models which do not require a prompt?

Contributor Author:

You are right, currently it isn't being used. Also, this iteration won't support HF seq2seq models, only causal LLM API models.
But in the future, we will need the language pair to be used as a "prefix" to every input sequence for seq2seq models. So in a later iteration, we could construct a prefix out of it for seq2seq models and use it as the prompt for causal models.

From https://huggingface.co/docs/transformers/en/tasks/translation:
[screenshot of the docs' prefix-based preprocessing example]
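A rough sketch of that prefixing idea (illustrative variable names, not the PR's code):

# For seq2seq models, the language pair would become a prefix prepended
# to every input sequence; for causal models it would become the prompt.
source_lang = "en"
target_lang = "de"
prefix = f"translate {source_lang} to {target_lang}: "


def build_inputs(texts: list[str]) -> list[str]:
    # Each source sentence gets the task prefix before tokenization
    return [prefix + text for text in texts]


print(build_inputs(["Hello, world!"]))  # ['translate en to de: Hello, world!']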

Contributor Author:

Created an issue to track it: #976

@aittalam (Member) commented Feb 21, 2025:

Cool, I think as long as this does not break current s2s models doing summarization this is ok (I guess they will just be assigned a prompt which is not used by the summarization pipeline)

@@ -11,15 +12,15 @@ class ExperimentCreate(BaseModel):
     description: str = ""
     dataset: UUID
     max_samples: int = -1  # set to all samples by default
-    task: str | None = "summarization"
+    task: TaskType = Field(default=TaskType.SUMMARIZATION)
Contributor:

If the default task type is summarization, shouldn't the default prompt also be the one for summarization?
Also, maybe we can use per-task defaults instead of a single default (we can take a look at a way to encode that in pydantic; one possible encoding is sketched below).
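One hypothetical way to encode per-task defaults in pydantic (field names assumed for illustration, not the PR's actual schema):

from pydantic import BaseModel, model_validator

TASK_DEFAULT_PROMPTS = {
    "summarization": "You are a helpful assistant, expert in text summarization. "
    "For every prompt you receive, provide a summary of its contents in at most two sentences.",
    "translation": "translate {source} to {target}:",
}


class InferenceConfigSketch(BaseModel):
    task: str = "summarization"
    source_language: str | None = None
    target_language: str | None = None
    system_prompt: str | None = None

    @model_validator(mode="after")
    def fill_default_prompt(self):
        # Fill in the per-task default only when the user left system_prompt unset
        if self.system_prompt is None:
            template = TASK_DEFAULT_PROMPTS[self.task]
            self.system_prompt = template.format(
                source=self.source_language, target=self.target_language
            )
        return self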

Contributor Author:

Yes, the per-task default prompt is set in jobs (here), since the prompt is available at the job level and not at the experiment level.

Labels
backend, schemas (Changes to schemas, which may be public facing)
Development

Successfully merging this pull request may close these issues.

Users can specify the task as ‘translation’ when creating an experiment
4 participants