
Modeling code and nearby utilities should work with "raw" bytes not URLs #244

Merged (4 commits into main, Dec 17, 2024)

Conversation

ashwinb (Contributor) commented Dec 16, 2024

What this PR does

This is a long-pending change and particularly important to get done now.

Specifically:

  • We cannot "localize" (i.e., download) any URLs from media attachments anywhere near our modeling code; that must be done higher up in the stack or in other utilities.
  • `PIL.Image` is infesting all our APIs, and that cannot be right. We need a standard `{ type: "image", image_url: "<...>" }` content item, which is more extensible.
  • This essentially argues for separating model-facing Message types (which should not carry much structure) from the API-level Message types. As a result of this PR, `UserMessage`, etc. are moved entirely to llama-stack.
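As an illustrative sketch of the argument above (these names are assumptions, not the PR's exact types), a serializable image content item could look like this:

```python
# Hypothetical sketch of serializable content items replacing raw PIL.Image
# objects in the API surface; the class names here are illustrative.
from typing import List, Literal, Union

from pydantic import BaseModel


class TextContentItem(BaseModel):
    type: Literal["text"] = "text"
    text: str


class ImageContentItem(BaseModel):
    type: Literal["image"] = "image"
    image_url: str  # a URL or data: URI; downloading happens higher in the stack


# Content can be a bare string, one item, or an interleaved list of items.
InterleavedContent = Union[
    str,
    TextContentItem,
    ImageContentItem,
    List[Union[TextContentItem, ImageContentItem]],
]
```

Because every field is a plain string, these items serialize naturally to JSON, unlike a `PIL.Image` held in memory.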

This PR will have a substantial accompanying PR in llama-stack as well.

Test Plan

Ran the sole pytest test for model running:

```bash
TEXT_MODEL_CHECKPOINT_DIR=~/.llama/checkpoints/Llama3.2-3B-Instruct \
  PYTHONPATH=. \
  pytest models/llama3/tests/api/test_generation.py
```

Ran the example scripts:

```bash
PYTHONPATH=. \
  torchrun \
  llama_models/scripts/multimodal_example_chat_completion.py ~/.llama/checkpoints/Llama3.2-11B-Vision-Instruct
```

Modeling code, or code close to it (chat_format.py specifically), should not be downloading
URLs, and especially not doing so on demand at arbitrary points.
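One way such "upwards in the stack" localization could look (an illustrative helper, not code from this PR): the caller resolves a URL or data: URI to raw bytes before the modeling code ever sees it.

```python
# Illustrative sketch only: localization performed by a caller above the
# modeling code, which then receives raw bytes and never touches the network.
import base64
from urllib.request import urlopen


def localize_url(url: str) -> bytes:
    """Resolve a URL or data: URI to raw bytes."""
    if url.startswith("data:"):
        # data:[<mediatype>][;base64],<payload>
        header, _, payload = url.partition(",")
        if header.endswith(";base64"):
            return base64.b64decode(payload)
        return payload.encode("utf-8")
    with urlopen(url) as resp:  # http(s) and file URLs
        return resp.read()
```

The modeling code then only ever handles `bytes`, keeping any network I/O out of its hot path.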
Diff excerpt under review:

```python
    content: InterleavedTextMedia
    stop_reason: StopReason
    tool_calls: List[ToolCall] = Field(default_factory=list)


class RawMessage(BaseModel):
```
ashwinb (Contributor, Author) commented:

These RawMessage and RawContent types are the most important bits of this PR; everything else is downstream of these changes. See RawContentItem above as well.
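A self-contained sketch of how such model-facing types might fit together, assuming (purely for illustration, not from the PR's actual definitions) that media content arrives as already-localized raw bytes:

```python
# Hypothetical sketch: a model-facing message whose media content is raw bytes,
# localized by a caller higher in the stack. All names here are illustrative.
from typing import List, Literal, Union

from pydantic import BaseModel


class RawTextItem(BaseModel):
    type: Literal["text"] = "text"
    text: str


class RawMediaItem(BaseModel):
    type: Literal["image"] = "image"
    data: bytes  # raw image bytes; no URLs, no PIL.Image, no downloading here


RawContentItem = Union[RawTextItem, RawMediaItem]
RawContent = Union[str, RawContentItem, List[RawContentItem]]


class RawMessage(BaseModel):
    role: str
    content: RawContent


msg = RawMessage(
    role="user",
    content=[RawTextItem(text="Describe this image:"), RawMediaItem(data=b"\x89PNG")],
)
```

The key property is that nothing in these types can trigger I/O: the modeling code consumes bytes it is handed, nothing more.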

ashwinb added a commit to meta-llama/llama-stack that referenced this pull request Dec 17, 2024
## What does this PR do?

This is a long-pending change and particularly important to get done
now.

Specifically:
- We cannot "localize" (i.e., download) any URLs from media attachments
anywhere near our modeling code; that must be done within llama-stack.
- `PIL.Image` is infesting all our APIs via `ImageMedia ->
InterleavedTextMedia`, and that cannot be right. Anything in the
API surface must be "naturally serializable". We need a standard `{
type: "image", image_url: "<...>" }`, which is more extensible.
- `UserMessage`, `SystemMessage`, etc. are moved completely from the
llama-models repository to llama-stack.

See meta-llama/llama-models#244 for the
corresponding PR in llama-models.

## Test Plan

```bash
cd llama_stack/providers/tests

pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py
pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py
pytest -s -v -k chroma memory/test_memory.py \
  --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar

pytest -s -v -k fireworks agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct
```

Updated the client SDK (see PR ...), installed the SDK in the same
environment, and then ran the SDK tests:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py
LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py

# this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly
INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py
```
@ashwinb ashwinb merged commit bf5b0c4 into main Dec 17, 2024
1 check passed
@ashwinb ashwinb deleted the no_urls_in_models branch December 17, 2024 19:18