
[Feature]: Support Inference Overrides for mm_processor_kwargs #8742

Closed
1 of 3 tasks
alex-jw-brooks opened this issue Sep 23, 2024 · 1 comment · Fixed by #9131

alex-jw-brooks commented Sep 23, 2024

🚀 The feature, motivation and pitch

Follow-up on #8657, which added support for passing initialization-time mm_processor_kwargs to the input mapper / input processor / max token count calculations / dummy data, for models whose architecture-specific implementations accept them as keyword arguments. It would also be nice to be able to pass such kwargs at inference time as part of the multi-modal data, e.g.:

llm.generate({"multi_modal_data": {"image": {"data": image, "mm_processor_kwargs": image_kwargs}}})

For models that support additional mm_processor_kwargs, the precedence would be:

  • The initialization time mm_processor_kwargs take priority over the config values
  • The inference time mm_processor_kwargs take priority over the config values and the initialization mm_processor_kwargs

Alternatives

Keep mm_processor_kwargs as initialization-time only.

Additional context

Per-request mm_processor_kwargs need to be handled correctly:

  • In the input mapper
  • In the input processor

Some care needs to be taken around the input mapper, which falls back to a wrapper around HF resources, e.g., image processors, since those resources may pull values out of the config. More specifically:

  • We should avoid initializing and managing multiple multimodal processors with different processor kwargs if possible
  • Init-time processor kwargs and per-request processor kwargs should behave identically - this likely requires the HF resource's preprocess signature to closely match its init signature by default
    • If for whatever reason init/preprocess are not well-aligned, the mapper / processor can be implemented in the vLLM model class as a backup plan to fix it
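One way to keep init-time and per-request kwargs behaving identically is to filter any override dict down to the keyword arguments the target callable (e.g., an HF image processor's preprocess method) actually accepts, so mismatched keys are dropped rather than raising. A hedged sketch, with illustrative names:

```python
# Filter a kwargs dict against a callable's signature so only keys the
# callable accepts survive. Callables taking **kwargs would need extra
# VAR_KEYWORD handling, omitted here for brevity.
import inspect


def filter_allowed_kwargs(callable_obj, overrides: dict) -> dict:
    """Drop override keys that the callable's signature does not accept."""
    params = inspect.signature(callable_obj).parameters
    return {k: v for k, v in overrides.items() if k in params}


# Hypothetical stand-in for an HF image processor's preprocess method.
def preprocess(image, num_crops: int = 4, do_resize: bool = True):
    ...


filtered = filter_allowed_kwargs(preprocess, {"num_crops": 16, "bogus_key": 1})
# filtered == {"num_crops": 16}; "bogus_key" is silently dropped
```

Applying the same filter to both init-time and per-request overrides is one way to guarantee they are accepted or rejected by identical rules.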


alex-jw-brooks commented Sep 23, 2024

I think this should be straightforward to implement - I plan to try it in the next week or so 🤞
