You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Follow-up on #8657, which added support for passing initialization time mm_processor_kwargs to be used by the input mapper / processor / max token count calculations / dummy data if they're added to architecture-specific implementations as keyword arguments. It would be nice to also be to pass such kwargs as input values at inference time as part of the multi-modal data, e.g.,:
Such that for models that support additional mm_processor_kwargs:
The initialization time mm_processor_kwargs take priority over the config values
The inference time mm_processor_kwargs take priority over the config values and the initialization mm_processor_kwargs
Alternatives
Keep mm_processor_kwargs as initialization time only
Additional context
For per-request mm_processor_kwargs, it needs to be correctly handled:
In the input mapper
In the input processor
Some care needs to be taken around the input mapper, which falls back to a wrapper around HF resources, e.g., image processors, since it may take stuff out of the config. More specifically:
We should avoid initializing and managing multiple multimodal processors with different processor kwargs if possible
Init time processor kwargs / per request processor kwargs should behave identically - this probably depends on the preprocess signature for the HF resource closely matching the init signature by default
If for whatever reason init/preprocess are not well-aligned, the mapper / processor can be implemented in the VLLM model class as a backup plan to fix it
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
The text was updated successfully, but these errors were encountered:
🚀 The feature, motivation and pitch
Follow-up on #8657, which added support for passing initialization time
mm_processor_kwargs
to be used by the input mapper / processor / max token count calculations / dummy data if they're added to architecture-specific implementations as keyword arguments. It would be nice to also be to pass such kwargs as input values at inference time as part of the multi-modal data, e.g.,:Such that for models that support additional
mm_processor_kwargs
:mm_processor_kwargs
take priority over the config valuesmm_processor_kwargs
take priority over the config values and the initializationmm_processor_kwargs
Alternatives
Keep
mm_processor_kwargs
as initialization time onlyAdditional context
For per-request
mm_processor_kwargs
, it needs to be correctly handled:Some care needs to be taken around the input mapper, which falls back to a wrapper around HF resources, e.g., image processors, since it may take stuff out of the config. More specifically:
preprocess
signature for the HF resource closely matching theinit
signature by defaultBefore submitting a new issue...
The text was updated successfully, but these errors were encountered: