
Support flexible input sizes #883

@huningxin


Some models require input sizes that are only determined at inference time.

For example, some vision models need to work on multiple resolutions. MODNet (a model for real-time portrait matting) accepts input of shape [batch_size, 3, height, width].

Transformers may work on arbitrary input lengths. When using a KV cache, they also need to grow the KV cache length by 1 on each inference step.

For speech recognition models such as Whisper, the encoder needs to work on arbitrary input lengths; its input shape is [batch_size, feature_size, encoder_sequence_length]. The decoder (with past) needs to grow the KV cache length by 1 on each inference step, i.e. the shape of the input named "past_key_values.0.decoder.key" is [batch_size, 6, past_decoder_sequence_length, 64], and the shape of the corresponding output named "present.0.decoder.key" is [batch_size, 6, past_decoder_sequence_length + 1, 64].

For other language models such as Qwen2.5-0.5B-Instruct, the model takes input of shape [batch_size, sequence_length]. It likewise grows the KV cache length at inference time, i.e. the shape of the input "past_key_values.0.key" is [batch_size, 2, past_sequence_length, 64], and the shape of the corresponding output "present.0.key" is [batch_size, 2, past_sequence_length + 1, 64].
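To make the growth concrete, here is a minimal TypeScript sketch that tracks the shape of a single KV cache tensor across decoding steps. The head counts (6 for the Whisper decoder, 2 for Qwen2.5-0.5B-Instruct) and head size (64) come from the shapes quoted above; `KvLayout`, `pastKvShape`, `presentKvShape` and `decodeShapes` are hypothetical helpers, not part of WebNN or any runtime. Each step feeds a different input shape, so a graph compiled for one fixed shape cannot serve the next step.

```typescript
// Shape of one past/present KV tensor: [batch, heads, seqLen, headDim].
type Shape = readonly number[];

// Hypothetical description of a model's KV cache layout.
interface KvLayout {
  numHeads: number; // 6 for the Whisper decoder, 2 for Qwen2.5-0.5B-Instruct
  headDim: number;  // 64 in both examples
}

// Shape of a "past_key_values.*" input for a given past length.
function pastKvShape(layout: KvLayout, batch: number, pastLen: number): Shape {
  return [batch, layout.numHeads, pastLen, layout.headDim];
}

// Shape of the matching "present.*" output: one more position than the input.
function presentKvShape(layout: KvLayout, batch: number, pastLen: number): Shape {
  return pastKvShape(layout, batch, pastLen + 1);
}

// Collects the input shapes a decoding loop would feed to the model, step by step.
function decodeShapes(layout: KvLayout, batch: number, initialPast: number, steps: number): Shape[] {
  const result: Shape[] = [];
  let pastLen = initialPast;
  for (let step = 0; step < steps; step++) {
    result.push(pastKvShape(layout, batch, pastLen));
    // The present output of this step becomes the past input of the next one.
    pastLen = presentKvShape(layout, batch, pastLen)[2];
  }
  return result;
}

const qwen: KvLayout = { numHeads: 2, headDim: 64 };
const shapes = decodeShapes(qwen, 1, 5, 4);
console.log(shapes.map((s) => `[${s.join(",")}]`).join(" "));
// [1,2,5,64] [1,2,6,64] [1,2,7,64] [1,2,8,64]
```

With only static shapes, an app would need either one compiled graph per past length or a cache padded to a maximum length.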

The lack of support for flexible input sizes makes WebNN harder to use for these models. For example, apps need to modify the model to fix its input sizes when compiling the WebNN graph. At inference time, apps then need to resize image inputs, or pad language inputs to the maximum length, before passing them to the WebNN graph.
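The padding workaround can be sketched as follows, assuming a graph compiled with a fixed `[1, MAX_LEN]` token input plus an attention mask. `MAX_LEN`, `PAD_ID`, `padToFixedLength` and the token ids are hypothetical values chosen for illustration, and `Int32Array` is used for simplicity even though many exported language models take int64 ids.

```typescript
// Hypothetical fixed sequence length baked into the compiled graph.
const MAX_LEN = 8;
// Hypothetical padding token id; the real value is model-specific.
const PAD_ID = 0;

interface PaddedInput {
  inputIds: Int32Array;      // token ids, padded with PAD_ID
  attentionMask: Int32Array; // 1 for real tokens, 0 for padding
}

// Pads a token sequence to the fixed length the graph was compiled for.
function padToFixedLength(tokens: readonly number[], maxLen: number, padId: number): PaddedInput {
  if (tokens.length > maxLen) {
    // A fixed-size graph simply cannot accept longer inputs.
    throw new RangeError(`sequence length ${tokens.length} exceeds fixed length ${maxLen}`);
  }
  const inputIds = new Int32Array(maxLen).fill(padId);
  const attentionMask = new Int32Array(maxLen); // zero-initialized
  tokens.forEach((t, i) => {
    inputIds[i] = t;
    attentionMask[i] = 1;
  });
  return { inputIds, attentionMask };
}

// Arbitrary example token ids.
const { inputIds, attentionMask } = padToFixedLength([101, 2009, 2003], MAX_LEN, PAD_ID);
console.log(Array.from(inputIds).join(","));      // 101,2009,2003,0,0,0,0,0
console.log(Array.from(attentionMask).join(",")); // 1,1,1,0,0,0,0,0
```

Here five of the eight positions are padding, which the graph still computes over; inputs longer than `MAX_LEN` are rejected outright.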

Native frameworks already support flexible input sizes. For example, ONNX allows model inputs to have dynamic dimensions, and at inference time ONNX Runtime lets apps specify arbitrary values for those dimensions. Core ML also supports flexible input shapes, allowing either a set of enumerated size options or size ranges.
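As an illustration of the general idea (not the ONNX Runtime or Core ML API), the sketch below describes an input shape with a mix of fixed sizes and named symbolic dimensions, similar in spirit to ONNX's `dim_value`/`dim_param`, and resolves it against concrete values at inference time. The optional `min`/`max` bounds loosely mirror Core ML size ranges; the `Dim` type, `resolveShape` and the 32 to 2048 bounds for MODNet are hypothetical.

```typescript
// A dimension is either a fixed size or a named symbol with optional bounds.
// Illustrative types only; not taken from any real API.
type Dim = number | { name: string; min?: number; max?: number };

// Replaces each symbolic dimension with the value bound to its name.
function resolveShape(dims: readonly Dim[], bindings: Record<string, number>): number[] {
  return dims.map((d) => {
    if (typeof d === "number") return d; // fixed dimension
    const v = bindings[d.name];
    if (v === undefined) throw new Error(`no value bound for dimension "${d.name}"`);
    if (!Number.isInteger(v) || v <= 0) throw new RangeError(`"${d.name}" must be a positive integer`);
    if ((d.min !== undefined && v < d.min) || (d.max !== undefined && v > d.max)) {
      throw new RangeError(`"${d.name}"=${v} is outside [${d.min ?? 1}, ${d.max ?? "inf"}]`);
    }
    return v;
  });
}

// MODNet-style image input [batch_size, 3, height, width] with hypothetical bounds.
const modnetInput: Dim[] = [
  { name: "batch_size" },
  3,
  { name: "height", min: 32, max: 2048 },
  { name: "width", min: 32, max: 2048 },
];

console.log(resolveShape(modnetInput, { batch_size: 1, height: 512, width: 672 }).join(","));
// 1,3,512,672
```

The same descriptor serves every resolution within the declared range, instead of requiring the app to resize images to one fixed size.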

/cc @fdwr @xenova
