
vision model's input size specified on the cmd line is overridden by the pretrained model config #2035

Open
waterdropw opened this issue Sep 29, 2024 · 7 comments
Labels
exporters Issue related to exporters onnx Related to the ONNX export

Comments

@waterdropw

optimum-cli export onnx --no-dynamic-axes --batch_size 1 --sequence_length 16 --width 224 --height 224  --num_channels 3  --model google/owlv2-base-patch16-ensemble owlv2-base-patch16-ensemble-onnx

The exported ONNX model's input size is still 960x960. I found that the dummy input generator uses the 960 from the pretrained model config (normalized_config) instead of the 224 specified on the command line:

# Some vision models can take any input sizes, in this case we use the values provided as parameters.

Is this a bug?

@waterdropw
Author

waterdropw commented Sep 29, 2024

If not, how can I export an ONNX model with a 224x224 (or other) input size different from the pretrained 960x960?

@ghost

This comment was marked as off-topic.

@dacorvo dacorvo added the onnx Related to the ONNX export label Oct 8, 2024
@IlyasMoutawwakil
Member

If I understand correctly, you want the model to not use dynamic axes and to be statically exported at 224x224?

@IlyasMoutawwakil
Member

IlyasMoutawwakil commented Oct 11, 2024

I think I see what's happening here: the --no-dynamic-axes feature was added to let users export static models, and the input-shape arguments were added to let users pass shapes when they cannot be inferred from the config, not to force a shape.

So I don't think this is a bug, since the intention was to fix specific edge cases, but yes, it would make sense to support the feature you're requesting here.

All the generators will have to be updated from something like:

    def __init__(
        self,
        task: str,
        normalized_config: NormalizedVisionConfig,
        batch_size: int = DEFAULT_DUMMY_SHAPES["batch_size"],
        num_channels: int = DEFAULT_DUMMY_SHAPES["num_channels"],
        width: int = DEFAULT_DUMMY_SHAPES["width"],
        height: int = DEFAULT_DUMMY_SHAPES["height"],
        **kwargs,
    ):
        self.task = task

        # Some vision models can take any input sizes, in this case we use the values provided as parameters.
        if normalized_config.has_attribute("num_channels"):
            self.num_channels = normalized_config.num_channels
        else:
            self.num_channels = num_channels

to

    def __init__(
        self,
        task: str,
        normalized_config: NormalizedVisionConfig,
        **input_shapes,
    ):
        self.task = task

        if input_shapes.get("num_channels", None) is not None:
            self.num_channels = input_shapes.pop("num_channels")
        elif normalized_config.has_attribute("num_channels"):
            self.num_channels = normalized_config.num_channels
        else:
            self.num_channels = DEFAULT_DUMMY_SHAPES.get("num_channels")

where user input shapes take precedence over normalized config.
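A minimal, self-contained sketch of that precedence rule (illustrative names and values, not optimum's actual classes): user-supplied input shapes override the model config, which overrides library defaults.

```python
# Hypothetical defaults, for illustration only.
DEFAULT_DUMMY_SHAPES = {"batch_size": 2, "num_channels": 3, "width": 64, "height": 64}

def resolve_shape(name: str, config: dict, user_shapes: dict) -> int:
    """Resolve one dummy-input dimension with user > config > default precedence."""
    if user_shapes.get(name) is not None:
        return user_shapes[name]          # CLI flags like --width take precedence
    if name in config:
        return config[name]               # fall back to the pretrained config
    return DEFAULT_DUMMY_SHAPES[name]     # last resort: library defaults

config = {"image_size": 960, "num_channels": 3}
print(resolve_shape("num_channels", config, {"width": 224}))  # 3, from the config
print(resolve_shape("width", config, {"width": 224}))         # 224, from the user
```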

@echarlaix wdyt, since static export is probably something OpenVINO models offer

@IlyasMoutawwakil IlyasMoutawwakil added the exporters Issue related to exporters label Oct 11, 2024
@waterdropw
Author


@IlyasMoutawwakil Yes, I want to deploy the model on an edge device such as a phone or an IoT device.
It needs to be exported as a static graph, and more importantly with a smaller input size, since the vision input is the performance bottleneck of a VLM: the VisionEmbeddings use a large conv kernel (e.g. 16x16), and a larger input feeds roughly (960/224)^2 more tokens into the attention computation.
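A quick back-of-the-envelope check of the token counts above, assuming the 16x16 patch embedding of owlv2-base-patch16:

```python
# Number of vision tokens = (image_size / patch_size) ** 2 for a ViT-style model.
patch = 16
tokens_960 = (960 // patch) ** 2   # 60 * 60 = 3600 vision tokens
tokens_224 = (224 // patch) ** 2   # 14 * 14 = 196 vision tokens
print(tokens_960, tokens_224, tokens_960 / tokens_224)  # ratio is about 18.4x
```

Attention cost scales quadratically in token count, so the gap in compute is even larger than 18x.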

@waterdropw
Author


I have dug into this issue and found that the position_embedding needs to be interpolated for any input size other than the pretrained one; I tested this, and the precision is acceptable for deployment.
Maybe I will open a PR for optimum to support this feature.
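As a rough illustration of that interpolation step (a hypothetical NumPy sketch, not optimum's or transformers' actual code): the pretrained position embeddings are viewed as a 2D grid and bilinearly resized to the new patch grid.

```python
import numpy as np

def interpolate_pos_embed(pos_embed: np.ndarray, old_grid: int, new_grid: int) -> np.ndarray:
    """Bilinearly resize a (old_grid**2, dim) table of position embeddings
    to (new_grid**2, dim) by treating it as a 2D grid."""
    dim = pos_embed.shape[-1]
    grid = pos_embed.reshape(old_grid, old_grid, dim)
    # New sample coordinates mapped into the old grid's index space.
    coords = np.linspace(0.0, old_grid - 1.0, new_grid)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, old_grid - 1)
    frac = coords - lo
    # Interpolate along rows, then along columns.
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return out.reshape(new_grid * new_grid, dim)

# e.g. resize owlv2's 60x60 grid (960/16) to 14x14 (224/16):
resized = interpolate_pos_embed(np.random.randn(60 * 60, 768), 60, 14)
print(resized.shape)  # (196, 768)
```

A real PR would also need to handle any extra class-token embedding separately and keep the export path in the model's dtype, but the core resizing is this simple.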

@IlyasMoutawwakil
Member

@waterdropw I would love to review a PR 🤗
