Negative group attribute in generated convolution operations #2084

Closed
javidcf opened this issue Nov 18, 2022 · 1 comment · Fixed by #2090
Labels
bug An unexpected problem or unintended behavior contribution welcome Community contribution is welcomed

Comments

@javidcf
Contributor

javidcf commented Nov 18, 2022

Describe the bug
Generated ONNX convolution nodes may get a negative group attribute value in certain cases, for example when the number of input channels of the filter is unknown at conversion time. The graph is generated without errors, but running it fails in ONNXRuntime.

Urgency
NA

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04*): Windows 10
  • TensorFlow Version: 2.4.1
  • Python version: 3.7
  • ONNX version (if applicable, e.g. 1.11*): 1.10.2
  • ONNXRuntime version (if applicable, e.g. 1.11*): 1.9.0

To Reproduce

import tensorflow as tf
import tf2onnx
import numpy as np
import onnxruntime

# Convolution function
@tf.function
def conv1d(x, y):
    return tf.nn.conv1d(x, y, [1], 'VALID', name='Conv')
# Convert to ONNX model
model_proto, _ = tf2onnx.convert.from_function(
    conv1d,
    # The number of input channels in the filters is left unspecified
    input_signature=[tf.TensorSpec([None, None, 3]), tf.TensorSpec([10, None, 5])])
# Find convolution node
conv_node = next(n for n in model_proto.graph.node if n.name == 'Conv')
# Find group attribute
group_attr = next(a for a in conv_node.attribute if a.name == 'group')
# Attribute has a negative value
print(group_attr)
# name: "group"
# i: -3
# type: INT

# Try to use with ONNX runtime
sess = onnxruntime.InferenceSession(model_proto.SerializeToString())
res = sess.run([sess.get_outputs()[0].name], {'x': np.zeros([10, 100, 3], np.float32), 'y': np.zeros([10, 3, 5], np.float32)})
# 2022-11-18 16:40:50.3568383 [E:onnxruntime:, sequential_executor.cc:346 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Conv node. Name:'Conv' Status Message: Input channels C is not equal to kernel channels * group. C: 3 kernel channels: 3 group: -3
# ---------------------------------------------------------------------------
# Fail                                      Traceback (most recent call last)
# <ipython-input-268-ffc10888f6c3> in <module>
#      25 # Try to use with ONNX runtime
#      26 sess = onnxruntime.InferenceSession(model_proto.SerializeToString())
# ---> 27 res = sess.run([sess.get_outputs()[0].name], {'x': np.zeros([10, 100, 3], np.float32), 'y': np.zeros([10, 3, 5], np.float32)})
# 
# ~\Anaconda3\envs\tf\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py in run(self, output_names, input_feed, run_options)
#     186             output_names = [output.name for output in self._outputs_meta]
#     187         try:
# --> 188             return self._sess.run(output_names, input_feed, run_options)
#     189         except C.EPFail as err:
#     190             if self._enable_fallback:
# 
# Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'Conv' Status Message: Input channels C is not equal to kernel channels * group. C: 3 kernel channels: 3 group: -3
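
The failure comes from the consistency rule stated in the error message: the input's channel count C must equal the kernel's channel count multiplied by group. The following minimal sketch of that rule uses a hypothetical helper, `conv_channels_consistent`, which illustrates the check and is not ONNXRuntime's actual code. It shows why group = -3 is rejected while the default group of 1 would work for these shapes:

```python
def conv_channels_consistent(input_channels: int, kernel_channels: int, group: int) -> bool:
    """Hypothetical helper (not ONNXRuntime's implementation).

    Returns True when C == kernel_channels * group, the rule quoted in the
    error message above; a group count must also be positive.
    """
    return group >= 1 and input_channels == kernel_channels * group


# Values from the failing run: C = 3, kernel channels = 3, group = -3.
print(conv_channels_consistent(3, 3, -3))  # False, since 3 != 3 * -3
# The same shapes with the ONNX default group of 1 are consistent.
print(conv_channels_consistent(3, 3, 1))   # True
```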

Screenshots
NA

Additional context
The error seems to be in tf2onnx.onnx_opset.nn, where the value of the group attribute is calculated as follows:

shape_dim = -1
if data_format == "NHWC":
    shape_dim = ctx.get_shape(node.input[0])[3]
elif data_format == "NCHW":
    shape_dim = ctx.get_shape(node.input[0])[1]
if shape_dim != -1:
    groups = int(shape_dim / ctx.get_shape(node.input[1])[2])

First, the number of channels in node.input[0] is read according to data_format. Then, only if that value is not -1, it is divided by the number of input channels of the filter in node.input[1] (dimension 2). However, nothing checks that the filter's input channel count is also known rather than -1. When it is -1, as in the example above, the division produces a negative value: int(3 / -1) is -3. I suppose the simple fix would be to check both values and leave the attribute at 1 if either of them is -1, but I thought I'd leave that decision to someone from the team instead of sending a PR directly.
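
Here is a minimal sketch of the fix proposed above. It is written as a standalone hypothetical helper, `infer_conv_group`, rather than as a patch to tf2onnx, so the actual change in #2090 may differ. The sketch assumes the TF filter layout [height, width, in_channels, out_channels], which matches the `[2]` index in the snippet above, and uses -1 for unknown dimensions. The example shapes also assume that tf.nn.conv1d is lowered to a 2-D convolution by inserting a unit spatial dimension.

```python
from typing import Sequence


def infer_conv_group(input_shape: Sequence[int], kernel_shape: Sequence[int],
                     data_format: str) -> int:
    """Hypothetical helper: derive the ONNX Conv group attribute.

    input_shape: 4-D data shape in data_format layout, -1 for unknown dims.
    kernel_shape: TF filter shape [H, W, in_channels, out_channels] (assumed).
    Returns 1 (the ONNX default) unless both channel counts are known.
    """
    if data_format == "NHWC":
        in_channels = input_shape[3]
    elif data_format == "NCHW":
        in_channels = input_shape[1]
    else:
        in_channels = -1
    kernel_in_channels = kernel_shape[2]
    # Only divide when both values are known, so -1 never leaks into group.
    if in_channels > 0 and kernel_in_channels > 0:
        return in_channels // kernel_in_channels
    return 1


# Assumed lowered shapes from the reproduction: data [N, 1, W, 3], filter [1, 10, ?, 5].
print(infer_conv_group([-1, 1, -1, 3], [1, 10, -1, 5], "NHWC"))  # 1
print(int(3 / -1))  # -3, the value the current calculation produces
```

The guard is what removes the negative value; using integer division (`//`) instead of `int(a / b)` only avoids a float round trip.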

javidcf added the bug label (An unexpected problem or unintended behavior) Nov 18, 2022
@fatcat-z
Collaborator

Thanks for your detailed comments. Please feel free to submit a PR with such a fix, and we can help review it.
