
Loop operator with no initial values for loop-carried dependencies aborts #543

Closed
mneilly opened this issue Oct 15, 2020 · 5 comments
Assignees: kevinch-nv
Labels: bug, triaged

mneilly commented Oct 15, 2020

The attached DLRM model uses ONNX Loop operators that have only the trip count and condition inputs, with no initial values for the loop-carried dependencies. This causes onnx2trt to abort as follows when it indexes an empty std::vector:

$ onnx2trt -o dlrm_s_pytorch.trt dlrm_s_pytorch.onnx 
----------------------------------------------------------------
Input filename:   dlrm_s_pytorch.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.8
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
Parsing model
[2020-10-15 05:44:40 WARNING] [TRT]/local/tensorRT/onnx-tensorrt/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2020-10-15 05:44:40 WARNING] [TRT]/local/tensorRT/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Aborted (core dumped)

This occurs when there are no initial values: the loop below never executes, so the stateVars vector is empty when it is accessed at line 1730:

// Add initial state inputs using recurrent layers.
std::vector<nvinfer1::IRecurrenceLayer*> stateVars{};
for (size_t i = 2; i < inputs.size(); ++i)
{
    stateVars.emplace_back(loop->addRecurrence(convertToTensor(inputs[i], ctx)));
    ctx->loopTensors()[body.input(i).name()] = node.input(i);
    ctx->registerTensor(TensorOrWeights{stateVars.back()->getOutput(0)}, body.input(i).name());
}
ctx->registerLayer(stateVars.at(0), node.name());

I can't upload the version with embedded tensors since it is too large, so I'm attaching the version with external tensors. Note that #542 needs to be resolved or worked around in order to reproduce this issue when using external tensors.

dlrm-external-tensors.zip

kevinch-nv added the bug and triaged labels on Jan 8, 2021
kevinch-nv self-assigned this on Jan 8, 2021
kevinch-nv (Collaborator) commented

Thanks for the report. I'll make an MR fixing this along with #542.

kevinch-nv (Collaborator) commented

Merged the fix. Can you confirm that it works for you?

mneilly commented Jan 12, 2021

I'll check it out. :)

kevinch-nv (Collaborator) commented

@mneilly Did the fix work for you? Can this be closed?

kevinch-nv (Collaborator) commented

Closing this issue.
