
onnx export RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same? #41

Closed
lucasjinreal opened this issue Jun 12, 2020 · 22 comments
Labels
bug Something isn't working Stale Stale and schedule for closing soon

Comments

@lucasjinreal

Running the ONNX export gives this error:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

The weights were trained on GPU, and I moved both the model and the image to the CUDA device. Why does this error still happen?

@lucasjinreal lucasjinreal added the bug Something isn't working label Jun 12, 2020
@glenn-jocher
Member

glenn-jocher commented Jun 12, 2020

@jinfagang you should be able to export to onnx like this. Run this command from the /yolov5 directory.

export PYTHONPATH="$PWD" 
python models/onnx_export.py --weights ./weights/yolov5s.pt --img 640 640 --batch 1

@glenn-jocher
Member

glenn-jocher commented Jun 12, 2020

Output should look like this. You might also want to git pull, as we've made recent changes to onnx export.

...
  %416 = Shape(%285)
  %417 = Constant[value = <Scalar Tensor []>]()
  %418 = Gather[axis = 0](%416, %417)
  %419 = Shape(%285)
  %420 = Constant[value = <Scalar Tensor []>]()
  %421 = Gather[axis = 0](%419, %420)
  %422 = Shape(%285)
  %423 = Constant[value = <Scalar Tensor []>]()
  %424 = Gather[axis = 0](%422, %423)
  %427 = Unsqueeze[axes = [0]](%418)
  %430 = Unsqueeze[axes = [0]](%421)
  %431 = Unsqueeze[axes = [0]](%424)
  %432 = Concat[axis = 0](%427, %439, %440, %430, %431)
  %433 = Reshape(%285, %432)
  %434 = Transpose[perm = [0, 1, 3, 4, 2]](%433)
  return %output, %415, %434
}
Export complete. ONNX model saved to ./weights/yolov5s.onnx
View with https://github.com/lutzroeder/netron

@lucasjinreal
Author

lucasjinreal commented Jun 12, 2020

@glenn-jocher It turns out my model was trained on GPU and serialized on the CUDA device, so an input that is not moved to CUDA throws this error.

However, when I force the input to CUDA, I get the opposite error. It seems some code inside the model is still using a CPU tensor.

Is there any special reason for using a CPU tensor there?

@lucasjinreal
Author

image

The generated ONNX model seems to have augmentation enabled by default.

How do I obtain the boxes, scores, and classes from these outputs?

image

@glenn-jocher
Member

@jinfagang onnx export should only be done when the model is on cpu.
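
As a minimal sketch of that advice, the snippet below moves both the model and the example input to CPU before anything is traced, which removes the input/weight device mismatch from the original error. The helper name prepare_for_export and the small Conv2d stand-in model are hypothetical and are not part of this repo's export script.

```python
import torch
import torch.nn as nn


def prepare_for_export(model: nn.Module, example: torch.Tensor):
    """Put the model and the example input on CPU and switch to eval mode."""
    model = model.to("cpu").eval()
    example = example.to("cpu")
    return model, example


# Hypothetical stand-in for the detection model (assumption, not YOLOv5 itself).
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)
example = torch.zeros(1, 3, 64, 64)

# Mimic a checkpoint that was serialized from a GPU, when one is available.
if torch.cuda.is_available():
    model = model.cuda()

model, example = prepare_for_export(model, example)

with torch.no_grad():
    out = model(example)  # weights and input now live on the same device

# A call such as torch.onnx.export(model, example, "model.onnx") could follow here.
```

With both objects on the same device, the forward pass that the exporter traces no longer mixes torch.cuda.FloatTensor and torch.FloatTensor.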

The netron image you show is correct. The boxes are part of the v5 architecture, they are not related to image augmentation during training.

At the moment onnx export stops at the output features. This is an example P3 output (smallest boxes) for 3 anchors with a grid size 40x24. The 85 features are xywh, objectness, and 80 class confidences.

Screen Shot 2020-06-12 at 10 25 35 AM
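
The 85-feature layout described above can be split apart roughly as follows. This is only a sketch under stated assumptions: the output is assumed to be raw logits with layout (batch, anchors, ny, nx, 85), the function name split_head_output and the threshold value are hypothetical, and the xywh values are left undecoded because grid and anchor decoding is not part of the exported graph.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def split_head_output(p, conf_thres=0.5):
    """Split one head output of shape (batch, anchors, ny, nx, 85).

    Returns undecoded xywh, the best class confidence, and the class index
    for every cell/anchor whose confidence exceeds conf_thres.
    """
    p = sigmoid(p)               # assumption: exported outputs are raw logits
    box_raw = p[..., 0:4]        # xywh, still relative to grid cell and anchor
    obj = p[..., 4]              # objectness
    cls = p[..., 5:]             # 80 class confidences
    scores = obj[..., None] * cls
    class_ids = scores.argmax(axis=-1)
    best = scores.max(axis=-1)
    keep = best > conf_thres
    return box_raw[keep], best[keep], class_ids[keep]


# Random data standing in for a P3 output: 3 anchors on a 40x24 grid.
rng = np.random.default_rng(0)
p3 = rng.normal(size=(1, 3, 40, 24, 85)).astype(np.float32)
boxes, scores, class_ids = split_head_output(p3)
```

The surviving candidates would still need box decoding and NMS before they are usable detections.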

@glenn-jocher
Member

@jinfagang I ran into a cuda issue with an onnx export today, and pushed a fix 1e2cb6b for this. This may or may not solve your original issue.

@lucasjinreal
Author

@glenn-jocher So the output is the same as YOLOv3 in your previous repo? I want to access the outputs and accelerate the model in TensorRT.

@lucasjinreal
Author

@glenn-jocher Can the anchor decoding process also be exported to ONNX, so that the model is more end-to-end when transferred to other platforms for inference?

@glenn-jocher
Member

@jinfagang yes, this would be more useful. It is more complicated to implement though, especially if you want a clean onnx graph. We will try to add this in the future.

@lucasjinreal
Author

lucasjinreal commented Jun 15, 2020

@glenn-jocher I ran a small experiment on this. The exported graph ended up containing a ScatterND op, which is hard to convert to other platforms. If we want to eliminate this op, the post-processing code (the Detect layer here) needs to be rewritten, only for export mode. The rewrite is more complicated, but it exports and works perfectly.
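
One common way such an op appears is in-place slice assignment on a tensor, which PyTorch's ONNX exporter typically lowers to ScatterND. The sketch below compares an in-place decode with an equivalent version that builds the result with torch.cat instead. The decode formula, anchor values, and stride are assumptions chosen for illustration and are not taken from this thread; both function names are hypothetical.

```python
import torch


def decode_inplace(y, grid, anchor_grid, stride):
    """Decode via slice assignment (the pattern that tends to export as ScatterND)."""
    y = y.clone()
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
    y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchor_grid
    return y


def decode_concat(y, grid, anchor_grid, stride):
    """Same arithmetic, but the output is assembled with torch.cat."""
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
    wh = (y[..., 2:4] * 2.0) ** 2 * anchor_grid
    return torch.cat((xy, wh, y[..., 4:]), dim=-1)


na, ny, nx, no = 3, 4, 5, 85
y = torch.rand(1, na, ny, nx, no)  # stands in for sigmoid-activated head output

# Cell offsets with shape (1, 1, ny, nx, 2), ordered as (x, y).
xv = torch.arange(nx).view(1, 1, 1, nx).expand(1, 1, ny, nx)
yv = torch.arange(ny).view(1, 1, ny, 1).expand(1, 1, ny, nx)
grid = torch.stack((xv, yv), dim=-1).float()

# Illustrative anchor sizes in pixels and an illustrative stride (assumptions).
anchor_grid = torch.tensor([[10.0, 13.0], [16.0, 30.0], [33.0, 23.0]]).view(1, na, 1, 1, 2)
stride = 8.0

a = decode_inplace(y, grid, anchor_grid, stride)
b = decode_concat(y, grid, anchor_grid, stride)
same = torch.allclose(a, b)
```

Both functions return the same values, so a concat-based variant could be swapped in for export mode only.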

@makaveli10

@jinfagang I also ran into this issue. I resolved it by converting the model to CUDA and then saving the weights. I used those weights to convert to an ONNX model, but I ran into some issues converting ONNX to TensorRT.

If you successfully converted the model to TensorRT please let me know how you did that.
Thanks

@lucasjinreal
Author

@makaveli10 I have already converted the model to ONNX and run inference on it with TensorRT.

image

However, this required some operations that differ from what this repo does, and the TensorRT side needs corresponding special handling. Overall, TensorRT-accelerated inference takes about 38 ms at a 1280x768 input resolution, which is quite good:
image

You can add my WeChat, jintianiloveu, if you are interested in this acceleration technique.

@glenn-jocher
Member

@jinfagang great work! What is the speedup compared to using detect.py? What GPU are you using?

@lucasjinreal
Author

lucasjinreal commented Jun 18, 2020

@glenn-jocher I'm using a GTX 1080 Ti, and the speed was tested on it. The measured time includes post-processing (from the engine forward pass to NMS and copying data back to the CPU, etc.). I think the speed is almost the same as the Darknet YOLOv4 converted to TensorRT (which I previously tested with an 800x800 input).

The speed can still be improved by moving all post-processing into CUDA kernels and by using FP16 or INT8 quantization.

@kingardor

kingardor commented Jun 18, 2020

@jinfagang amazing to see you got it running in such a short time. I'm able to convert the pth files to onnx format, but I keep getting this error when I try to convert to tensorrt6:
(Unnamed Layer* 0) [Slice]: slice is out of input range
While parsing node number 9 [Slice]:
3
If you have some pointers for me, I would really appreciate it. Connecting on WeChat is difficult for me because I don't have an account and don't have a friend who can validate a new account.

@makaveli10

makaveli10 commented Jun 19, 2020

@jinfagang I don't have a WeChat account, nor do I have a friend who can verify a new account. Can you please share your code for running ONNX inference on TensorRT some other way? I am getting incorrect outputs from the engine that I generated from the ONNX model.

@kingardor

@makaveli10 mind sharing how you were able to generate an onnx model that worked with TensorRT? Also, which version of TRT did you use?

@makaveli10

@aj-ames https://github.com/TrojanXu/yolov5-tensorrt
Let me know if you make any sort of progress please!

@kingardor

@makaveli10 thanks. I will update my findings here.

@yushanshan05

@aj-ames https://github.com/TrojanXu/yolov5-tensorrt
Let me know if you make any sort of progress please!

I used this project, but I encounter the same error when I try to convert to tensorrt6:
[TensorRT] ERROR: (Unnamed Layer* 0) [Slice]: slice is out of input range
ERROR: Failed to parse the ONNX file.

If you have some pointers for me, I would really appreciate it.

@yushanshan05

When I convert ONNX to TensorRT, I encounter the same error with tensorrt6:
[TensorRT] ERROR: (Unnamed Layer* 0) [Slice]: slice is out of input range
ERROR: Failed to parse the ONNX file.

I use TensorRT 6.0 with ONNX 1.5.0 or 1.6.0, and neither works.
If you have some pointers for me, I would really appreciate it.

@github-actions
Contributor

github-actions bot commented Aug 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Aug 1, 2020
YoungjaeDev pushed a commit to avikus-ai/detect_train that referenced this issue Feb 10, 2023
K-tang-mkv pushed a commit to K-tang-mkv/yolov5 that referenced this issue Jun 9, 2023
…_inference_fix

onnx inference visualisation fix

5 participants