
The speed of example in the release (pre) is much slower than the previous method. #22

Closed
hooke007 opened this issue Jan 18, 2023 · 8 comments

Comments

@hooke007
Contributor

"perform the necessary padding, YUV/RGB conversion and FP16 conversion in one go"

https://github.com/AmusementClub/vs-mlrt/releases/tag/v13

On my device, "pad first and then crop" (283 fps) is faster than the "all-in-one" processing (243 fps).

[benchmark screenshots attached: 1080p_p10, 2160p_p10]
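
For context, a minimal VapourSynth sketch contrasting the two preprocessing pipelines being compared. This is not the exact script from the v13 release; the 1080p YUV420P10 source, the mod-32 padding target (1088) and the use of src_width/src_height for padding are assumptions.

# Sketch only: contrasts "all-in-one" vs "pad first, then crop" preprocessing.
import vapoursynth as vs
core = vs.core

# Stand-in for the real source clip (1080p, 10-bit YUV 4:2:0).
clip = core.std.BlankClip(format=vs.YUV420P10, width=1920, height=1080, length=240)

# "All-in-one": padding (by enlarging the active source window, which zimg
# fills by edge extension), YUV->RGB conversion and FP16 (RGBH) conversion
# in a single resize call.
all_in_one = core.resize.Bicubic(
    clip, width=1920, height=1088, format=vs.RGBH, matrix_in_s="709",
    src_width=1920, src_height=1088)

# "Pad first, then crop": convert to RGBH, pad the bottom to a mod-32 height
# with AddBorders, run inference on the padded clip, then crop the padding off.
rgb = core.resize.Bicubic(clip, format=vs.RGBH, matrix_in_s="709")
padded = core.std.AddBorders(rgb, bottom=8)   # 1080 -> 1088
# ... model inference on `padded` would go here ...
restored = core.std.Crop(padded, bottom=8)    # back to 1080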

@WolframRhodium
Contributor

Thanks. zimg probably does not fuse these operations efficiently.

hooke007 closed this as not planned (won't fix, can't repro, duplicate, stale) on Feb 9, 2023
@hooke007
Contributor Author

@WolframRhodium Hi, I was testing RIFE v2 (bd0ff98) but it failed. I'm not sure if I missed something.

trtexec_230227_151717.log

@WolframRhodium
Contributor

Does it work if dynamic shapes or faster_dynamic_shapes is disabled? I have only tested the former.

@hooke007
Contributor Author

hooke007 commented Feb 28, 2023

With faster_dynamic_shapes disabled, it still failed.

@hooke007
Contributor Author

hooke007 commented Feb 28, 2023

Although I cannot make it work with dynamic shapes, v2 (output 16000 frames in 53.22 seconds, 300.66 fps) is much faster than v1 (output 16000 frames in 68.05 seconds, 235.12 fps).
Tested at 1080p 10-bit.

@WolframRhodium
Contributor

WolframRhodium commented Feb 28, 2023

Hi, it seems that the Range operator does not support fp16 precision; removing --layerPrecisions=*:fp16 --layerOutputTypes=*:fp16 --precisionConstraints=obey from your command will work. Alternatively, you may have to manually exclude those layers in the precision specification.

edit: replacing --precisionConstraints=obey with --precisionConstraints=prefer also works, with the following warnings:

[02/28/2023-10:40:14] [W] [TRT] No implementation of layer /Range obeys the requested constraints. I.e. no conforming implementation was found for requested layer computation precision and output precision. Using fastest implementation instead.
[02/28/2023-10:40:14] [W] [TRT] No implementation of layer /Range_1 obeys the requested constraints. I.e. no conforming implementation was found for requested layer computation precision and output precision. Using fastest implementation instead.
[02/28/2023-10:40:14] [W] [TRT] No implementation of layer [ShapeHostToDeviceCopy 0] obeys the requested constraints. I.e. no conforming implementation was found for requested layer computation precision and output precision. Using fastest implementation instead.
[02/28/2023-10:40:28] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: [] Mismatched type for tensor (Unnamed Layer* 83) [Constant]_output', f16 vs. expected type:f32.
[02/28/2023-10:41:58] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: [] Mismatched type for tensor (Unnamed Layer* 706) [Shuffle]_output', f16 vs. expected type:f32.

The first three warnings (and the related source code) confirm the observation; the last two warnings may be a bug in TRT's experimental graph compiler.
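
For reference, a hedged sketch of how the rebuilt trtexec invocation might look under that suggestion. Only the three precision-related flags come from this thread; the model/engine paths and any remaining options are placeholders that must match the actual setup.

# Hypothetical rebuild command following the suggestion above.
import subprocess

cmd = [
    "trtexec",
    "--onnx=rife_v2.onnx",          # placeholder model path
    "--saveEngine=rife_v2.engine",  # placeholder engine path
    "--fp16",
    # Option 1: drop the next three flags entirely.
    # Option 2: keep them but use "prefer" instead of "obey", so TensorRT can
    # fall back to fp32 for layers such as /Range that lack fp16 support.
    "--layerPrecisions=*:fp16",
    "--layerOutputTypes=*:fp16",
    "--precisionConstraints=prefer",
]
subprocess.run(cmd, check=True)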

@hooke007
Contributor Author

@WolframRhodium Hello, I have a question about the v2 model.

For the v1 model I use 1920x1088 as the value of opt_shape. Should I use 1920x1080 instead for v2?

@WolframRhodium
Contributor

I think 1920x1080 should be used for V2.
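
Illustrative only: the opt_shape difference discussed above, expressed as trtexec-style shape flags in NCHW order. The tensor name "input" and the channel count 7 are assumptions, not taken from the actual RIFE ONNX models; verify them against the graph before use.

# Hypothetical shape flags; only the 1088-vs-1080 height comes from this thread.
V1_OPT_SHAPE = "--optShapes=input:1x7x1088x1920"  # v1: padded, mod-32 height
V2_OPT_SHAPE = "--optShapes=input:1x7x1080x1920"  # v2: original frame height
print(V1_OPT_SHAPE)
print(V2_OPT_SHAPE)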
