-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Towards a clearer logic for determining output height and width #332
Conversation
int height = 0, width = 0; | ||
std::tie(height, width) = getHeightAndWidthFromOptionsOrAVFrame( | ||
streams_[streamIndex].options, filteredFrame.get()); | ||
std::vector<int64_t> shape = {height, width, 3}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the only place where the logic is slightly changed (but I think the behavior is the same): we go through options
first. I think this is correct, because the filteredFrame
should respect options
itself?
In any case, this is what is done for the swscale case as well. If that's incorrect or a change of behavior, LMK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks correct to me, but @ahmadsharif1 should also reason through it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TORCH_CHECK to make sure the filteredFrame has the expected dimensions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the validity checks as follow up!
@@ -209,8 +197,9 @@ void convertAVFrameToDecodedOutputOnCuda( | |||
src->format == AV_PIX_FMT_CUDA, | |||
"Expected format to be AV_PIX_FMT_CUDA, got " + | |||
std::string(av_get_pix_fmt_name((AVPixelFormat)src->format))); | |||
int width = options.width.value_or(codecContext->width); | |||
int height = options.height.value_or(codecContext->height); | |||
int height = 0, width = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: please declare only one variable per line.
durationSeconds(torch::empty({numFrames}, {torch::kFloat64})) {} | ||
: ptsSeconds(torch::empty({numFrames}, {torch::kFloat64})), | ||
durationSeconds(torch::empty({numFrames}, {torch::kFloat64})) { | ||
int height = 0, width = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same nit, and there's a few other places.
options.height.value_or(avFrame->height), | ||
options.width.value_or(avFrame->width)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually prefer that we just call both of these functions getHeightAndWidth
and allow C++ function overloading to determine which one to call. In C++, because the types of parameters are formally a part of the function signature, it's less common to encode the type of one of the parameters in the name. This, however, is a question of style, and I know @ahmadsharif1 may feel differently.
Read the comment below, I get that you're purposefully marking in the code which strategy we're doing where.
auto tensorOptions = torch::TensorOptions() | ||
.dtype(torch::kUInt8) | ||
.layout(torch::kStrided) | ||
.device(device); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical or blocking, but this may be a good time to put in TORCH_CHECK
s to make sure the height and width are not negative. I just looked it up, and I think the tensor API can handle that, but that's probably not going to be what we expect. (I know you plan on a follow-up PR with more error-checking, so this sort of thing may be more appropriate there.)
Almost forgot to mention: great refactoring, and thanks for spending the time to reason through all of this and make sure what we're doing is sensible! |
int width = options.width.value_or(codecContext->width); | ||
int height = options.height.value_or(codecContext->height); | ||
int height = 0, width = 0; | ||
std::tie(height, width) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is less maintainable/safe than returning a struct with named members.
Someone could write:
std::tie(width, height)
instead of std::tie(height, width)
and it would be hard to notice.
|
||
std::tuple<int, int> getHeightAndWidthFromOptionsOrAVFrame( | ||
const VideoDecoder::VideoStreamDecoderOptions& options, | ||
AVFrame* avFrame); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be const too
const & would be preferred since this is a mandatory parameter and should never be null
int height = 0, width = 0; | ||
std::tie(height, width) = getHeightAndWidthFromOptionsOrAVFrame( | ||
streams_[streamIndex].options, filteredFrame.get()); | ||
std::vector<int64_t> shape = {height, width, 3}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TORCH_CHECK to make sure the filteredFrame has the expected dimensions?
Background from #269:
We create/allocate output frame tensors in different places. In particular, we determine the
height
andwidth
of the output tensor from different sources:CodecContext
AVFrame
CodecContext
The info from the metadata /
CodecContext
are available as soon as we add a stream, e.g. right after we instantiate a PythonVideoDecoder
. TheAVFrame
is only available once we have decoded the frame with ffmpeg (this is the "raw output").The source of truth really is the
AVFrame
.CondecContext
may be wrong, and in particular we now know that some streams may have variable height and width #312.What this PR does
height
andwidth
, by introducinggetHeightAndWidthFromOptionsOrMetadata()
andgetHeightAndWidthFromOptionsOrAVFrame()
.allocateEmptyHWCTensor()
.What this PR does not