Extend decoding support #4480

Merged
merged 29 commits into from
Dec 13, 2022
Changes from 23 commits
7a93080
Extend decoding support
awolant Nov 29, 2022
f0428b5
Add checking of the flag
awolant Nov 30, 2022
7292b0e
Add working parsing for all files
awolant Nov 30, 2022
23c5092
Merge all file support
awolant Nov 30, 2022
a79cda5
Port changes to GPU decoder
awolant Nov 30, 2022
2a827d3
Working for all files
awolant Nov 30, 2022
8c5e2d2
Add support for all files
awolant Nov 30, 2022
29261c4
Remove test
awolant Nov 30, 2022
5595867
Sort of working version
awolant Dec 5, 2022
eecbb09
Remove changes to decoding
awolant Dec 5, 2022
6ca3745
Fix tests
awolant Dec 6, 2022
3ee2991
Working for all files
awolant Dec 6, 2022
4a8b8d9
Fixes and tests
awolant Dec 6, 2022
f07ef9d
Adjust tests
awolant Dec 6, 2022
87153e9
Fix linter
awolant Dec 6, 2022
e0b3e69
Merge remote-tracking branch 'nvidia/main' into extend_decoding_support
awolant Dec 7, 2022
7f6db46
Fix linter
awolant Dec 7, 2022
666c2b9
Fix werror build
awolant Dec 7, 2022
5ffaf13
Merge remote-tracking branch 'nvidia/main' into extend_decoding_support
awolant Dec 8, 2022
093d0d1
Update DALI_extra version
awolant Dec 8, 2022
2939e22
Update to force zero latency
awolant Dec 9, 2022
eccc2db
Update raw stream tests
awolant Dec 9, 2022
624f3c0
Update DALI_extra version
awolant Dec 9, 2022
4690624
Update DALI_EXTRA_VERSION
awolant Dec 9, 2022
42dffe6
Fix typo
awolant Dec 12, 2022
77b7248
Merge remote-tracking branch 'nvidia/main' into extend_decoding_support
awolant Dec 12, 2022
f92e8d5
Merge remote-tracking branch 'nvidia/main' into extend_decoding_support
awolant Dec 12, 2022
2b24562
Review fixes
awolant Dec 12, 2022
b732928
Review fixes
awolant Dec 12, 2022
2 changes: 1 addition & 1 deletion DALI_EXTRA_VERSION
@@ -1 +1 @@
69ffed23b15233b583cb5a398e860a63863b2c99
Update after merging https://github.com/NVIDIA/DALI_extra/pull/116
106 changes: 95 additions & 11 deletions dali/operators/reader/loader/video/frames_decoder.cc
@@ -129,7 +129,9 @@ bool FramesDecoder::CheckCodecSupport() {

void FramesDecoder::FindVideoStream(bool init_codecs) {
if (init_codecs) {
for (size_t i = 0; i < av_state_->ctx_->nb_streams; ++i) {
size_t i = 0;

for (i = 0; i < av_state_->ctx_->nb_streams; ++i) {
av_state_->codec_params_ = av_state_->ctx_->streams[i]->codecpar;
av_state_->codec_ = avcodec_find_decoder(av_state_->codec_params_->codec_id);

@@ -139,11 +141,12 @@ void FramesDecoder::FindVideoStream(bool init_codecs) {

if (av_state_->codec_->type == AVMEDIA_TYPE_VIDEO) {
av_state_->stream_id_ = i;
return;
break;
}
}

DALI_FAIL(make_string("Could not find a valid video stream in a file ", Filename()));
DALI_ENFORCE(i < av_state_->ctx_->nb_streams,
make_string("Could not find a valid video stream in a file ", Filename()));
} else {
av_state_->stream_id_ = av_find_best_stream(av_state_->ctx_, AVMEDIA_TYPE_VIDEO,
-1, -1, nullptr, 0);
@@ -154,6 +157,10 @@ void FramesDecoder::FindVideoStream(bool init_codecs) {

av_state_->codec_params_ = av_state_->ctx_->streams[av_state_->stream_id_]->codecpar;
}
if (Height() == 0 || Width() == 0) {
DALI_ENFORCE(avformat_find_stream_info(av_state_->ctx_, nullptr) >= 0);
DALI_ENFORCE(Height() != 0 && Width() != 0, "Couldn't load video size info.");
}
}
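
The hunk above swaps an early `return` for a `break` plus a check on the loop index, so that code after the loop (here, the size check) also runs when a stream is found. A minimal standalone sketch of that pattern follows; `MediaType` and `FindFirstVideoStream` are hypothetical names for illustration, not DALI or FFmpeg APIs, and an exception stands in for `DALI_ENFORCE`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a stream's media type (not the FFmpeg enum).
enum class MediaType { kAudio, kVideo, kSubtitle };

// Returns the index of the first video stream. The index is declared outside
// the loop so it can be inspected after a `break`.
inline std::size_t FindFirstVideoStream(const std::vector<MediaType> &streams,
                                        const std::string &filename) {
  std::size_t i = 0;
  for (i = 0; i < streams.size(); ++i) {
    if (streams[i] == MediaType::kVideo) {
      break;  // a `return` here would skip any shared code after the loop
    }
  }
  // i == streams.size() means the loop ran to completion without a match.
  if (i >= streams.size()) {
    throw std::runtime_error("Could not find a valid video stream in a file " + filename);
  }
  return i;
}
```

Any validation that must hold for every successful lookup can then be placed once, after the check, instead of being duplicated before each `return`.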

FramesDecoder::FramesDecoder(const std::string &filename)
@@ -236,20 +243,97 @@ FramesDecoder::FramesDecoder(const char *memory_file, int memory_file_size, bool
DetectVfr();
}

void FramesDecoder::CreateAvState(std::unique_ptr<AvState> &av_state, bool init_codecs) {
av_state->ctx_ = avformat_alloc_context();
DALI_ENFORCE(av_state->ctx_, "Could not alloc avformat context");

uint8_t *av_io_buffer = static_cast<uint8_t *>(av_malloc(default_av_buffer_size));

AVIOContext *av_io_context = avio_alloc_context(
av_io_buffer,
default_av_buffer_size,
0,
&memory_video_file_.value(),
detail::read_memory_video_file,
nullptr,
detail::seek_memory_video_file);

av_state->ctx_->pb = av_io_context;

int ret = avformat_open_input(&av_state->ctx_, "", nullptr, nullptr);
DALI_ENFORCE(
ret == 0,
make_string(
"Failed to open video file ",
Filename(),
" due to ",
detail::av_error_string(ret)));
av_state->stream_id_ = av_find_best_stream(
av_state->ctx_, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
av_state->codec_params_ = av_state->ctx_->streams[av_state->stream_id_]->codecpar;

av_state->codec_ctx_ = avcodec_alloc_context3(av_state->codec_);
DALI_ENFORCE(av_state->codec_ctx_, "Could not alloc av codec context");

ret = avcodec_parameters_to_context(av_state->codec_ctx_, av_state->codec_params_);
DALI_ENFORCE(
ret >= 0,
make_string("Could not fill the codec based on parameters: ", detail::av_error_string(ret)));

av_state->packet_ = av_packet_alloc();
DALI_ENFORCE(av_state->packet_, "Could not allocate av packet");
Comment on lines +275 to +284
Contributor

I think you can replace it with InitAvState().

Contributor Author

CreateAvState covers some parts of InitAvState and some parts of other functions. I would like to decouple the AV state manipulation from this code altogether, so I will refactor this as part of that work, if that is fine.

Contributor

👍

}
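
`CreateAvState` hands the demuxer a pair of read and seek callbacks over an in-memory buffer. The PR does not show `detail::read_memory_video_file` or `detail::seek_memory_video_file`, so the following is only a sketch of what such callbacks typically look like, using the `(opaque, buf, size)` and `(opaque, offset, whence)` shapes that `avio_alloc_context` expects. `MemoryFile`, `ReadMemoryFile`, `SeekMemoryFile`, and `kEndOfFile` are hypothetical; a real callback would return the library's own end-of-file code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>

// Hypothetical stand-in for the PR's memory_video_file_ object.
struct MemoryFile {
  const std::uint8_t *data;
  std::int64_t size;
  std::int64_t position;
};

// Placeholder for the library's end-of-file error code (assumption).
constexpr int kEndOfFile = -1;

// Read callback: copies up to buf_size bytes starting at the current position.
inline int ReadMemoryFile(void *opaque, std::uint8_t *buf, int buf_size) {
  auto *file = static_cast<MemoryFile *>(opaque);
  const std::int64_t remaining = file->size - file->position;
  if (remaining <= 0) return kEndOfFile;
  const int to_read = static_cast<int>(std::min<std::int64_t>(remaining, buf_size));
  std::memcpy(buf, file->data + file->position, static_cast<std::size_t>(to_read));
  file->position += to_read;
  return to_read;
}

// Seek callback: returns the new position, or -1 for an invalid request.
inline std::int64_t SeekMemoryFile(void *opaque, std::int64_t offset, int whence) {
  auto *file = static_cast<MemoryFile *>(opaque);
  std::int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = file->position + offset; break;
    case SEEK_END: target = file->size + offset; break;
    default: return -1;
  }
  if (target < 0 || target > file->size) return -1;
  file->position = target;
  return target;
}
```

Because the position lives in the shared `MemoryFile`, a second demuxer context reading the same object moves the same cursor, which is why `ParseNumFrames` saves and restores `position_` around its temporary state.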

void FramesDecoder::ParseNumFrames() {
int curr_num_frames = 0;
while (av_read_frame(av_state_->ctx_, av_state_->packet_) >= 0) {
// We want to make sure that we call av_packet_unref in every iteration
auto packet = AVPacketScope(av_state_->packet_, av_packet_unref);

if (packet->stream_index != av_state_->stream_id_) {
continue;
if (IsFormatSeekable()) {
while (av_read_frame(av_state_->ctx_, av_state_->packet_) >= 0) {
// We want to make sure that we call av_packet_unref in every iteration
auto packet = AVPacketScope(av_state_->packet_, av_packet_unref);

if (packet->stream_index != av_state_->stream_id_) {
continue;
}
curr_num_frames++;
}

num_frames_ = curr_num_frames;
Reset();
} else {
// Failover for unseekable video
auto current_position = memory_video_file_->position_;
memory_video_file_->Seek(0, SEEK_SET);
std::unique_ptr<AvState> tmp_av_state = std::make_unique<AvState>();
CreateAvState(tmp_av_state, false);

while (av_read_frame(tmp_av_state->ctx_, tmp_av_state->packet_) >= 0) {
// We want to make sure that we call av_packet_unref in every iteration
Contributor

@JanuszL JanuszL Dec 9, 2022

Do you think we can extract this to a function, as the same pattern is applied in L291?

Contributor Author

Done, called CountFrames.

auto packet = AVPacketScope(tmp_av_state->packet_, av_packet_unref);

if (packet->stream_index != tmp_av_state->stream_id_) {
continue;
}
curr_num_frames++;
}

num_frames_ = curr_num_frames;
memory_video_file_->Seek(current_position, SEEK_SET);

if (tmp_av_state->packet_->pts == AV_NOPTS_VALUE) {
// zero_latency_ = false;
}
curr_num_frames++;
}
}

num_frames_ = curr_num_frames;
Reset();
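
The review thread above asks for the duplicated counting loop to be extracted, and the author reports doing so as `CountFrames`. The final helper is not shown in this view, so the following is only a sketch of the idea under simplified assumptions: `Packet` and `PacketSource` are hypothetical stand-ins for the FFmpeg packet and demuxer, and a `std::unique_ptr` with a custom deleter plays the role of `AVPacketScope`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical packet type; only the field the loop inspects is modeled.
struct Packet {
  int stream_index = -1;
  bool holds_data = false;
};

// Hypothetical packet source standing in for the demuxer.
struct PacketSource {
  std::vector<int> stream_indices;
  std::size_t next = 0;
  int unref_calls = 0;

  // Fills `packet` and returns true while packets remain.
  bool Read(Packet &packet) {
    if (next >= stream_indices.size()) return false;
    packet.stream_index = stream_indices[next++];
    packet.holds_data = true;
    return true;
  }
};

// Counts the packets that belong to `stream_id`.
inline int CountFrames(PacketSource &source, int stream_id) {
  Packet packet;
  int num_frames = 0;
  // The deleter releases the packet's payload, not the packet object itself.
  auto unref = [&source](Packet *p) {
    p->holds_data = false;
    ++source.unref_calls;
  };
  while (source.Read(packet)) {
    // The guard runs `unref` at the end of every iteration, including `continue`.
    std::unique_ptr<Packet, decltype(unref)> guard(&packet, unref);
    if (packet.stream_index != stream_id) {
      continue;
    }
    ++num_frames;
  }
  return num_frames;
}
```

With a helper of this shape, both branches of `ParseNumFrames` would differ only in which state they pass in and how they restore the read position afterwards.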
bool FramesDecoder::IsFormatSeekable() {
if (
av_state_->ctx_->iformat->read_seek == nullptr &&
av_state_->ctx_->iformat->read_seek2 == nullptr) {
return false;
}

return av_state_->ctx_->pb->read_seek != nullptr;
}
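
`IsFormatSeekable` treats a file as seekable only when the demuxer exposes a seek callback and the I/O layer can seek as well; `ParseNumFrames` then picks its counting path from the result. A simplified model of that decision is sketched below. The structs, the enum, and `ChooseCountStrategy` are hypothetical and only mirror the null checks in the diff, not the real FFmpeg types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the demuxer and I/O layers.
struct InputFormat {
  int (*read_seek)(std::int64_t timestamp) = nullptr;
  int (*read_seek2)(std::int64_t min_ts, std::int64_t max_ts) = nullptr;
};

struct IoContext {
  std::int64_t (*seek)(std::int64_t offset, int whence) = nullptr;
};

enum class CountStrategy {
  kReadThenReset,           // read every packet, then rewind the same state
  kScanWithTemporaryState,  // count on a throwaway state, leave the main one untouched
};

inline bool IsSeekable(const InputFormat &format, const IoContext &io) {
  // The demuxer must offer at least one of its two seek entry points.
  if (format.read_seek == nullptr && format.read_seek2 == nullptr) {
    return false;
  }
  // The underlying I/O must be able to seek as well.
  return io.seek != nullptr;
}

inline CountStrategy ChooseCountStrategy(const InputFormat &format, const IoContext &io) {
  return IsSeekable(format, io) ? CountStrategy::kReadThenReset
                                : CountStrategy::kScanWithTemporaryState;
}
```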

void FramesDecoder::BuildIndex() {
15 changes: 13 additions & 2 deletions dali/operators/reader/loader/video/frames_decoder.h
@@ -58,8 +58,13 @@ struct AvState {
av_frame_free(&frame_);
}
avcodec_free_context(&codec_ctx_);
avformat_close_input(&ctx_);
avformat_free_context(ctx_);
if (ctx_ != nullptr) {
if (ctx_->pb != nullptr) {
avio_context_free(&ctx_->pb);
}
avformat_close_input(&ctx_);
Contributor

Based on https://ffmpeg.org/doxygen/trunk/demux_8c_source.html#l00369, 'avformat_close_input' seems to call avformat_free_context and avio_close which calls avio_context_free. Still please double check this in the version of FFmpeg we use (asan doesn't complain so I think we are not leaking anything).

Contributor Author

I did that based on the documentation, which says that this context should be freed after it is allocated. Since in our scenario the wrapping object's destructor takes care of it, I removed the call from our code.

Contributor

I think that sometimes the examples and docs of FFmpeg are not fully accurate.

avformat_free_context(ctx_);
}

ctx_ = nullptr;
codec_ = nullptr;
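
The cleanup discussion above turns on which call owns which allocation. One way to make that explicit, sketched here with hypothetical types rather than FFmpeg's, is a "close" function that takes a pointer-to-pointer, frees the object, and nulls the caller's pointer, wrapped in a `std::unique_ptr` deleter. A second release of the same pointer is then a harmless no-op instead of a double free.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource; free_count lets a caller observe how often it is freed.
struct Context {
  int *free_count;
};

// Frees the context and nulls the caller's pointer, in the style of a
// pointer-to-pointer "close" API. Calling it again on the same pointer does nothing.
inline void CloseContext(Context **ctx) {
  if (ctx == nullptr || *ctx == nullptr) return;
  ++*(*ctx)->free_count;
  delete *ctx;
  *ctx = nullptr;
}

// Deleter that routes unique_ptr destruction through the single close function.
struct ContextDeleter {
  void operator()(Context *ctx) const { CloseContext(&ctx); }
};

using ContextPtr = std::unique_ptr<Context, ContextDeleter>;

inline ContextPtr MakeContext(int *free_count) {
  return ContextPtr(new Context{free_count});
}
```

This does not settle what a particular FFmpeg version frees inside `avformat_close_input`; as the reviewer suggests, that still has to be checked against the version in use.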
@@ -207,6 +212,8 @@ class DLL_PUBLIC FramesDecoder {

bool is_full_range_ = false;

std::optional<bool> zero_latency_ = {};

private:
/**
* @brief Gets the packet from the decoder and reads a frame from it to provided buffer. Returns
@@ -249,6 +256,10 @@ class DLL_PUBLIC FramesDecoder {

void ParseNumFrames();

void CreateAvState(std::unique_ptr<AvState> &av_state, bool init_codecs);

bool IsFormatSeekable();

std::string Filename() {
return filename_.has_value() ? filename_.value() : "memory file";
}
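
The header uses `std::optional` in two ways: `filename_` is empty when decoding from memory, and `zero_latency_` is a tri-state flag (unset, true, false). A small sketch of both idioms follows, assuming C++17; `DecoderInfo` and its members are hypothetical and not part of the DALI class.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical holder mirroring the two optional members in the header.
struct DecoderInfo {
  std::optional<std::string> filename;  // empty when decoding from memory
  std::optional<bool> zero_latency;     // empty until the value has been determined

  // value_or gives the same fallback as the has_value()/value() ternary.
  std::string Filename() const { return filename.value_or("memory file"); }

  // Distinguishes "not determined yet" from an explicit false.
  bool ZeroLatencyKnown() const { return zero_latency.has_value(); }
};
```

One pitfall with `std::optional<bool>`: in a condition such as `if (zero_latency)`, the test is for presence, not for the stored value, so an explicit `false` still takes the branch. Dereferencing (`*zero_latency`) or calling `value()` reads the flag itself.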