Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

media_codec: Fix dequeue_output_buffer treating some events as error #316

Closed
wants to merge 2 commits into from

Conversation

zmerp
Copy link
Contributor

@zmerp zmerp commented Jul 10, 2022

The MediaCodec wrapper had some issues with handling some non-error results for dequeue_output_buffer. This PR fixes it.

The PR has been tested for the decoder path in soon-to-be-in-production code. I did not experience any unhandleable error originating inside the ndk calls.

Here I used unreachable!() to avoid adding an unused variant in MediaCodecOutputResult and I would like some feedback for a better solution.

@MarijnS95
Copy link
Member

Here I used unreachable!() to avoid adding an unused variant in MediaCodecOutputResult and I would like some feedback for a better solution.

Can you call from_status() unconditionally and match on the Ok(()), Err(ffi::AMEDIACODEC_INFO_TRY_AGAIN_LATER) and Err(_)?

@zmerp
Copy link
Contributor Author

zmerp commented Jul 10, 2022

@MarijnS95 The problem is that result is not used in a regular way. AMediaCodec_dequeueOutputBuffer documentation says that 0 or positive values are the buffer index, negative values are error or other non-fatal events. I cannot wrap the result with from_status, first I need to check if the value is a buffer index. But then the Ok(()) value still remains redundant and should never be hit.

@zmerp zmerp force-pushed the master branch 2 times, most recently from 86e7fc7 to 33f0f75 Compare July 10, 2022 21:07
@zmerp
Copy link
Contributor Author

zmerp commented Jul 10, 2022

I've replaced .map(|()| unreachable!()) with the more idiomatic Err([...].unwrap_err())

@zmerp
Copy link
Contributor Author

zmerp commented Aug 27, 2022

The MediaCodec API in ndk v0.7 is flat out broken without this PR.

@zmerp
Copy link
Contributor Author

zmerp commented Sep 6, 2022

@MarijnS95 ping, I'm using my branch in production and it works, I think this PR can be merged.

@MarijnS95
Copy link
Member

@zarik5 This patch simply fails to explain why the caller can't match on the Err results from these functions, it seems there's nothing wrong with the current implementation hence this PR isn't "fixing" anything?

@zmerp
Copy link
Contributor Author

zmerp commented Sep 8, 2022

@MarijnS95 I see this is a matter of convention. I think AMEDIACODEC_INFO_* events should be not considered as error, in fact there is “INFO” in the name. Furthermore AMEDIACODEC_INFO_* event are not “universal” media_status_t errors, the meaning of these events represented by low magnitude negative integers seems specific to the AMediaCodec_dequeueOutputBuffer call.

@MarijnS95
Copy link
Member

Any documentation we can quote that states that these functions return both AMEDIACODEC_INFO_* or any of the AMEDIA_* error values?

Regardless, I'd rather see anything that doesn't return an OutputBuffer to be treated as an Err. Perhaps your MediaCodecOutputResult can instead wrap NdkMediaError? Maybe give input buffers the same treatment. Not sure what the best way forward is here, except that the current implementation looks messy and inconsistent.

@MarijnS95
Copy link
Member

Nevertheless, don't forget to document this problem and API change in the changelog.

@zmerp zmerp force-pushed the master branch 4 times, most recently from 2fc96ce to 1e4c992 Compare September 8, 2022 11:38
@zmerp
Copy link
Contributor Author

zmerp commented Sep 8, 2022

@MarijnS95 I looked in the source code and it's not completely clear what to do either https://android.googlesource.com/platform/frameworks/av/+/master/media/ndk/NdkMediaCodec.cpp#693 . I tried another opinionated solution.
In any case I fixed also a "real" bug, the AMEDIACODEC_ERROR_INSUFFICIENT_RESOURCE and AMEDIACODEC_ERROR_RECLAIMED error were recognized as a buffer index. I changed the check from "res >= 0" to "0 <= res < 1000"

Ok(None)
} else if result >= 0 {
Ok(Some(OutputBuffer {
// result usage: https://android.googlesource.com/platform/frameworks/av/+/master/media/ndk/NdkMediaCodec.cpp#693
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a permalink: if the file changes the line number wont match up anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -269,8 +284,7 @@ impl MediaCodec {
}
}

/// Returns [`None`] if timeout is reached.
pub fn dequeue_input_buffer(&self, timeout: Duration) -> Result<Option<InputBuffer>> {
pub fn dequeue_input_buffer(&self, timeout: Duration) -> Result<InputBuffer> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this then return MediaCodecResult if AMediaCodec_dequeueInputBuffer has the ability to return MediaCodecInfo::TryAgainLater?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, the source code is not clear in this regard, and every sample I've seen online does not handle TryAgainLater for the dequeueInputBuffer call.

@MarijnS95
Copy link
Member

In any case I fixed also a "real" bug, the AMEDIACODEC_ERROR_INSUFFICIENT_RESOURCE and AMEDIACODEC_ERROR_RECLAIMED error were recognized as a buffer index. I changed the check from "res >= 0" to "0 <= res < 1000"

That's absolutely horrible, is there any documentation stating the highest buffer index being used? Or is it just these two loose positive values?

If you can't find anything I'll ping some NDK contacts to see what's up here.

@zmerp
Copy link
Contributor Author

zmerp commented Sep 8, 2022

@MarijnS95 I didn't find any information about a "minimum error code integer" or "maximum buffer index". The "1000" cut point was completely arbitrary on my side.

@zmerp zmerp force-pushed the master branch 2 times, most recently from 933808c to f1d38a0 Compare September 11, 2022 21:08
@zmerp
Copy link
Contributor Author

zmerp commented Sep 11, 2022

AMediaCodec_dequeueInputBuffer can indeed return AMEDIACODEC_INFO_TRY_AGAIN_LATER. I fixed it.

@@ -269,8 +285,8 @@ impl MediaCodec {
}
}

/// Returns [`None`] if timeout is reached.
pub fn dequeue_input_buffer(&self, timeout: Duration) -> Result<Option<InputBuffer>> {
/// Can return only `MediaCodecInfo::TryAgainLater` with `MediaCodecResult::Info(...)`.
Copy link
Member

@MarijnS95 MarijnS95 Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these intradoc-links please :) (and grammar):

Suggested change
/// Can return only `MediaCodecInfo::TryAgainLater` with `MediaCodecResult::Info(...)`.
/// Can only return [`MediaCodecResult::Info(...)`] with [`MediaCodecInfo::TryAgainLater`].

@MarijnS95
Copy link
Member

@MarijnS95 I didn't find any information about a "minimum error code integer" or "maximum buffer index". The "1000" cut point was completely arbitrary on my side.

Anything we can do about this? Otherwise I'd recommend purely parsing the two positive error codes and considering any other positive number as a valid buffer index.

https://github.com/rust-windowing/android-ndk-rs/blob/25e647abe800d08453b74c3a1fa90528ff97844c/ndk-sys/src/ffi_aarch64.rs#L14563-L14568

@MarijnS95
Copy link
Member

Otherwise can you inquire about this over at https://github.com/android/ndk?

@zmerp
Copy link
Contributor Author

zmerp commented Sep 14, 2022

Anything we can do about this? Otherwise I'd recommend purely parsing the two positive error codes and considering any other positive number as a valid buffer index.

Not a good idea to me, it might break in a new API level.

Otherwise can you inquire about this over at https://github.com/android/ndk?

Sure.

@MarijnS95
Copy link
Member

Not a good idea to me, it might break in a new API level.

It is already broken by design unless there's a maximum buffer index.

@zmerp
Copy link
Contributor Author

zmerp commented Sep 14, 2022

Otherwise can you inquire about this over at https://github.com/android/ndk?

Here you go: android/ndk#1768

@MarijnS95 MarijnS95 added type: documentation Awareness, docs, examples, etc. priority: low Nice to have status: waiting Waiting for a response or another PR labels Oct 25, 2022
@MarijnS95
Copy link
Member

Looks like you bumped the issue today @zarik5, unfortunate to see that upstream hasn't provided any clarification thus far :(

@zmerp
Copy link
Contributor Author

zmerp commented Dec 21, 2022

Given no feedback upstream, should we merge this PR with my assumption of 1000 as an absolute value cutting point? I think It's reasonable, because how C APIs normally evolve. Usually error codes get added with increasing magnitude, maybe sometimes skipping tens or hundreds to reserve slots to more neatly group errors by scope, but almost never going under the lowest initial error value (in absolute terms).
As for testing, I've been using this PR in production for a while now.

@MarijnS95
Copy link
Member

In its current form this PR proposes a breaking change, merging it would mean forcing the release of ndk 0.8.0 and the entire ecosystem propagation mess with it whenever we want to release an ndk update for this or other means.

Rather we accept that this API was fundamentally broken and Deal With It™ until the next sensible opportunity to correct it. Users can currently work around this by manually matching these "error-like" values out of index anyway.

One thing you could do is return these codes in Err(NdkMediaError::UnknownResult(...)) which would not break the API, though we cannot extend MediaErrorResult as it has never been marked #[non_exhaustive] making it equally inconvenient for users to figure out what they have to do. However, I'd like to sanity-check that with a second opinion from someone else who uses the media codec API to prevent this whole "oh we discovered the API doesn't represent important thing X" situation within a month after release again.

@zmerp
Copy link
Contributor Author

zmerp commented Dec 21, 2022

Correct me if I'm wrong, but if a certain API is completely broken, it's ok to fix it with a patch release (-.-.X) even if the API surface changes in a breaking way (it's a bug fix). I imagine noone in the wild is doing what you propose right now, instead if they are interested in the Mediacodec API they are probably using my PR.

@MarijnS95
Copy link
Member

MarijnS95 commented Dec 21, 2022

If it goes paired with a yank of the broken release, within a day of releasing, maybe? At this point you can't assume no-one is using MediaCodec, and you can't stop their project from compiling by publishing a semver-breaking release without bumping semver.

@zmerp
Copy link
Contributor Author

zmerp commented Dec 21, 2022

I'm not gonna debate further. I'll will continue using my fork for production, I hope this PR will land sometimes. Maybe add it to a 0.8 tag as a reminder?

@MarijnS95
Copy link
Member

Nothing to debate, circumventing semver is simply not done.

Proposing another alternative: new unbroken functions with a different name, and #[deprecate] the broken ones (without duplicating all code)?

@zmerp
Copy link
Contributor Author

zmerp commented Dec 21, 2022

Proposing another alternative: new unbroken functions with a different name, and #[deprecate] the broken ones (without duplicating all code)?

No reason messing with it IMO. Let's merge this PR once you are ready to release v0.8.

…isinterpreting some errors as buffer indices
@zmerp
Copy link
Contributor Author

zmerp commented Mar 8, 2023

If a new release of ndk is imminent, please consider merging this PR.

@MarijnS95
Copy link
Member

@zarik5 as said before, only if it were a semver-breaking release.

@zmerp
Copy link
Contributor Author

zmerp commented Mar 8, 2023

I saw that the MSRV got bumped, this is a breaking change.

@MarijnS95
Copy link
Member

Hardly anyone seems to consider this breaking (see how we got broken by raw-window-handle 0.5.1 which is what triggers the MSRV bump on our end), so I won't make it a breaking release either.

@zmerp
Copy link
Contributor Author

zmerp commented Mar 8, 2023

That's fair.

@MarijnS95 MarijnS95 added the impact: breaking API/ABI-breaking change label Jun 15, 2023
@MarijnS95
Copy link
Member

MarijnS95 commented Jun 15, 2023

@zarik5 Now that a new breaking release is impending I've given this PR another look, and there are a few observations:

  • The two extra status codes that you're now handling for dequeue_output_buffer() were previously caught as "unknown" errors, while it is actually important for the user to handle (just like TRY_AGAIN_LATER): that is the issue you're trying to fix;
  • These error codes are not properly negative;
  • As such there's no need to set up an arbitrary positive-integer bound, given that we don't know anything about allowed values. If new error codes arise, we'll find out soon enough when AMediaCodec_get{Input,Output}Buffer returns NULL on those, though we should perhaps improve the assertion message around it;
  • In the end it contributes nothing to the issue you're trying to fix, so I'd leave it out;
  • dequeue_input_buffer() does not necessarily need to change, though we could self-document the None case by wrapping it in a custom error code;
  • Keeping the rest of the error codes in the original Result allows more natural ? bubbling of errors, requiring only special cases to be handled.

I've stuffed all that thought in a new approach, which is quite a bit simpler than this PR. If you agree with this, I'll PR and merge the following (after rebasing on #399):

https://github.com/rust-mobile/ndk/compare/ndk-media-codec-rewrap-dequeue-results

Then, some followup items:

  • InputBuffer::buffer_mut() returns a slice of possibly uninitialized data, which is UB. This should return a [MaybeUninit<u8>]. Unfortunately many useful slice APIs around MaybeUninit like write_slice() are still unstable, but that shouldn't hold us back from removing UB;
  • dequeue_output_buffer() could use MaybeUninit::uninit();
  • InputBuffer nor OutputBuffer has a drop destructor. Especially for OutputBuffer - which already has access to MediaCodec - should we replace release_output_buffer() with a drop() implementation? release_output_buffer_at_time() could remain, and InputBuffer could have a warning message when the buffer is dropped without being enqueued.

Thoughts?

@zmerp
Copy link
Contributor Author

zmerp commented Jun 16, 2023

These error codes are not negative;

They are: https://developer.android.com/ndk/reference/group/media#summary

  • AMEDIACODEC_INFO_TRY_AGAIN_LATER = -1
  • AMEDIACODEC_INFO_OUTPUT_FORMAT_CHANGED = -2
  • AMEDIACODEC_INFO_OUTPUT_BUFFERS_CHANGED = -3

dequeue_input_buffer() does not necessarily need to change, though we could self-document the None case by wrapping it in a custom error code;

Yeah, my gripe is that None was not descriptive enough. But making the user handle TRY_AGAIN_LATER in Result::Err, as it's not an error at all (even though its code is negative), it's even worse in my opinion. We could make another Result type with Ok(), TryAgainLater, Err() variants.

I have nothing contrary to your other changes you described, especially the removal of release_output_buffer().

@MarijnS95
Copy link
Member

These error codes are not negative;

They are: https://developer.android.com/ndk/reference/group/media#summary

@zarik5 I'm sorry, I meant to say "These error codes are properly negative", otherwise the bullet-point before and after don't make sense 🤭


Yeah, my gripe is that None was not descriptive enough. But making the user handle TRY_AGAIN_LATER in Result::Err, as it's not an error at all (even though its code is negative), it's even worse in my opinion. We could make another Result type with Ok(), TryAgainLater, Err() variants.

Have you looked at the branch that I linked? I've handled that properly there, by defining two different enums for input and output dequeueing. Let me know what you think of this approach.

I have nothing contrary to your other changes you described, especially the removal of release_output_buffer().

Will see what I can do.

@zmerp
Copy link
Contributor Author

zmerp commented Jun 16, 2023

Sorry, I was on mobile. The branch looks fine, except that I would change slightly the names of the result structs. instead of DequeueOutputBuffer I would do DequeuedOutputBuffer (notice the extra "d"), or better, DequeueOutputBufferResult. The original name conveys an action, but the struct is an object.

@MarijnS95
Copy link
Member

I agree to append -d, but not sure about Result because it's already wrapped in a Result<>, that might be odd? Maybe Dequeued{In,Out}putBufferStatus, though that only implies a status code and not a possible result value.

@zmerp
Copy link
Contributor Author

zmerp commented Jun 16, 2023

Another option is the suffix -Value. But having "Buffer" in the name is misleading as the value might also mot be a buffer. What about DequeueOuputValue and DequeueInputValue? My favorite solution is still to squash down the result to a single enum with variants Ok(), TryAgainLater, ..., Err() but i understand that this may not be the "rusty" way.

@MarijnS95
Copy link
Member

@zarik5 sure, I think the Buffer part was mainly based on the function name, but then the variant containing a buffer is aptly named Buffer already. I might stick to Result instead of Value though. Don't forget about the -d suffix either :)

As explained in the linked commit message flattening the Result<> out of the equation doesn't allow one to do ? anymore, and always have to write an extra match arm to deal with the Err() type. Makes it more annoying to deal with anyhow::Context, too.

@MarijnS95
Copy link
Member

In hindsight keeping Buffer in there is more natural, even if more verbose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact: breaking API/ABI-breaking change priority: low Nice to have status: waiting Waiting for a response or another PR type: documentation Awareness, docs, examples, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants