lavc/rkmppenc: add RKMPP MJPEG/JPEG encoder #373

Merged 1 commit on Apr 21, 2024

nyanmisaka
Member

Up to 1080p@240fps or 320x180@1100fps transcoding (hitting the decoder performance limit) on RK3588.
Or 320x180@6000fps when encoding raw NV12 YUV from testsrc2/nullsrc.

Changes

  • Add RKMPP MJPEG/JPEG encoder

@gnattu
Member

gnattu commented Mar 28, 2024

This is interesting. Rockchip's embedded JPEG encoder is significantly faster than Intel's and Apple's in nullsrc testing (3x-4x faster).

According to the datasheet, it has a 4x90MPixel/s JPEG encoder. At this rate, it could easily outperform powerful CPUs at JPEG encoding. For example, the UHD770's MJPEG encoder only reaches about 40% of the speed of a 12900 processor: it achieves 2000fps at 320x180, while the CPU itself manages more than 5000fps but still stays below 6000.
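As a rough sanity check on the datasheet figure, the pixel rate can be converted into a frame-rate ceiling. This is only a back-of-the-envelope sketch: it assumes "4x90MPixel/s" means four units at 90 MPixel/s each running fully in parallel, and it ignores memory bandwidth, the decoder, and any filters. The function name `max_fps` is made up for illustration.

```python
# Assumption: "4x90MPixel/s" = 4 encoder units, 90 MPixel/s each, perfectly parallel.
ENCODER_UNITS = 4
MPIXELS_PER_SEC_PER_UNIT = 90


def max_fps(width: int, height: int) -> float:
    """Frame-rate ceiling if raw pixel throughput were the only limit."""
    pixels_per_sec = ENCODER_UNITS * MPIXELS_PER_SEC_PER_UNIT * 1_000_000
    return pixels_per_sec / (width * height)


if __name__ == "__main__":
    print(f"320x180: {max_fps(320, 180):.0f} fps")
```

Under these assumptions, 320x180 works out to 6250 fps, which is in the same range as the roughly 6000fps nullsrc result reported in this PR. Measured numbers at other resolutions need not match this simple model.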

But in reality, Jellyfin cannot reach this speed, as Jellyfin is not an MJPEG camera system and does not typically use MJPEG videos as input. Additionally, real-life performance in Jellyfin is easily affected by the rescale filter and image sink, which I expect to be the main performance bottleneck on RK3588. However, RK3588 can be very powerful for camera stream systems with MJPEG input and MJPEG output, outperforming much higher-end platforms.

@nyanmisaka
Member Author

nyanmisaka commented Mar 28, 2024

The async depth of the mjpeg_qsv encoder seems to be hardcoded in the runtime. Raising it may increase parallelism, but it doesn't work.

@gnattu
Member

gnattu commented Mar 28, 2024

> The async depth of the mjpeg_qsv encoder seems to be hardcoded in the runtime. Raising it may increase parallelism, but it doesn't work.

Take it easy, as we will not be getting close to the encoder limitation. The FPS filter dropping output frames effectively increases the decoder pressure by 30 times or higher, and most of our workload is going to be bottlenecked by the decoder in reality anyway. 1000fps is more than enough for this.
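The "30 times" figure can be illustrated with a small calculation. The sketch below assumes a hypothetical 30fps source from which the FPS filter keeps one frame per second; neither number comes from the PR, and `decoder_fps_required` is an illustrative name, not part of FFmpeg or Jellyfin.

```python
def decoder_fps_required(source_fps: float, output_fps: float,
                         encoder_fps: float) -> float:
    """Decode rate needed to keep the encoder fed at encoder_fps.

    Every frame the encoder receives costs source_fps / output_fps decoded
    frames, because the FPS filter drops the rest.
    """
    frames_decoded_per_kept_frame = source_fps / output_fps
    return encoder_fps * frames_decoded_per_kept_frame


if __name__ == "__main__":
    # Hypothetical case: 30fps video, one output image per second of video.
    needed = decoder_fps_required(source_fps=30, output_fps=1, encoder_fps=1000)
    print(f"decoder must sustain {needed:.0f} fps")
```

In this hypothetical case, saturating an encoder that handles 1000 frames per second would require the decoder to sustain 30000 frames per second, which is why the decoder rather than the JPEG encoder is expected to be the limit.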

@nyanmisaka
Member Author

> Take it easy, as we will not be getting close to the encoder limitation. The FPS filter dropping output frames effectively increases the decoder pressure by 30 times or higher, and most of our workload is going to be bottlenecked by the decoder in reality anyway. 1000fps is more than enough for this.

It's just a test.

Intel uses different copy engines for hwupload (RCS) and for the internal copy in qsvenc (BCS).
RCS - Render Command Streamer
BCS - Blitter Command Streamer

This creates a noticeable performance difference.

Signed-off-by: nyanmisaka <nst799610810@gmail.com>
@nyanmisaka nyanmisaka marked this pull request as ready for review April 21, 2024 10:49
@nyanmisaka nyanmisaka requested a review from a team April 21, 2024 10:53
@nyanmisaka nyanmisaka merged commit 030d9b8 into jellyfin:jellyfin Apr 21, 2024
28 checks passed