Media SDK MFE Architecture
From the standpoint of execution and MSDK<->driver communication, the single-frame architecture and code flow are almost the same for legacy encode and FEI:
- Media SDK receives a frame for processing (pre-ENC/ENC/ENCODE) from the application in the synchronous part (application thread), verifies parameters, and prepares a scheduler task for execution via:
  - MFXVideoENCODE_EncodeFrameAsync - legacy encode, FEI encode.
- The scheduler checks task dependencies (e.g. reordered frame tasks); if the dependencies are resolved for this task and thread 0 (the dedicated thread) is free, it assigns the task for execution and starts the asynchronous part.
- The VA layer encoder prepares and submits buffers for the driver:
  - Pixel buffers: raw and reconstructed surfaces, downscaled surfaces for pre-ENC.
  - Compressed buffers: compressed headers, bitstream buffer.
  - VA control buffers with input parameters: sequence-level, picture-level, and slice-level control parameters.
  - FEI non-pixel output buffers: motion vectors, MB statistics, PAK control objects.
  - Additional non-pixel video memory control buffers: MBQP map, Intra/Skip map.
- The encoder submits buffers to the driver UMD using the standard VAAPI calling sequence:
  - vaBeginPicture
  - vaRenderPicture
  - vaEndPicture
- The driver UMD prepares and submits a batch buffer for execution on HW at the vaEndPicture call.
- The encoder proceeds to the synchronization step (see the synchronization chapter for details), waits for the task to complete, and returns the result (bitstream or FEI output).
- The scheduler returns control to the application in MFXVideoCORE_SyncOperation, and the output data can be used by the application.
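The per-frame submission portion of the flow above can be sketched with stand-ins for the libva calls. The functions below only record call order; the real entry points live in libva and operate on a VADisplay, VAContextID, and VASurfaceID, and the buffer IDs here are placeholders:

```python
# Minimal sketch of the standard VAAPI submission sequence.
# These stand-ins only record call order; real code uses the libva C API.
calls = []

def vaBeginPicture(ctx, input_surface):
    calls.append("vaBeginPicture")   # bind the input surface to the encode context

def vaRenderPicture(ctx, buffer_ids):
    calls.append("vaRenderPicture")  # hand control/data buffers to the driver UMD

def vaEndPicture(ctx):
    calls.append("vaEndPicture")     # UMD builds and submits the batch buffer here

# Buffers the VA-layer encoder prepared for one frame (IDs are placeholders):
buffers = {
    "sequence_params": 1,
    "picture_params": 2,
    "slice_params": 3,
    "coded_bitstream": 4,
}

ctx, raw_surface = 100, 200
vaBeginPicture(ctx, raw_surface)
vaRenderPicture(ctx, list(buffers.values()))
vaEndPicture(ctx)

assert calls == ["vaBeginPicture", "vaRenderPicture", "vaEndPicture"]
```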
Media SDK implements two synchronization approaches: blocking synchronization, and event- or polling-based synchronization.
Blocking synchronization is implemented in the Linux stack and is based on synchronization by resource (the encoder input surface). The dedicated encoder thread is blocked during execution until the task is ready, then returns control over the thread back to the scheduler.
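The blocking scheme can be sketched as follows. The task object and names here are illustrative, not Media SDK internals; in the real Linux stack the wait is on the encoder input surface, and completion is signaled by the driver:

```python
import threading

# Illustrative sketch of blocking synchronization: the dedicated thread
# blocks until the task is ready, then hands control back.
class EncodeTask:
    def __init__(self):
        self.ready = threading.Event()
        self.result = None

    def complete(self, bitstream):
        self.result = bitstream
        self.ready.set()          # driver/HW signals readiness

def dedicated_thread(task, log):
    task.ready.wait()             # blocked here: cannot pick up another task
    log.append("task done: " + task.result)

task, log = EncodeTask(), []
t = threading.Thread(target=dedicated_thread, args=(task, log))
t.start()
assert log == []                  # still blocked, nothing returned yet
task.complete("frame0-bitstream") # HW completion arrives
t.join()
assert log == ["task done: frame0-bitstream"]
```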
A Media SDK session represents an object that shares access to high-level functional objects:
- Decoder - provides decoding capabilities.
- Encoder - provides encoding capabilities.
- VPP - provides video processing capabilities (DI, DN, scaling, FRC, CSC, etc.).
- User plugin - provides the user with the capability to implement their own processing function (CPU, OpenCL, etc.), integrated with Media SDK APIs.
- Core - provides service functions (copy, memory mapping, memory sharing, object sharing) for components, and interfaces between external and internal components for memory allocation and sharing (allocators and devices).
- Scheduler - thread and task allocation, execution, task queue management, execution order, synchronization, etc.
Joining sessions creates a link between sessions and removes the scheduler object from the child session; core objects are then able to share resources between different instances of components and allocators, and threads and tasks are managed within one scheduler object.
In the joined-sessions scenario there is only one scheduler for all encoders. The dedicated thread, together with the blocking synchronization architecture, leads to the following:
- The dedicated thread is blocked waiting for task readiness, so it cannot submit a new task before the previous one is finished.
- All encoder submissions are serialized, so there is no concurrency, and consequently no ENC/PAK concurrency between different encoders.
- The multi-frame scenario does not work, because there is no way to submit several frames from different encoders working in one thread.
Removing the dedicated-thread dependency solves this issue: even in the joined-session scenario, all encoders work using their own threads (free threads from the common pool). That is true for the legacy Encode, FEI Encode, ENC, and pre-ENC paths. Currently, most Media SDK components do not work in a dedicated thread.
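The effect can be illustrated with a small simulation (illustrative only, not Media SDK code): with a single dedicated thread the submissions never overlap, while per-encoder threads from a common pool allow concurrent in-flight work:

```python
import threading, time

# Illustrative simulation: one dedicated thread vs per-encoder threads.
lock = threading.Lock()
in_flight = 0
max_in_flight = 0

def submit(encoder_id):
    global in_flight, max_in_flight
    with lock:
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
    time.sleep(0.05)              # stand-in for ENC/PAK execution on HW
    with lock:
        in_flight -= 1

# 1) Dedicated-thread model: submissions are serialized.
for enc in range(4):
    submit(enc)
serialized_peak = max_in_flight

# 2) Per-encoder threads (common pool): submissions may overlap.
max_in_flight = 0
threads = [threading.Thread(target=submit, args=(enc,)) for enc in range(4)]
for t in threads: t.start()
for t in threads: t.join()

assert serialized_peak == 1       # no ENC/PAK concurrency in the serialized case
assert max_in_flight >= 2         # overlapping submissions are now possible
```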
VAAPI: synchronization has changed from vaSyncSurface to vaMapBuffer in the encoder, changing the synchronization target from the input surface to the bitstream buffer. This helps to resolve the following:
- In a scenario where a single input is shared by multiple encoders, vaSyncSurface synchronizes with the latest one, so the encoding result for all encoders depends on the input and can introduce latency problems.
- MFE benefits because the single kernel workload consumes all inputs and is bound by the biggest one, while PAK is independent and a smaller one can deliver its result faster - which is not reachable when synchronization is done on the input surface.
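The latency difference can be made concrete with hypothetical completion times for two encoders that share one input. Syncing on the shared input surface (the vaSyncSurface approach) waits for the slowest consumer, while syncing on each encoder's own bitstream buffer (the vaMapBuffer approach) lets the faster PAK return early:

```python
# Hypothetical PAK completion times (ms) for two encoders sharing one input.
pak_done = {"encoder_A": 5, "encoder_B": 12}

# vaSyncSurface on the shared input: ready only when the LAST consumer finishes.
input_sync_latency = {enc: max(pak_done.values()) for enc in pak_done}

# vaMapBuffer on each encoder's own bitstream: ready at that encoder's PAK time.
output_sync_latency = dict(pak_done)

assert input_sync_latency["encoder_A"] == 12   # dragged to the slowest encoder
assert output_sync_latency["encoder_A"] == 5   # faster PAK returns early
assert output_sync_latency["encoder_B"] == 12
```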
The inter-session layer (ISL) is introduced as an object shared between different encoders and accessed from the "VA layer" encoder part. Its main feature, in combination with MFE VAAPI, is that single-frame execution in the VA layer is not changed. ISL provides capabilities for:
- Frame collection, e.g. buffering frames from different encoders to submit an MFE workload.
- Frame management, e.g. deciding which frames to combine together for better performance.
- Thread/time management, required to fit a particular frame's readiness within the latency requirement.
In MFE, the workload preparation step combines the preparation and submission steps described in part 3 of the single-frame encoder pipeline, including the VAAPI function calls, but the real submission is moved to the ISL, which exercises the MFE VAAPI. Preparation is done separately for each encoder involved in the MFE pipeline: each encoder uses its own Encode VAContext, prepares buffers, and calls the vaBeginPicture, vaRenderPicture, and vaEndPicture sequence, after which it proceeds into the ISL layer.
Submission responsibility is moved from the VA layer to the ISL in the MFE pipeline and is performed by the vaMFSubmit call.
Frame collection is performed in the ISL layer. This is the most functionally complex part of the MFE pipeline, performing frame and thread management. Below are diagrams demonstrating several scenarios of ISL behavior in different situations.
Consider a case with 4 parallel streams running in MFE auto mode with a maximum of 4 frames allowed per submission; the normal flow can be described as follows:
- After preparing frames in the VA layer, each thread/session/stream calls into the ISL layer and takes control over it by locking a control mutex. The ISL layer checks whether there are enough frames to submit; if not, the current thread releases control over the ISL to other threads by unlocking the control mutex and waits either for a timeout to happen or for the submission condition to be satisfied.
- Thread 4 submits the frames to the UMD and signals to the other threads waiting on the condition that the submission condition is resolved.
- All threads go to the synchronization stage, performed separately on each encode VAContext.
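The collect-and-submit logic above can be sketched with a mutex and condition variable. The class and names are illustrative only; in the real pipeline the combined submission is the vaMFSubmit call:

```python
import threading

class InterSessionLayer:
    """Illustrative ISL sketch (not Media SDK internals)."""
    def __init__(self, max_frames=4):
        self.cond = threading.Condition()   # the control mutex + condition
        self.max_frames = max_frames
        self.pending = []
        self.batches = []                   # each batch = one vaMFSubmit-style call

    def encode(self, frame):
        with self.cond:                     # take control over the ISL
            self.pending.append(frame)
            if len(self.pending) >= self.max_frames:
                self.batches.append(list(self.pending))  # submit combined workload
                self.pending.clear()
                self.cond.notify_all()      # submission condition resolved
            else:
                self.cond.wait(timeout=1.0) # wait for submission or timeout
        # ...each thread then syncs separately on its own encode VAContext

isl = InterSessionLayer(max_frames=4)
threads = [threading.Thread(target=isl.encode, args=(f"frame{i}",)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

assert len(isl.batches) == 1                # one combined submission of 4 frames
assert sorted(isl.batches[0]) == ["frame0", "frame1", "frame2", "frame3"]
```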
This example shows how particular encoding latency constraints are achieved in the MFE architecture. Consider again a case with 4 parallel streams running in MFE auto mode with a maximum of 4 frames allowed per submission; the flow can be described as follows:
- Sessions/threads/streams 1, 2, and 3 have submitted frames and are waiting for either submission or a timeout to happen.
- Session/thread/stream 1 reaches the specified timeout (for example, a 30 fps stream has a requirement to achieve 33 ms sync-to-sync latency, so the timeout is set in the range of 1-33 ms, depending on frame submission time), takes control over the ISL, and submits 3 frames.
- The fourth session's frame has arrived late and goes to the next submission.
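The timeout path can be sketched similarly (illustrative names and timings only): a waiting thread whose deadline expires takes back control of the ISL and submits whatever frames have been collected, while a late frame simply lands in the next batch:

```python
import threading

class InterSessionLayer:
    """Illustrative sketch of the ISL timeout path (not Media SDK internals)."""
    def __init__(self, max_frames=4, timeout=0.05):
        self.cond = threading.Condition()
        self.max_frames = max_frames
        self.timeout = timeout              # e.g. derived from the 33 ms budget
        self.pending = []
        self.batches = []

    def encode(self, frame):
        with self.cond:
            self.pending.append(frame)
            if len(self.pending) < self.max_frames:
                self.cond.wait(timeout=self.timeout)
            if self.pending:                # batch full, or our timeout expired:
                self.batches.append(list(self.pending))  # submit what we have
                self.pending.clear()
                self.cond.notify_all()

isl = InterSessionLayer(max_frames=4, timeout=0.05)
early = [threading.Thread(target=isl.encode, args=(f"frame{i}",)) for i in range(3)]
for t in early: t.start()
for t in early: t.join()    # the 4th frame never arrives; timeout flushes 3 frames

isl.encode("frame3")        # the late frame goes into the next submission

assert [len(b) for b in isl.batches] == [3, 1]
```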