Media SDK MFE Architecture
From the standpoint of execution and MSDK<->driver communication, the single-frame architecture and code flow are almost the same for legacy encode and FEI:
- Media SDK receives a frame for processing (pre-ENC/ENC/ENCODE) from the application in the synchronous part (application thread), verifies parameters, and prepares a scheduler task for execution via:
  - MFXVideoENCODE_EncodeFrameAsync - legacy encode, FEI encode.
- The scheduler checks task dependencies (e.g. reordered frame tasks); if the dependencies are resolved for this task and thread 0 (the dedicated thread) is free, it assigns the task for execution and starts the asynchronous part.
- The VA layer encoder prepares and submits buffers for the driver:
  - Pixel buffers: raw and reconstructed surfaces, downscaled surfaces for pre-ENC.
  - Compressed buffers: compressed headers, bitstream buffer.
  - VA control buffers with input parameters: sequence-level, picture-level, and slice-level control parameters.
  - FEI non-pixel output buffers: motion vectors, MB statistics, PAK control objects.
  - Additional non-pixel video memory control buffers: MBQP map, Intra/Skip map.
- The encoder submits buffers to the driver UMD using the standard VAAPI calling sequence:
  - vaBeginPicture
  - vaRenderPicture
  - vaEndPicture
- The driver UMD prepares and submits a batch buffer for execution on HW at the vaEndPicture call.
- The encoder proceeds to the synchronization step (see the synchronization chapter for details), waits for the task to complete, and returns the result (bitstream or FEI output).
- The scheduler returns control to the application in MFXVideoCORE_SyncOperation, and the output data can be used by the application.
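The per-frame submission portion of the flow above can be sketched with stand-ins for the libva calls. The functions below only record call order; the real entry points live in libva and operate on a VADisplay, VAContextID, and VASurfaceID, and the buffer IDs here are placeholders:

```python
# Minimal sketch of the standard VAAPI submission sequence.
# These stand-ins only record call order; real code uses the libva C API.
calls = []

def vaBeginPicture(ctx, input_surface):
    calls.append("vaBeginPicture")   # bind the input surface to the encode context

def vaRenderPicture(ctx, buffer_ids):
    calls.append("vaRenderPicture")  # hand control/data buffers to the driver UMD

def vaEndPicture(ctx):
    calls.append("vaEndPicture")     # UMD builds and submits the batch buffer here

# Buffers the VA-layer encoder prepared for one frame (IDs are placeholders):
buffers = {
    "sequence_params": 1,
    "picture_params": 2,
    "slice_params": 3,
    "coded_bitstream": 4,
}

ctx, raw_surface = 100, 200
vaBeginPicture(ctx, raw_surface)
vaRenderPicture(ctx, list(buffers.values()))
vaEndPicture(ctx)

assert calls == ["vaBeginPicture", "vaRenderPicture", "vaEndPicture"]
```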
Media SDK implements two synchronization approaches: blocking synchronization, and event- or polling-based synchronization.
Blocking synchronization is implemented in the Linux stack and is based on synchronization by resource (the encoder input surface). The dedicated encoder thread is blocked during execution until the task is ready, then returns control over the thread back to the scheduler.
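The blocking scheme can be sketched as follows. The task object and names here are illustrative, not Media SDK internals; in the real Linux stack the wait is on the encoder input surface, and completion is signaled by the driver:

```python
import threading

# Illustrative sketch of blocking synchronization: the dedicated thread
# blocks until the task is ready, then hands control back.
class EncodeTask:
    def __init__(self):
        self.ready = threading.Event()
        self.result = None

    def complete(self, bitstream):
        self.result = bitstream
        self.ready.set()          # driver/HW signals readiness

def dedicated_thread(task, log):
    task.ready.wait()             # blocked here: cannot pick up another task
    log.append("task done: " + task.result)

task, log = EncodeTask(), []
t = threading.Thread(target=dedicated_thread, args=(task, log))
t.start()
assert log == []                  # still blocked, nothing returned yet
task.complete("frame0-bitstream") # HW completion arrives
t.join()
assert log == ["task done: frame0-bitstream"]
```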
A Media SDK session represents an object that shares access to high-level functional objects:
- Decoder - provides decoding capabilities.
- Encoder - provides encoding capabilities.
- VPP - provides video processing capabilities (DI, DN, scaling, FRC, CSC, etc.).
- User plugin - provides the user with the capability to implement their own processing function (CPU, OpenCL, etc.), integrated with Media SDK APIs.
- Core - provides service functions (copy, memory mapping, memory sharing, object sharing) for components, and interfaces between external and internal components for memory allocation and sharing (allocators and devices).
- Scheduler - thread and task allocation, execution, task queue management, execution order, synchronization, etc.
Joining sessions creates a link between sessions and removes the scheduler object from the child session; core objects are then able to share resources between different instances of components and allocators, and threads and tasks are managed within one scheduler object.
In the joined-sessions scenario there is only one scheduler for all encoders. The dedicated thread, together with the blocking synchronization architecture, leads to the following:
- The dedicated thread is blocked waiting for task readiness, so it cannot submit a new task before the previous one is finished.
- All encoder submissions are serialized, so there is no concurrency, and consequently no ENC/PAK concurrency between different encoders.
- The multi-frame scenario does not work, because there is no way to submit several frames from different encoders working in one thread.
Removing the dedicated-thread dependency solves this issue: even in the joined-session scenario, all encoders work using their own threads (free threads from the common pool). That is true for the legacy Encode, FEI Encode, ENC, and pre-ENC paths. Currently, most Media SDK components do not work in a dedicated thread.
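The effect can be illustrated with a small simulation (illustrative only, not Media SDK code): with a single dedicated thread the submissions never overlap, while per-encoder threads from a common pool allow concurrent in-flight work:

```python
import threading, time

# Illustrative simulation: one dedicated thread vs per-encoder threads.
lock = threading.Lock()
in_flight = 0
max_in_flight = 0

def submit(encoder_id):
    global in_flight, max_in_flight
    with lock:
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
    time.sleep(0.05)              # stand-in for ENC/PAK execution on HW
    with lock:
        in_flight -= 1

# 1) Dedicated-thread model: submissions are serialized.
for enc in range(4):
    submit(enc)
serialized_peak = max_in_flight

# 2) Per-encoder threads (common pool): submissions may overlap.
max_in_flight = 0
threads = [threading.Thread(target=submit, args=(enc,)) for enc in range(4)]
for t in threads: t.start()
for t in threads: t.join()

assert serialized_peak == 1       # no ENC/PAK concurrency in the serialized case
assert max_in_flight >= 2         # overlapping submissions are now possible
```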
VAAPI: synchronization has changed from vaSyncSurface to vaMapBuffer in the encoder, changing the synchronization target from the input surface to the bitstream buffer. This helps to resolve the following:
- In a scenario where a single input is shared by multiple encoders, vaSyncSurface synchronizes with the latest one, so the encoding result for all encoders depends on the input and can introduce latency problems.
- MFE benefits because the single kernel workload consumes all inputs and is bound by the biggest one, while PAK is independent and a smaller one can deliver its result faster - which is not reachable when synchronization is done on the input surface.
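The latency difference can be made concrete with hypothetical completion times for two encoders that share one input. Syncing on the shared input surface (the vaSyncSurface approach) waits for the slowest consumer, while syncing on each encoder's own bitstream buffer (the vaMapBuffer approach) lets the faster PAK return early:

```python
# Hypothetical PAK completion times (ms) for two encoders sharing one input.
pak_done = {"encoder_A": 5, "encoder_B": 12}

# vaSyncSurface on the shared input: ready only when the LAST consumer finishes.
input_sync_latency = {enc: max(pak_done.values()) for enc in pak_done}

# vaMapBuffer on each encoder's own bitstream: ready at that encoder's PAK time.
output_sync_latency = dict(pak_done)

assert input_sync_latency["encoder_A"] == 12   # dragged to the slowest encoder
assert output_sync_latency["encoder_A"] == 5   # faster PAK returns early
assert output_sync_latency["encoder_B"] == 12
```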
The inter-session layer (ISL) is introduced as an object shared between different encoders and accessed from the "VA layer" encoder part. Its main feature, in combination with MFE VAAPI, is that single-frame execution in the VA layer is not changed. ISL provides capabilities for:
- Frame collection, e.g. buffering frames from different encoders to submit an MFE workload.
- Frame management, e.g. deciding which frames to combine together for better performance.
- Thread/time management, required to fit a particular frame's readiness within the latency requirement.
In MFE, the workload preparation step combines the preparation and submission steps described in part 3 of the single-frame encoder pipeline, including the VAAPI function calls, but the real submission is moved to the ISL, which exercises the MFE VAAPI. Preparation is done separately for each encoder involved in the MFE pipeline: each encoder uses its own Encode VAContext, prepares buffers, and calls the vaBeginPicture, vaRenderPicture, and vaEndPicture sequence, after which it proceeds into the ISL layer.
Submission responsibility is moved from the VA layer to the ISL in the MFE pipeline and is performed by the vaMFSubmit call.
Frame collection is performed in the ISL layer. This is the most functionally complex part of the MFE pipeline, performing frame and thread management. Below are diagrams demonstrating several scenarios of ISL behavior in different situations.
Consider a case with 4 parallel streams running in MFE auto mode with a maximum of 4 frames allowed per submission; the normal flow can be described as follows:
- After preparing frames in the VA layer, each thread/session/stream calls into the ISL layer and takes control over it by locking a control mutex. The ISL layer checks whether there are enough frames to submit; if not, the current thread releases control over the ISL to other threads by unlocking the control mutex and waits either for a timeout to happen or for the submission condition to be satisfied.
- Thread 4 submits the frames to the UMD and signals to the other threads waiting on the condition that the submission condition is resolved.
- All threads go to the synchronization stage, performed separately on each encode VAContext.
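The collect-and-submit logic above can be sketched with a mutex and condition variable. The class and names are illustrative only; in the real pipeline the combined submission is the vaMFSubmit call:

```python
import threading

class InterSessionLayer:
    """Illustrative ISL sketch (not Media SDK internals)."""
    def __init__(self, max_frames=4):
        self.cond = threading.Condition()   # the control mutex + condition
        self.max_frames = max_frames
        self.pending = []
        self.batches = []                   # each batch = one vaMFSubmit-style call

    def encode(self, frame):
        with self.cond:                     # take control over the ISL
            self.pending.append(frame)
            if len(self.pending) >= self.max_frames:
                self.batches.append(list(self.pending))  # submit combined workload
                self.pending.clear()
                self.cond.notify_all()      # submission condition resolved
            else:
                self.cond.wait(timeout=1.0) # wait for submission or timeout
        # ...each thread then syncs separately on its own encode VAContext

isl = InterSessionLayer(max_frames=4)
threads = [threading.Thread(target=isl.encode, args=(f"frame{i}",)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

assert len(isl.batches) == 1                # one combined submission of 4 frames
assert sorted(isl.batches[0]) == ["frame0", "frame1", "frame2", "frame3"]
```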
This example shows how particular encoding latency constraints are achieved in the MFE architecture. Consider again a case with 4 parallel streams running in MFE auto mode with a maximum of 4 frames allowed per submission; the flow can be described as follows:
- Sessions/threads/streams 1, 2, and 3 have submitted frames and are waiting for either submission or a timeout to happen.
- Session/thread/stream 1 reaches the specified timeout (for example, a 30 fps stream has a requirement to achieve 33 ms sync-to-sync latency, so the timeout is set in the range of 1-33 ms, depending on frame submission time), takes control over the ISL, and submits 3 frames.
- The fourth session's frame has arrived late and goes to the next submission.
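The timeout path can be sketched similarly (illustrative names and timings only): a waiting thread whose deadline expires takes back control of the ISL and submits whatever frames have been collected, while a late frame simply lands in the next batch:

```python
import threading

class InterSessionLayer:
    """Illustrative sketch of the ISL timeout path (not Media SDK internals)."""
    def __init__(self, max_frames=4, timeout=0.05):
        self.cond = threading.Condition()
        self.max_frames = max_frames
        self.timeout = timeout              # e.g. derived from the 33 ms budget
        self.pending = []
        self.batches = []

    def encode(self, frame):
        with self.cond:
            self.pending.append(frame)
            if len(self.pending) < self.max_frames:
                self.cond.wait(timeout=self.timeout)
            if self.pending:                # batch full, or our timeout expired:
                self.batches.append(list(self.pending))  # submit what we have
                self.pending.clear()
                self.cond.notify_all()

isl = InterSessionLayer(max_frames=4, timeout=0.05)
early = [threading.Thread(target=isl.encode, args=(f"frame{i}",)) for i in range(3)]
for t in early: t.start()
for t in early: t.join()    # the 4th frame never arrives; timeout flushes 3 frames

isl.encode("frame3")        # the late frame goes into the next submission

assert [len(b) for b in isl.batches] == [3, 1]
```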