# LFX Workspace: A Rust library crate for MediaPipe models for WasmEdge NN #2355
# MediaPipe Solutions

## 1. Overview
## 2. Vision Tasks

### 2.1. Pre-process for vision tasks

Vision tasks have three input media types: image, video, and live stream. For video and live stream, we must first decode the input into images.
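After decoding, a float-input model typically also needs the pixel values rescaled. A minimal sketch (the `[-1, 1]` target range here is an assumption; a real implementation reads the range from the model's metadata or quantization parameters):

```rust
// Sketch of a float-input pre-processing step: scale packed RGB8 pixel
// values into [-1, 1]. The target range is an assumption; real code
// derives it from the model's metadata/quantization parameters.
fn normalize_rgb8(pixels: &[u8]) -> Vec<f32> {
    pixels.iter().map(|&p| p as f32 / 127.5 - 1.0).collect()
}
```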
### 2.2. Object detection

Number of models: 6. Click here to visit the official website for more information about these 6 models.

Post-process:
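Object-detection post-processing typically ends with non-maximum suppression over the decoded boxes. A hedged, dependency-free sketch (the `Rect` shape and threshold handling are illustrative, not the crate's real types):

```rust
// Axis-aligned box; fields are illustrative, not the crate's real API.
#[derive(Clone, Copy)]
struct Rect { x_min: f32, y_min: f32, x_max: f32, y_max: f32 }

// Intersection-over-union of two boxes.
fn iou(a: Rect, b: Rect) -> f32 {
    let ix = (a.x_max.min(b.x_max) - a.x_min.max(b.x_min)).max(0.0);
    let iy = (a.y_max.min(b.y_max) - a.y_min.max(b.y_min)).max(0.0);
    let inter = ix * iy;
    let area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min);
    let area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min);
    if inter <= 0.0 { 0.0 } else { inter / (area_a + area_b - inter) }
}

// Greedy non-maximum suppression: keep the highest-scoring box, drop any
// later box that overlaps a kept one above `iou_threshold`.
fn nms(mut dets: Vec<(Rect, f32)>, iou_threshold: f32) -> Vec<(Rect, f32)> {
    dets.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut keep: Vec<(Rect, f32)> = Vec::new();
    for d in dets {
        if keep.iter().all(|k| iou(k.0, d.0) < iou_threshold) {
            keep.push(d);
        }
    }
    keep
}
```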
### 2.3. Image classification

Number of models: 4. Click here to visit the official website for more information about these 4 models.

Post-process:
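Classification post-processing usually reduces the raw score tensor to the top-k categories. A minimal sketch (the `Category` shape is an illustrative assumption, not the crate's real result type):

```rust
// Illustrative result item; a real implementation would also carry the
// label string parsed from the model's label file.
#[derive(Debug, Clone)]
struct Category { index: usize, score: f32 }

// Keep the k highest-scoring categories, best first.
fn top_k(scores: &[f32], k: usize) -> Vec<Category> {
    let mut cats: Vec<Category> = scores
        .iter()
        .enumerate()
        .map(|(index, &score)| Category { index, score })
        .collect();
    cats.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    cats.truncate(k);
    cats
}
```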
### 2.4. Hand landmarks detection

Models: the hand landmarker model bundle contains a hand detection model and a hand landmarks detection model. The hand detection model locates hands within the input image, and the hand landmarks detection model identifies specific hand landmarks on the cropped hand image defined by the hand detection model.

Phase 1: hand detection post-process
Phase 2: hand landmarks detection post-process
Phase 3: generate output
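For the "generate output" phase, landmark models emit normalized coordinates that must be mapped back to pixel coordinates of the original image. A minimal sketch (function name and clamping behavior are illustrative assumptions):

```rust
// Map a landmark's normalized (x, y) in [0, 1] back to pixel coordinates,
// clamped to the image bounds. Names are illustrative assumptions.
fn landmark_to_pixels(x: f32, y: f32, width: u32, height: u32) -> (u32, u32) {
    let px = ((x * width as f32).round() as u32).min(width.saturating_sub(1));
    let py = ((y * height as f32).round() as u32).min(height.saturating_sub(1));
    (px, py)
}
```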
### 2.5. Gesture recognition

Models: the Gesture Recognizer contains two pre-packaged model bundles: a hand landmark model bundle and a gesture classification model bundle. The landmark model detects the presence of hands and hand geometry, and the gesture recognition model recognizes gestures based on hand geometry.

Phase 1: [Hand landmarks detection](#Hand landmarks detection)

Phase 2: hand gesture recognition process and post-process
## 3. Text Tasks

### 3.1. Text classification

Number of models: 2. Click here to visit the official website for more information about these 2 models.

Pre-process:
Post-process:
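For the BERT-based text model, pre-processing hinges on WordPiece tokenization. A minimal greedy longest-match sketch (the `##` continuation prefix follows the standard WordPiece convention; the tiny vocabulary in the example is a toy assumption):

```rust
use std::collections::HashSet;

// Greedy longest-match WordPiece: repeatedly take the longest vocabulary
// piece from the current position; continuation pieces get a "##" prefix.
// Returns None for an out-of-vocabulary word (a real tokenizer emits [UNK]).
fn wordpiece(word: &str, vocab: &HashSet<&str>) -> Option<Vec<String>> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        while end > start {
            let mut piece: String = chars[start..end].iter().collect();
            if start > 0 {
                piece = format!("##{}", piece);
            }
            if vocab.contains(piece.as_str()) {
                found = Some((piece, end));
                break;
            }
            end -= 1;
        }
        match found {
            Some((piece, e)) => {
                pieces.push(piece);
                start = e;
            }
            None => return None,
        }
    }
    Some(pieces)
}
```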
## 4. Audio Tasks

### 4.1. Audio classification

Number of models: 1. Click here to visit the official website for more information about the model.

Pre-process:
Post-process:
## Discussion: Do we need to import host functions into WasmEdge from the MediaPipe C library?

In my opinion, most of the data processing functions can be implemented in Rust, even though some complex parts may take some time, such as WordPiece tokenization for BERT and FFT for audio pre-processing. For image decoding and encoding, we can use

That's all, thanks.
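As an illustration of the point above, the FFT needed for audio pre-processing can indeed be written in plain Rust with no dependencies. A minimal recursive radix-2 Cooley–Tukey sketch (power-of-two input lengths only; complex numbers as `(re, im)` tuples):

```rust
// Recursive radix-2 Cooley-Tukey FFT over (re, im) pairs; input length
// must be a power of two. A sketch only: a production version would use
// an iterative in-place algorithm and precomputed twiddle factors.
fn fft(input: &[(f64, f64)]) -> Vec<(f64, f64)> {
    let n = input.len();
    if n == 1 {
        return input.to_vec();
    }
    assert!(n.is_power_of_two());
    let even: Vec<_> = input.iter().step_by(2).cloned().collect();
    let odd: Vec<_> = input.iter().skip(1).step_by(2).cloned().collect();
    let fe = fft(&even);
    let fo = fft(&odd);
    let mut out = vec![(0.0, 0.0); n];
    for k in 0..n / 2 {
        let angle = -2.0 * std::f64::consts::PI * k as f64 / n as f64;
        let (wr, wi) = (angle.cos(), angle.sin());
        let (re, im) = fo[k];
        // Twiddle factor times the odd-half term.
        let t = (wr * re - wi * im, wr * im + wi * re);
        out[k] = (fe[k].0 + t.0, fe[k].1 + t.1);
        out[k + n / 2] = (fe[k].0 - t.0, fe[k].1 - t.1);
    }
    out
}
```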
## First-week report

In the first week, I designed the project architecture and implemented two tasks: image classification and object detection (including pre-process, post-process, documentation, tests, and examples).

### 1. Multiple model format support

In MediaPipe, all the essential information is bundled in the models, such as input tensor shapes, output tensor shapes, classification label files, quantization parameters, and so on. So we have to parse the TfLite model to get this information. I designed a model resource abstraction layer, which abstracts over the model information. With this, the library can support other model formats simply by implementing the corresponding model parser. The architecture is as follows:
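A hedged sketch of what such a model-resource abstraction could look like. The trait and method names are illustrative assumptions, not the crate's real API; the `TFL3` magic check reflects the TfLite flatbuffer file identifier at byte offset 4:

```rust
// Illustrative model-resource abstraction: each supported model format
// implements this trait so tasks can query shapes, labels, etc. uniformly.
trait ModelResource {
    fn input_tensor_shape(&self, index: usize) -> Option<&[usize]>;
    fn output_tensor_shape(&self, index: usize) -> Option<&[usize]>;
}

#[derive(Debug, PartialEq)]
enum ModelFormat {
    TfLite,
    Unknown,
}

// Detect the model format from the file's leading bytes; a TfLite
// flatbuffer carries the "TFL3" identifier at byte offset 4.
fn detect_format(buf: &[u8]) -> ModelFormat {
    if buf.len() >= 8 && &buf[4..8] == b"TFL3" {
        ModelFormat::TfLite
    } else {
        ModelFormat::Unknown
    }
}
```

Supporting a new format would then mean adding a `ModelFormat` variant plus a parser returning a `ModelResource` implementation, which matches the "just implement the corresponding model parser" design described above.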
When loading a model, the library will detect the model format using the file's magic bytes and call the corresponding model parser to collect all the model information we need. Currently, only the TfLite model parser is implemented.

### 2. Flexible input

The library defines a Rust interface for input sources; for images, this library implements it.

### Next week's plan
## 2-3 week report

### Progress
### Next week's plan
## 4-Week Progress Report

### Summary of Progress
### Plan for Next Week
## 5-Week Progress Report

### Summary of Progress
### Plan for Next Week
Thanks! Do you have instructions for us to try these? I think we will probably eventually need FFmpeg to run as a plugin (native host functions), as opposed to compiling it to Wasm, for performance reasons. But that's for later!
@juntao Yes, the repo has a README file that shows how to use the library, and it has examples and tests that can be run directly.

About FFmpeg: its performance is slower than native. It is a temporary solution for video/audio processing until the FFmpeg/OpenCV plugins are available. For now, I use FFmpeg-wasm behind a Rust feature flag, which may be removed once the FFmpeg plugin can be used. The repo also has scripts that show how to set up the environment and run the examples/tests with FFmpeg. The documentation and examples are not complete at this time, but I will update them once all MediaPipe tasks have been implemented.
Hello @juntao, I have encountered an issue with the MediaPipe image segmentation task: one of the models uses a custom operator. And I noticed that the MediaPipe source code includes 10 custom operators that could potentially be used in the future. To resolve this issue, one possible solution is to build a custom TensorFlow Lite library that includes these operators. Do I need to add this to the plan?
I think you could try to build a new TF library that includes those operators (option 1) and see if it works? If it does, we should perhaps make the new TF library build part of our WASI-NN plugin. Thanks.
Ok, I have added it to the milestones. |
## 6-Week Progress Report

### Summary of Progress
### Plan for Next Week
## Host functions we need in OpenCV

OpenCV is written in C++, so we also need class member functions for its classes.

Classes:
Functions:
## Host functions we need in FFmpeg
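To the Wasm module, host functions like these would appear as plain extern imports. A hedged sketch: the `opencv` import module name and both signatures below are illustrative assumptions, not a real WasmEdge plugin ABI. The small pure-Rust helper at the end is usable on either side of the boundary:

```rust
// Hypothetical Wasm-side declarations for OpenCV host functions; the
// "opencv" import module and these signatures are illustrative only.
#[cfg(target_arch = "wasm32")]
#[link(wasm_import_module = "opencv")]
extern "C" {
    // Decode an encoded image buffer into a host-owned Mat, returning a handle.
    fn imdecode(buf_ptr: *const u8, buf_len: u32) -> i32;
    // Resize the Mat behind `handle` to (width, height), returning a new handle.
    fn resize(handle: i32, width: u32, height: u32) -> i32;
}

// Pure-Rust helper: expected byte length of a packed RGB8 buffer, useful
// for validating buffers before they cross the host boundary.
fn rgb8_buffer_len(width: u32, height: u32) -> usize {
    width as usize * height as usize * 3
}
```

Because classes like `cv::Mat` cannot cross the Wasm boundary directly, the sketch passes opaque integer handles and lets the host own the objects; member functions become free functions taking the handle as their first argument.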
## 7-Week Progress Report

### Summary of Progress
### Plan for Next Week
## 8-Week Progress Report

### Summary of Progress
### Plan for Next Week
That's great progress. Thank you so much for the update! |
Closing as a completed issue; feel free to reopen.
## Motivation
MediaPipe is a collection of ML models for streaming data. The official website provides Python, iOS, Android, and TFLite-JS SDKs for using those models. As WasmEdge is increasingly used in data-streaming applications, we would like to build a Rust library crate that enables easy integration of MediaPipe models in WasmEdge applications.
## Details
Each MediaPipe model has a description page that describes its input and output tensors. The models are available in TensorFlow Lite format, which is supported by the WasmEdge TensorFlow Lite plugin.
We need at least one set of library functions for each model in Mediapipe. Each library function takes in a media object and returns the inference result. The function performs the following tasks.
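The per-model library-function pattern described above can be sketched as a three-step pipeline: pre-process the media object into an input tensor, run inference, post-process into a typed result. All names below are illustrative assumptions; the real crate would wire the middle step to the WasmEdge TensorFlow Lite backend:

```rust
// Illustrative result type; real tasks return richer, task-specific types.
struct InferenceResult {
    scores: Vec<f32>,
}

// Generic shape of a task's library function: media in, result out.
// The two closures stand in for the real pre-process and inference steps.
fn run_task(
    pre_process: impl Fn(&[u8]) -> Vec<f32>,
    infer: impl Fn(&[f32]) -> Vec<f32>,
    media: &[u8],
) -> InferenceResult {
    let input = pre_process(media);     // media object -> input tensor
    let output = infer(&input);         // model inference (e.g. via WASI-NN)
    InferenceResult { scores: output }  // post-process into the result type
}
```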
## Milestones
Publish the crate to crates.io. (1 week)

Repository URL: origin https://github.com/yanghaku/mediapipe-rs-dev; it will now be transferred to https://github.com/WasmEdge/mediapipe-rs
MediaPipe tasks progress:
## Appendix
feat: A Rust library crate for MediaPipe models for WasmEdge NN