This repository has been archived by the owner on Mar 22, 2022. It is now read-only.

Can you support a custom local video source in Unity, like a RenderTexture? #35

Closed
raiscui opened this issue Aug 14, 2019 · 28 comments

Comments

@raiscui

raiscui commented Aug 14, 2019

Can you support a custom local video source, like a RenderTexture or a texture rendered by some video player?
I also want to get and send RGB/Depth frames from the Kinect DK.

@djee-ms djee-ms self-assigned this Aug 14, 2019
@djee-ms djee-ms added the enhancement New feature or request label Aug 14, 2019
@djee-ms
Member

djee-ms commented Aug 14, 2019

Hi @raiscui

  1. For the custom video source, I assume you mean being able to send to WebRTC custom frames coming from your app rather than directly from the camera, like generated images. I would like to add that feature eventually, as well as allow the application to manipulate the camera itself. However, I have seen little demand for it so far, and I currently have other high-demand features in the works, so I wouldn't expect it in the short term.

  2. For the Azure Kinect DK, I unfortunately haven't had a chance to try it with this project yet. Separate RGB or Depth capture should work, I think, though that is probably not very interesting. If you want synchronized RGB-D frames from both the RGB and Depth sensors, however, then these need to be captured via a specialized API like MediaFrameSourceGroup on UWP (see the sketch at the end of this comment). I see two solutions for that:

    • If custom sources (1.) were implemented you could capture the frames yourself with that API and send them to WebRTC.
    • Otherwise, in the current situation where the camera is managed by WebRTC itself, I suggest you open an issue with the WebRTC UWP project to see if they can add support on their side, since MixedReality-WebRTC currently leverages their code on UWP to capture video frames.

I appreciate these are not great answers, and there is no way currently to do any of that without modifying the code of MixedReality-WebRTC and/or the code of the WebRTC UWP project. If you want to give it a try however and submit a PR for that then we can talk about how to proceed.
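
(For illustration only: a rough sketch of what capturing synchronized RGB + Depth frames via MediaFrameSourceGroup could look like on UWP. This is not part of MixedReality-WebRTC; the class and the OnRgbdFrame handoff are hypothetical placeholders for where the frame pair would be handed to a custom WebRTC source once feature 1. exists.)

```csharp
using System.Linq;
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Media.Capture.Frames;

public class KinectRgbdCapture
{
    private MediaCapture _capture;
    private MultiSourceMediaFrameReader _reader;

    // Open the first source group exposing both a Color and a Depth stream
    // (e.g. the Azure Kinect DK) and start receiving time-correlated frame pairs.
    public async Task StartAsync()
    {
        var groups = await MediaFrameSourceGroup.FindAllAsync();
        var group = groups.First(g =>
            g.SourceInfos.Any(i => i.SourceKind == MediaFrameSourceKind.Color) &&
            g.SourceInfos.Any(i => i.SourceKind == MediaFrameSourceKind.Depth));

        _capture = new MediaCapture();
        await _capture.InitializeAsync(new MediaCaptureInitializationSettings
        {
            SourceGroup = group,
            SharingMode = MediaCaptureSharingMode.ExclusiveControl,
            MemoryPreference = MediaCaptureMemoryPreference.Cpu,
            StreamingCaptureMode = StreamingCaptureMode.Video
        });

        var color = _capture.FrameSources.Values.First(s => s.Info.SourceKind == MediaFrameSourceKind.Color);
        var depth = _capture.FrameSources.Values.First(s => s.Info.SourceKind == MediaFrameSourceKind.Depth);

        // A multi-source reader delivers color and depth frames already correlated in time.
        _reader = await _capture.CreateMultiSourceFrameReaderAsync(new[] { color, depth });
        _reader.FrameArrived += (reader, args) =>
        {
            using (var frames = reader.TryAcquireLatestFrame())
            {
                if (frames == null) return;
                var colorFrame = frames.TryGetFrameReferenceBySourceId(color.Info.Id);
                var depthFrame = frames.TryGetFrameReferenceBySourceId(depth.Info.Id);
                OnRgbdFrame(colorFrame, depthFrame); // hypothetical handoff to the app / custom source
            }
        };
        await _reader.StartAsync();
    }

    // Hypothetical application callback; not part of any library API.
    private void OnRgbdFrame(MediaFrameReference color, MediaFrameReference depth) { }
}
```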

@djee-ms
Member

djee-ms commented Sep 9, 2019

Quick update - Azure Kinect doesn't work out of the box because it has a 7-microphone array, and WebRTC internally only supports audio capture devices with 4 channels. This is a problem with the Google implementation: https://bugs.chromium.org/p/webrtc/issues/detail?id=10881

@djee-ms
Member

djee-ms commented Sep 9, 2019

Also an important note @raiscui: just plugging the Azure Kinect into your PC will break WebRTC initialization, even if you don't plan to use it. This is again due to Google's implementation, and was closed as "By Design":
https://bugs.chromium.org/p/webrtc/issues/detail?id=9653

@jahnotto

Any news on item 1, @djee-ms ?

I am doing some custom rendering in a Windows application and would like to stream ARGB frames to Unity. As far as I can tell, I can't use any other video source than the webcam.

@arufolo-wayahealth

+1 for this feature.

@jahnotto

Perhaps the frames could be sent as raw pixels through a datachannel? We'd miss compression though.

@djee-ms
Member

djee-ms commented Sep 20, 2019

@jahnotto no, there is a much better way: a custom video source. I too would really like to see that feature. Unfortunately we're currently quite busy with the v1.0 release planned for the end of the month, and some blockers like #74. I will see if I have some spare cycles to push a rough implementation, even a partial one, but I cannot promise you anything quite yet.

@jahnotto

Ok, thanks! Looking forward to getting a custom video source.

Meanwhile, I'll use a DataChannel just for prototyping.

@jahnotto

If anyone else is interested: as a temporary workaround, I am using a DataChannel to send the frames from a Windows desktop application to a Unity (HoloLens) app. It's very slow, and the messages are split into multiple sub-messages that need to be aggregated on the HoloLens, but it works for prototyping a solution :) Looking forward to a proper custom video source implementation.
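
(A minimal sketch of the kind of chunking and reassembly such a workaround involves; this is illustrative only, and the 12-byte header layout is an arbitrary choice, not part of any library.)

```csharp
using System;
using System.Collections.Generic;

// Sender side: split a frame into fixed-size sub-messages.
// Header (illustrative, 12 bytes): frameId, chunkIndex, chunkCount as 3 x little-endian int32.
public static class FrameChunker
{
    public const int ChunkSize = 16 * 1024; // stay well below the data channel message size limit

    public static IEnumerable<byte[]> Split(int frameId, byte[] frame)
    {
        int chunkCount = (frame.Length + ChunkSize - 1) / ChunkSize;
        for (int i = 0; i < chunkCount; i++)
        {
            int offset = i * ChunkSize;
            int size = Math.Min(ChunkSize, frame.Length - offset);
            var msg = new byte[12 + size];
            BitConverter.GetBytes(frameId).CopyTo(msg, 0);
            BitConverter.GetBytes(i).CopyTo(msg, 4);
            BitConverter.GetBytes(chunkCount).CopyTo(msg, 8);
            Buffer.BlockCopy(frame, offset, msg, 12, size);
            yield return msg;
        }
    }
}

// Receiver side: collect sub-messages per frame id and return the full frame once complete.
public class FrameAssembler
{
    private readonly Dictionary<int, byte[][]> _pending = new Dictionary<int, byte[][]>();

    public byte[] Add(byte[] msg)
    {
        int frameId = BitConverter.ToInt32(msg, 0);
        int index = BitConverter.ToInt32(msg, 4);
        int count = BitConverter.ToInt32(msg, 8);

        if (!_pending.TryGetValue(frameId, out var chunks))
            _pending[frameId] = chunks = new byte[count][];
        chunks[index] = msg;
        if (Array.Exists(chunks, c => c == null))
            return null; // still waiting for more sub-messages

        _pending.Remove(frameId);
        int total = 0;
        foreach (var c in chunks) total += c.Length - 12;
        var frame = new byte[total];
        int offset = 0;
        foreach (var c in chunks)
        {
            Buffer.BlockCopy(c, 12, frame, offset, c.Length - 12);
            offset += c.Length - 12;
        }
        return frame;
    }
}
```

On the sender, each chunk returned by Split would be passed to the data channel's send method; on the receiver, Add is called from the message-received handler and returns the complete frame once the last chunk arrives, or null otherwise.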

@djee-ms , would you like me to create a new enhancement issue for a custom video source, or should we just use this issue?

@djee-ms
Member

djee-ms commented Sep 26, 2019

No, I think we can keep this issue open for the custom video source enhancement; the title and discussion are relevant.

@jahnotto

jahnotto commented Oct 1, 2019

In my temporary workaround using a datachannel, only a few frames are received if I send them too often (like every 100 ms). It almost seems like the message bus is flooded.

If I send the messages only once per second, it works fine.

Is this to be expected?

@djee-ms
Member

djee-ms commented Oct 2, 2019

Did you check the buffering of data channels? I didn't try it myself, but I know that the internal data channel buffer can get saturated if you try to send faster than it can handle, and in that case calls to Send will fail and drop the data without sending it. You should monitor the OnBufferingChanged event and make sure the buffer doesn't get full. See the comment on the buffering event in the native C++ library:

```cpp
/// Callback fired when data buffering changed.
/// The first parameter indicates the old buffering amount in bytes, the
/// second one the new value, and the last one indicates the limit in bytes
/// (buffer capacity). This is important because if the send buffer is full
/// then any attempt to send data will abruptly close the data channel. See
/// comment in webrtc::DataChannelInterface::Send() for details. Current
/// WebRTC implementation has a limit of 16MB for the buffer capacity.
using BufferingCallback =
    Callback<const uint64_t, const uint64_t, const uint64_t>;
```

Unfortunately I just noticed that on the C# side the OnBufferingChanged handler is marked as internal and does not dispatch to a publicly-accessible event. But you can make a local modification to expose it. I will fix that in the meantime.
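
(A minimal sketch of what send-side throttling could look like once such an event is exposed in C#. The BufferingChanged event name and its (previous, current, limit) signature simply mirror the native callback above and are assumptions, as is SendMessage(byte[]); adjust to whatever the C# API actually exposes.)

```csharp
using System.Threading;
using Microsoft.MixedReality.WebRTC;

// Throttle sends so the data channel's internal buffer never fills up,
// since a full buffer abruptly closes the channel (see the native comment above).
public class ThrottledSender
{
    private readonly DataChannel _channel;
    private long _buffered;                      // last reported buffered byte count
    private long _capacity = 16 * 1024 * 1024;   // 16MB default per the native comment

    public ThrottledSender(DataChannel channel)
    {
        _channel = channel;
        // Assumed signature mirroring the native BufferingCallback: (previous, current, limit).
        _channel.BufferingChanged += (previous, current, limit) =>
        {
            Interlocked.Exchange(ref _buffered, (long)current);
            Interlocked.Exchange(ref _capacity, (long)limit);
        };
    }

    // Drop the message (return false) instead of risking an abrupt channel close
    // when the buffer is more than half full.
    public bool TrySend(byte[] message)
    {
        if (Interlocked.Read(ref _buffered) + message.Length > Interlocked.Read(ref _capacity) / 2)
            return false;
        _channel.SendMessage(message);
        return true;
    }
}
```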

@jahnotto

jahnotto commented Oct 2, 2019

Thanks again -- I will try that!

@djee-ms
Member

djee-ms commented Oct 2, 2019

I pushed a change that should help, which exposes an event publicly, and ensures an exception is thrown if trying to send too fast. See 1bc2ca6.

@jahnotto

jahnotto commented Oct 3, 2019

I am indeed getting an exception when I send too fast. The BufferingChanged event is never fired though.

@gtk2k

gtk2k commented Nov 6, 2019

+1
I really want this feature.

@iamrohit1

+1
This would be very helpful.
I have a native plugin set up in Unity which takes the render texture data and encodes it using nvencode, outputting raw H.264 packets. Is there a way to tap these into the stream?

@djee-ms
Member

djee-ms commented Nov 8, 2019

Soon! This is on the roadmap for the next 1.1 release, hopefully by the end of the month or so, and there's already some work done for it. It needs a few bug fixes and some more testing and polish before it's ready to be committed.

@jahnotto

As requested by @djee-ms on the mixedreality-webrtc Slack channel, I'll describe our use case here:

We are doing raycast volume rendering for a HoloLens 2. As the HoloLens is not powerful enough for this type of rendering, we are doing remote rendering on a PC.

Some definitions used in this solution:
  • Client: the Unity app running on the HoloLens.
  • Render server: a desktop PC application running on a PC with a powerful graphics card. The application is using VTK for rendering.

The desired data flow is as follows:

  1. For each Update() in a Unity GameObject, send a render request through a DataChannel from the client to the render server. The render request is a data structure containing all the information needed to set up the view frustum on the render server (see the sketch at the end of this comment). This includes:
    • camera position
    • camera forward and up vectors
    • desired resolution of the rendered frame
    • stereo separation
    • projection matrices for the left and right eye, respectively
    • a unique ID identifying the render request
  1. The render server configures the VTK view frustum according to the newest render request. Any render request received before the newest one is discarded; the render server may hence receive multiple render requests for each actual rendered frame, where only the newest request is actually rendered.
  2. The render server renders the left and right eye to two separate RGBA bitmaps.
  3. The two bitmaps are merged side by side into a single bitmap. The horizontal resolution of the merged bitmap is thus twice the resolution of each individual bitmap.
  4. The merged bitmap is fed into the video stream along with the render request ID.
  5. The video is streamed from the render server to the client using WebRTC.
  6. The client receives the video stream and displays it as a quad texture. Note that the client needs to know the render request ID for each received frame in the video stream.

At the moment, we are using a temporary workaround for the following steps:
  5. Each frame is compressed to a JPEG image.
  6. We are using a raw TCP socket connection to send each frame to the client along with the corresponding render request ID.

Let me know if you need any further clarification or if you have any ideas on how to improve the data flow.
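
(For concreteness, a minimal sketch of what such a render request structure and its binary serialization might look like; the field names and layout are illustrative, not the actual implementation.)

```csharp
using System.IO;

// Illustrative render request matching the fields listed in step 1 above.
// Matrices are 4x4, row-major, one float per element.
public struct RenderRequest
{
    public int RequestId;
    public float[] CameraPosition;   // 3 floats
    public float[] CameraForward;    // 3 floats
    public float[] CameraUp;         // 3 floats
    public int Width, Height;        // per-eye resolution
    public float StereoSeparation;
    public float[] LeftProjection;   // 16 floats
    public float[] RightProjection;  // 16 floats

    // Fixed-size binary serialization suitable for a DataChannel message.
    public byte[] ToBytes()
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(RequestId);
            foreach (var f in CameraPosition) w.Write(f);
            foreach (var f in CameraForward) w.Write(f);
            foreach (var f in CameraUp) w.Write(f);
            w.Write(Width); w.Write(Height);
            w.Write(StereoSeparation);
            foreach (var f in LeftProjection) w.Write(f);
            foreach (var f in RightProjection) w.Write(f);
            return ms.ToArray();
        }
    }
}
```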

@chrisse27

@djee-ms Our use-case looks like this:

  1. Capture video stream from frame-grabber (or webcam) in Unity.
  2. Process each frame, e.g. cropping, masking.
  3. Send processed frame to HoloLens.
  4. Render frame on HoloLens as texture on a quad.

Currently, we are reading the processed frame from the texture and sending it via a TCP connection to the HoloLens. Our goal is to replace this connection with WebRTC and, in particular, benefit from hardware encoding/decoding support.

@djee-ms
Member

djee-ms commented Nov 13, 2019

@jahnotto thanks for the details. A few comments:

  • I am worried about step 1., as other users have reported that trying to use data channels for high-frequency data (sending the camera position each frame; see Transform/Projectionmatrix of current frame (Locatable camera) #83) was not working well, most likely due to the buffering that data channels do at the SRTP protocol level. Did you not observe any such issue?
  • For step 6., have you looked into video multiplexing? It seems this is the way forward to send metadata associated with a video frame and ensure synchronization. Again, see Transform/Projectionmatrix of current frame (Locatable camera) #83 and especially this comment, although as pointed out there this would require some work from us, if at all feasible. But I am pointing out the option anyway.

Otherwise it seems there is no major concern for the external video track feature; your use case should work with it as currently designed.

@chrisse27 thanks for the update too. Can you confirm what you mean by "process each frame" in your case? Is that done on the CPU, or on the GPU using shaders? Grabbing a frame from the camera (VRAM), pulling it down to CPU memory for processing, and immediately re-uploading it to VRAM for hardware encoding, for example, would be a performance issue. This is incidentally what currently happens for H.264 on UWP, and what we want to fix to lower CPU usage and thermals. If you stay in system memory and use a software encoder (VP8, VP9), however, it won't matter.

@iamrohit1

@jahnotto my application has a similar pipeline right now to render remotely on a PC. You might want to take a look at 3D Streaming Toolkit which aims to solve a similar problem.

@chrisse27

@djee-ms In our application, the processing is done via GPU using shaders.

@jahnotto

> @jahnotto thanks for the details. A few comments:
>
> • I am worried about step 1., as other users have reported that trying to use data channels for high-frequency data (sending the camera position each frame; see Transform/Projectionmatrix of current frame (Locatable camera) #83) was not working well, most likely due to the buffering that data channels do at the SRTP protocol level. Did you not observe any such issue?
> • For step 6., have you looked into video multiplexing? It seems this is the way forward to send metadata associated with a video frame and ensure synchronization. Again, see Transform/Projectionmatrix of current frame (Locatable camera) #83 and especially this comment, although as pointed out there this would require some work from us, if at all feasible. But I am pointing out the option anyway.

Step 1: I am worried too :) So far it seems fine, but I haven't done any real performance/latency testing because the way we encode frames now (frame-wise JPEGs) introduces a very high latency anyway. I read through #83 earlier after you mentioned it in a reply on Slack. It seems like the proposed solutions there (hacking the RTP header) require that I already send a video stream from the HoloLens to the PC. In my case, I'm only sending camera/frustum settings plus any interaction data like clipping planes etc.

Step 6: Thanks, multiplexing seems to be a good solution to sending metadata. Will I be able to use a multiplexed codec through MR-WebRTC though?

@jahnotto

> @jahnotto my application has a similar pipeline right now to render remotely on a PC. You might want to take a look at 3D Streaming Toolkit, which aims to solve a similar problem.

I had a look at it some time ago. I gave up on it because it doesn't support the Unity editor and it doesn't build for ARM. MR-WebRTC seemed more promising.

@djee-ms
Member

djee-ms commented Nov 14, 2019

> Step 6: Thanks, multiplexing seems to be a good solution to sending metadata. Will I be able to use a multiplexed codec through MR-WebRTC though?

I didn't try it, to be honest, though it should be like any other codec and work out of the box. The issue is that no metadata API is exposed, so it would require some work to surface something in MR-WebRTC, which might be troublesome according to this comment from #83:

> But the base WebRTC code currently has no public APIs to provide metadata input to encoding and extract it again upon receipt of each frame.

Also, there is the (unconfirmed) absence of SDP codec negotiation as mentioned on #83. Would that work for your use case? Can you assume an encoding is supported on both sides?

@jahnotto

> > Step 6: Thanks, multiplexing seems to be a good solution to sending metadata. Will I be able to use a multiplexed codec through MR-WebRTC though?
>
> I didn't try it, to be honest, though it should be like any other codec and work out of the box. The issue is that no metadata API is exposed, so it would require some work to surface something in MR-WebRTC, which might be troublesome according to this comment from #83:
>
> > But the base WebRTC code currently has no public APIs to provide metadata input to encoding and extract it again upon receipt of each frame.
>
> Also, there is the (unconfirmed) absence of SDP codec negotiation as mentioned on #83. Would that work for your use case? Can you assume an encoding is supported on both sides?

I have full control of the hardware and software on both sides. Hence it should be safe to assume that a specific encoding is supported on both sides.

djee-ms added a commit to djee-ms/MixedReality-WebRTC that referenced this issue Dec 6, 2019
This change adds support for so-called "external video tracks", which
are local video tracks backed by an external source provided by the user.
Unlike the implicit source of a regular local video track, which
captures video frames from a local video capture device (webcam), an
external video track source dispatches to the WebRTC video engine
frames provided by the user. This frame source is external from the
point of view of the WebRTC video engine, which does not know their
origin.

External video track sources allow the user to build local video tracks
backed by a custom video source, be it a procedurally generated image,
some rendered content, or even a frame from a video capture device not
natively supported by the underlying WebRTC implementation.

This change does not include any integration with Unity, although
external video track sources are exposed in C# and can therefore be
used via the C# library. Further integration will provide built-in
Unity components for these.

Issue: microsoft#35
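
(A rough sketch of how an external video track source from this change might be driven from C#. The exact type and member names used below, such as ExternalVideoTrackSource.CreateFromArgb32Callback, FrameRequest.CompleteRequest and Argb32VideoFrame, are recalled from later releases and may not match this commit exactly; treat them as assumptions.)

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.MixedReality.WebRTC;

public class ProceduralVideoSource
{
    private ExternalVideoTrackSource _source;
    private readonly int _width = 640, _height = 480;
    private IntPtr _buffer; // ARGB32 pixel buffer owned by the application

    public void Start()
    {
        _buffer = Marshal.AllocHGlobal(_width * _height * 4);

        // The callback is invoked by the WebRTC video engine whenever it wants a new frame.
        _source = ExternalVideoTrackSource.CreateFromArgb32Callback(OnFrameRequested);
        // Creating the local video track from the source and adding it to the peer
        // connection is omitted here; it depends on the release's exact API surface.
    }

    private void OnFrameRequested(in FrameRequest request)
    {
        // Fill _buffer with the app's own content (procedural image, rendered scene, Kinect frame, ...).
        var frame = new Argb32VideoFrame
        {
            data = _buffer,
            width = (uint)_width,
            height = (uint)_height,
            stride = _width * 4
        };
        request.CompleteRequest(in frame);
    }
}
```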
djee-ms added a commit that referenced this issue Dec 16, 2019
Add some Unity integration of the external video tracks feature by way
of a CustomVideoSource component allowing the user to inject a custom
video feed into the WebRTC connection.

As an example application, provide a SceneVideoSource component
capturing the content rendered by a given Unity camera and streaming it
as a video feed to the remote peer.

The VideoChatDemo Unity scene is updated to add a scene capture view
showing via a MediaPlayer component the scene content captured from the
camera back-buffer.

Bug: #35
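
(The general idea behind capturing a Unity camera's output for such a component can be sketched as follows. This is not the actual SceneVideoSource implementation; the OnArgb32FrameReady callback is a hypothetical placeholder for pushing the pixels into the external video track source.)

```csharp
using UnityEngine;
using UnityEngine.Rendering;

// Reads back the camera's target RenderTexture each frame and hands the raw
// BGRA pixels to a user-provided callback (placeholder for the WebRTC handoff).
public class CameraFrameReadback : MonoBehaviour
{
    public Camera sourceCamera;
    public System.Action<byte[], int, int> OnArgb32FrameReady; // hypothetical handoff

    private void LateUpdate()
    {
        var rt = sourceCamera.targetTexture;
        if (rt == null) return;

        // Asynchronous GPU readback avoids stalling the render thread.
        AsyncGPUReadback.Request(rt, 0, TextureFormat.BGRA32, request =>
        {
            if (request.hasError) return;
            var pixels = request.GetData<byte>().ToArray();
            OnArgb32FrameReady?.Invoke(pixels, rt.width, rt.height);
        });
    }
}
```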
@djee-ms djee-ms closed this as completed Dec 16, 2019
@Owaiskb

Owaiskb commented Aug 10, 2021

> +1
> This would be very helpful.
> I have a native plugin set up in Unity which takes the render texture data and encodes it using nvencode, outputting raw H.264 packets. Is there a way to tap these into the stream?

Hey, how did you encode the render texture? Can you please explain how you encoded it to H.264, and does it use the GPU for encoding? This would be a big help 🙏
