-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISDK-2241: Co-Viewing example with custom AudioDevice #325
base: master
Are you sure you want to change the base?
Conversation
* Need to add KVO.
* Use AVPlayerItemVideoOutput, and CADisplayLink to pull frames.
* TODO - Set the MTAudioTapProcessor.
* Still need to figure out how to consume frames properly.
* TODO - Audio is heavily distorted.
* Add a rendering method for WebRTC audio. * Hook up both audio tap and rendering inputs to the mixer.
* In this example we don't need any fixed size buffers or other pre-allocated resources. We will simply write | ||
* directly to the AudioBufferList provided in the AudioUnit's rendering callback. | ||
*/ | ||
return YES; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs to be updated.
import Foundation | ||
import MediaToolbox | ||
|
||
class ExampleAVPlayerAudioTap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class can probably be removed, unless its useful to demonstrate why we didn't write the audio code in Swift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess lets remove it if it is not used.
attributes = [ | ||
kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_420YpCbCr8BiPlanarFullRange | ||
] as [String : Any] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should explicitly add kCVPixelBufferIOSurfacePropertiesKey. I was doing it wrong before.
kCVPixelBufferIOSurfacePropertiesKey as String : [:]
platform :ios, '11.0' | ||
project 'CoViewingExample.xcproject' | ||
|
||
pod 'TPCircularBuffer', '~> 1.6' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to call it out, this PR is adding a dependency which is only used by the new example.
} | ||
self.capturingContext->deviceContext = context; | ||
self.capturingContext->maxFramesPerBuffer = _capturingFormat.framesPerBuffer; | ||
self.capturingContext->audioBuffer = _captureBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
355-361 should probably be moved into initializeCapturer()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the Playback side of changes. Reviewing the app and processing tap next, and then the capturing side. 😅
ALWAYS_SEARCH_USER_PATHS = NO; | ||
CLANG_ANALYZER_NONNULL = YES; | ||
CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE; | ||
CLANG_CXX_LANGUAGE_STANDARD = "gnu++14"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to change this in sample app?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use c++14 to be safe.
ALWAYS_SEARCH_USER_PATHS = NO; | ||
CLANG_ANALYZER_NONNULL = YES; | ||
CLANG_ANALYZER_NUMBER_OBJECT_CONVERSION = YES_AGGRESSIVE; | ||
CLANG_CXX_LANGUAGE_STANDARD = "gnu++14"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; | ||
GCC_WARN_UNUSED_FUNCTION = YES; | ||
GCC_WARN_UNUSED_VARIABLE = YES; | ||
IPHONEOS_DEPLOYMENT_TARGET = 11.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far this is all I have tested on, and can validate. We should talk about what versions we want to support in the example.
@class ExampleAVPlayerAudioDevice; | ||
|
||
typedef struct ExampleAVPlayerAudioTapContext { | ||
__weak ExampleAVPlayerAudioDevice *audioDevice; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id<TVIAudioDevice>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need some sort of callback protocol here. If it was just TVIAudioDevice
I wouldn't know what custom method to call.
@property (nonatomic, strong, nullable) TVIAudioFormat *renderingFormat; | ||
@property (nonatomic, assign, readonly) BOOL wantsAudio; | ||
@property (nonatomic, assign) BOOL wantsCapturing; | ||
@property (nonatomic, assign) BOOL wantsRendering; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have documentation around wantsAudio
, wantsCapturing
and wantsRendering
.
audioUnitDescription.componentFlags = 0; | ||
audioUnitDescription.componentFlagsMask = 0; | ||
return audioUnitDescription; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method being used anywhere?
if (status != noErr) { | ||
NSLog(@"Could not set stream format on the input bus!"); | ||
AudioComponentInstanceDispose(_voiceProcessingIO); | ||
_voiceProcessingIO = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we dispose mixer as well ? Probably we can have a strategy, setup mixer first, and if _voiceProcessingIO
gets into an error, we can dispose mixer as well. WDYT?
AudioStreamBasicDescription playerFormatDescription = renderingFormatDescription; | ||
if (self.renderingContext->playoutBuffer) { | ||
playerFormatDescription.mSampleRate = self.audioTapContext->sourceFormat.mSampleRate; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment would be nice why are we adapting to the buffer (AVPlayer) format
} | ||
} | ||
|
||
- (void)handleRouteChange:(NSNotification *)notification { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you get chance to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, though I do see route change events being fired during normal operation (That don't restart).
AudioConverterReset(context->captureFormatConverter); | ||
context->captureFormatConvertIsPrimed = NO; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has format conversion code as well. We will be using ExampleAVPlayerProcessingTap.m
in AVAudioEngine audio device as well where the format conversion will not be required. Can we move the audio format conversion code outside this file? another option is we can have processing tap per audio device, in that case we will need to add a layer in folder structure inside AudioDevices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I noticed, audio capturing side stops working when app goes into the background. It resumes coming into the foreground.
// Adjust for what the format converter actually produced, in case it was different than what we asked for. | ||
producerBufferList->mBuffers[0].mDataByteSize = ioPacketSize * bytesPerFrameOut; | ||
// printf("Output was: %d packets / %d bytes. Consumed input packets: %d. Cached input packets: %d.\n", | ||
// ioPacketSize, ioPacketSize * bytesPerFrameOut, context.sourcePackets, context.cachePackets); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some commented code in this file but I guess you are aware of it and working on it.
|
||
if (status != kCVReturnSuccess) { | ||
// TODO | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to delete context / cleanup ?
import Foundation | ||
import MediaToolbox | ||
|
||
class ExampleAVPlayerAudioTap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess lets remove it if it is not used.
if !output.hasNewPixelBuffer(forItemTime: itemTimestamp) { | ||
// TODO: Consider suspending the timer and requesting a notification when media becomes available. | ||
// print("No frame for host timestamp:", CACurrentMediaTime(), "\n", | ||
// "Last presentation timestamp was:", lastPresentationTimestamp != nil ? lastPresentationTimestamp! : CMTime.zero) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi - this file has commented logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we should log once, suspend the timer and restart it later. This should be an easy fix. 👍
public var supportedFormats: [TVIVideoFormat] { | ||
get { | ||
let format = TVIVideoFormat() | ||
format.dimensions = CMVideoDimensions(width: 640, height: 360) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const will be nice
import AVFoundation | ||
import TwilioVideo | ||
|
||
class ExampleAVPlayerSource: NSObject, TVIVideoCapturer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel documentation around scheduling, sample queue and logic would be nice.
attributes: DispatchQueue.Attributes(rawValue: 0), | ||
autoreleaseFrequency: DispatchQueue.AutoreleaseFrequency.workItem, | ||
target: nil) | ||
super.init() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super.init() should be the first to call ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-null sample queue needed to be set before initializing super or else the compiler yelled at me. I'm not an expert at initializing swift classes so this might be wrong.
// Configure access token either from server or manually. | ||
// If the default wasn't changed, try fetching from server. | ||
if (accessToken == "TWILIO_ACCESS_TOKEN") { | ||
let urlStringWithRole = tokenUrl + "?identity=" + name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove the rooms demo url specific logic from here
* Pre-roll AVPlayer to coordinate start times between subsystems. * Use the audio master clock for AVPlayer. * Catch up when dequeuing old frames.
* AVPlayer playback was broken. * Don’t tie device recording format to AVAudioSession, we always want stereo. * AudioTap provides a rendering format, which is now used to configure the mixer input.
@ceaglest when I ran above example I see only presenter AVPlayer audio in audible in viewer side, but not presenter microphone audio. How to make both microphone and avplayer audio track audible/shared so that at viewer side we can listen both audio of microphone and avplayer of presneter |
Summary
This PR adds a new Co-Viewing example.
The sample app is mostly functional, but does not have completed UI and README.md. These items have been given separate tickets and will be completed after this PR (do we need a feature branch?).
The app is hardcoded with streamable content for presenters (see TODOs). Launching the app and tapping "Presenter" selects the hardcoded remote content URL. However, the app can also open the following document types:
If you encounter one of these types of videos anywhere on your iOS device (for example the Files app, or Dropbox), you can open them with Co-Viewing. This immediately connects to a Room as a Presenter.
Design
The Co-Viewing app has both Presenter and Viewer roles. A Viewer is generally a pretty standard Participant and shares their Camera and Microphone. A Presenter shares their camera, the video content, and an audio track with both microphone and player audio.
Please refer to the internal design doc for more info. I will also add a more detailed diagram of the audio pipeline.
Limitations
The world of video playback is complex, and AVPlayer seamlessly handles a lot of different content types. Not all of these types are compatible with CoViewing for various reasons. For example, don't expect to be able to stream any FairPlay encrypted content.
The following table lists what kind of content I've tested.
About HLS
If you really want to stream HLS content, there might be a way. An interested developer could perform several steps to covert a live stream into a progressive download mp4.
https
with a custom schemeappscheme
.This could be done with heavy dependencies like ffpmeg (remuxing is a one liner in the cli), or lighter open source projects. You could also imagine an offline version of the same process, which downloads all of the segments and then assembles a playable .mp4 file.
However, I don't know how Apple's App Store reviewers would feel about such a technique. Ultimately, it would be great if they fixed the bug with AVPlayer and MTAudioProcessingTap, or provided sample code which demonstrates how to use it with HLS content.
TODO