Definition of active media session and the togglemicrophone, togglecamera, hangup actions #278
We should define "video conference". Its sole mention is "EXAMPLE 6: Using video conferencing actions:" (added in #269). There's no mention of this category of actions' existence in the README, explainer, the spec itself, or the use cases. Nor does their existence flow naturally from those narratives, which center on control over consumption of "media" (with metadata such as title, artist, album and album art). The best I can say is that extending "media keys" to control media the user also produces makes some sense. I would expect the spec and explainer to start with the specific use cases to be solved, and define the concepts needed to help implementers route actions to the correct place. This would also help the WG define better criteria for what's in or out.

On concepts

The assumption seems to be that an active media session may be a video conferencing session. That seems a bit of a stretch already (who's the artist?), but for most of us (and, let's face it, most meetings), treating the corporate online video meeting the same as AURORA - Runaway seems appealing, at least for the consumption part.

Except it's not the same. If I "pause" AURORA, she picks up singing where she left off. I can't "pause" my boss; I miss out. I can't pause, seek, or skip ads in a live video conference. Users might wish to "mute" from their lock screen, but there's no media session action for that. "Pause" is not "mute". A true "pause" would start buffering, taking the viewer out of "realtime" mode and into some semi-live streaming session. So the overlap seems poor.

Maybe things are either a media session or a video conferencing session? Then again, today's apps are neatly one or the other (media streaming vs. WebRTC), but tomorrow's apps may blur this distinction more. Also, sometimes there may be several sessions in play: if I'm presenting and playing a YouTube video to the audience, then there are at least two media sessions:

1. the YouTube video
2. the video conference
In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/togglemicrophone and hangup should go to 2. To add to the complexity, more than one web origin may be involved in presentations (but not necessarily).
Interestingly, if we add "nextslide" and "previousslide" from #274, they should go to 1. So I suspect there may be 3 sessions here:

1. the presentation (slides)
2. the YouTube video
3. the video conference

This would allow for presentation controls even for presentations done the old-fashioned way (no video conference).
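One way to picture the multi-session routing discussed above is a per-action dispatch table. This is purely an illustration of the idea, not spec text: the session "kinds" and the action-to-kind table are assumptions made for this sketch.

```javascript
// Hypothetical routing table: each action is associated with a session
// kind; actions without a kind fall back to the single "active media
// session" the spec already defines.
const ACTION_TARGETS = {
  pause: "media",
  play: "media",
  seekto: "media",
  skipad: "media",
  togglemicrophone: "conference",
  togglecamera: "conference",
  hangup: "conference",
  previousslide: "presentation",
  nextslide: "presentation",
};

// Pick the session that should receive an action, given all sessions
// currently in play. `sessions` items are plain objects of the form
// { id, kind, active } for this sketch.
function routeAction(action, sessions) {
  const kind = ACTION_TARGETS[action];
  return (
    sessions.find((s) => s.kind === kind) ??
    sessions.find((s) => s.active) // fall back to the active media session
  );
}
```

In the presentation example, this would send pause/seek/skipad to the video's session, togglemicrophone/togglecamera/hangup to the conference, and the slide actions to the presentation.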
Having a concept of a video conference session, and allowing a web application to declare itself as having one, might have some benefit. One use case I see is that on iOS we could set the AudioSession PlayAndRecord category when a web application declares having a VC session. Currently in WebKit, this is done when microphone capture starts, which has some drawbacks (the system audio level might change while remote audio is already being rendered).
It is interesting to think in terms of both keyboard and PiP window UI.
It does seem that adding a video conference session concept would be useful, which could then be used to clarify the routing definition in the spec - even if we don't end up adding previous/next slide actions to MediaSession.
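For concreteness, here is what handling these actions could look like with today's API surface. The action names (togglemicrophone, togglecamera, hangup) and the setActionHandler/setMicrophoneActive/setCameraActive calls are in the current spec; the state object and the handler bodies are illustrative assumptions, not a definitive implementation.

```javascript
// Illustrative local state for a hypothetical video-conferencing page.
const vcState = { micActive: true, cameraActive: true, ended: false };

const vcActionHandlers = {
  togglemicrophone: () => {
    vcState.micActive = !vcState.micActive;
    // Reflect the new state back into platform UI (lock screen, PiP).
    if (typeof navigator !== "undefined" && navigator.mediaSession) {
      navigator.mediaSession.setMicrophoneActive(vcState.micActive);
    }
  },
  togglecamera: () => {
    vcState.cameraActive = !vcState.cameraActive;
    if (typeof navigator !== "undefined" && navigator.mediaSession) {
      navigator.mediaSession.setCameraActive(vcState.cameraActive);
    }
  },
  hangup: () => {
    // A real app would close its RTCPeerConnection and stop tracks here.
    vcState.ended = true;
  },
};

// Wire the handlers up when running in a browser that supports them.
if (typeof navigator !== "undefined" && navigator.mediaSession) {
  for (const [action, handler] of Object.entries(vcActionHandlers)) {
    navigator.mediaSession.setActionHandler(action, handler);
  }
}
```

Registering these handlers is, today, the closest thing a page has to "declaring" itself a video conference; the discussion above is about whether a more explicit declaration is warranted.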
The current spec conflates routing with API: "The user agent MUST select at most one of the MediaSession objects to present to the user, which is called the active media session." By this logic, adding an "active video conferencing session" implies we add a new […]. But this would also cause backwards compatibility issues, since it would mean […]. In theory at least, we could maybe solve routing and API separately, if we wanted to keep everything under […]. I don't have an opinion yet, just wanted to enumerate the options I see.
From yesterday's meeting:
I looked, and the spec doesn't seem to allow this. The handle media session action steps only say to: "Run the activation notification steps in the browsing context associated with the active media session." If this discrepancy is limited to […], then […]. If this discrepancy is not limited to those, we should capture that in the spec as well somehow.
I think that the solution here is to require that togglemicrophone, togglecamera and togglescreenshare have a target. Ditto probably for hangup. In that case, selection of the session will always be the target. For capture actions, we can piggyback on documents whose mediaDevices have relevant media sources. I am less sure about hangup.
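The "piggyback on documents with relevant media sources" idea could be sketched as a predicate over a document's live capture tracks. This is an assumption about how such targeting might work, not spec text; the action-to-track-kind mapping and the plain-object track shape are made up for the sketch.

```javascript
// A document qualifies as a target for a capture action only if it
// currently holds a live capture track of the matching kind.
// `liveTracks` items are MediaStreamTrack-like objects
// ({ kind, readyState }) for the purposes of this sketch.
function canTargetCaptureAction(action, liveTracks) {
  const neededKind = {
    togglemicrophone: "audio",
    togglecamera: "video",
    togglescreenshare: "video",
  }[action];
  // hangup has no obvious track-kind mapping, per the comment above.
  if (!neededKind) return false;
  return liveTracks.some(
    (t) => t.kind === neededKind && t.readyState === "live"
  );
}
```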
In section 5.2 Routing, the spec currently defines the active media session as something that is up to the user agent to select, recommended to be selected based on audio focus, i.e., whether it is currently playing audio or the user expects to control the media in it.
From discussion in the 23 May 2022 Media WG / WebRTC WG Joint meeting: what does this mean in relation to the togglemicrophone, togglecamera, and hangup actions? Do the same routing criteria apply to these actions? Is any change to the definition of active media session needed?