Definition of active media session and the togglemicrophone, togglecamera, hangup actions #278

chrisn · 2022-05-23T19:19:52Z

In section 5.2 Routing, the spec currently defines active media session as something that is up to the user agent to select, and recommended to be selected based on audio focus, i.e., if it is currently playing audio or the user expects to control the media in it.

Following discussion in the 23 May 2022 Media WG / WebRTC WG joint meeting: what does this mean in relation to the togglemicrophone, togglecamera, and hangup actions? Do the same routing criteria apply to these actions? Is any change to the definition of active media session needed?
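For context, a page opts in to these three actions by registering handlers on its media session. The sketch below shows roughly what that looks like. To stay self-contained it takes the session as a parameter typed by a small local interface (`MediaSessionLike`, an assumption modeled on `navigator.mediaSession`) and uses a hypothetical `CallState` object for the application's own state; it is not taken from the spec text.

```typescript
// Hypothetical minimal shape of the MediaSession members used below.
// In a browser, navigator.mediaSession would be passed in instead.
interface MediaSessionLike {
  setActionHandler(action: string, handler: (() => void) | null): void;
  setMicrophoneActive(active: boolean): void;
  setCameraActive(active: boolean): void;
}

// Hypothetical application-side call state.
interface CallState {
  micActive: boolean;
  cameraActive: boolean;
  inCall: boolean;
}

function registerConferenceHandlers(session: MediaSessionLike, call: CallState): void {
  session.setActionHandler("togglemicrophone", () => {
    call.micActive = !call.micActive;
    // Report the new state back so the UA can reflect it in its UI.
    session.setMicrophoneActive(call.micActive);
  });
  session.setActionHandler("togglecamera", () => {
    call.cameraActive = !call.cameraActive;
    session.setCameraActive(call.cameraActive);
  });
  session.setActionHandler("hangup", () => {
    call.inCall = false;
  });
}
```

The routing question in this issue is about which session's handlers the user agent invokes when more than one page has registered them.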


jan-ivar · 2022-05-23T20:32:36Z

We should define "video conference". Its sole mention is "EXAMPLE 6: Using video conferencing actions:" (added in #269).

There's no mention of this category of actions' existence in the README, explainer, the spec itself, or use cases. Nor does their existence flow naturally from those narratives, which center on control over consumption of "media" (with metadata such as title, artist, album and album art).

I think the best I can say is that extending "media keys" to control media the user also produces makes some sense.

I would expect the spec and explainer to start with the specific use cases to be solved, and define the concepts needed to help implementers route actions to the correct place. This would also help the WG define better criteria for what's in or out.

On concepts

The assumption seems to be that an active media session may be a video conferencing session. That seems a bit of a stretch already (who's the artist?), but for most of us (and, let's face it, most meetings), treating the corporate online video meeting and AURORA - Runaway the same seems appealing, at least for the consumption part.

Except, it's not.

If I "pause" AURORA, she picks up singing where she left off. I can't "pause" my boss. I miss out.

I can't pause, seek and skip ads in a live video conference. Users might wish to "mute" from their lock screen, but there's no media session action for that. "Pause" is not "mute". A true "pause" would start buffering, taking the viewer out of "realtime" mode and into some semi-live streaming session.

So the overlap seems poor. Maybe things are either a media session or a video conferencing session? Then again, today's apps are neatly one or the other (media streaming vs webrtc), but tomorrow's apps may blur this distinction more.

Also, sometimes there may be several sessions in play: If I'm presenting and playing a youtube video to the audience, then there are at least two media sessions:

1. the youtube video I'm presenting, and
2. the video conference session

In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/mic and hangup to 2.

To add to the complexity, more than one web origin may be involved in presentations (but not necessarily).
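The two-session example above can be sketched as a small routing function. This is only an illustration of the idea, not spec behavior: the `SessionKind` grouping, the `ActiveSessions` record, and the session ids are all hypothetical, and `seekto` stands in for "seek".

```typescript
// Hypothetical categories for the two sessions in the example.
type SessionKind = "consumption" | "conferencing";

// Assumed grouping of actions by the kind of session they make sense for.
function kindOf(action: string): SessionKind | undefined {
  switch (action) {
    case "pause":
    case "seekto":
    case "skipad":
      return "consumption";
    case "togglemicrophone":
    case "togglecamera":
    case "hangup":
      return "conferencing";
    default:
      return undefined;
  }
}

// One active session id per kind (either may be absent).
interface ActiveSessions {
  consumption?: string;
  conferencing?: string;
}

// Returns the id of the session that should receive the action, if any.
function routeAction(action: string, active: ActiveSessions): string | undefined {
  const kind = kindOf(action);
  return kind === undefined ? undefined : active[kind];
}
```

With `{ consumption: "youtube", conferencing: "meeting" }`, `pause` routes to `"youtube"` and `hangup` to `"meeting"`; an action outside both groups routes nowhere.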

jan-ivar · 2022-05-23T20:52:15Z

> In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/mic and hangup to 2.

Interestingly, if we add "nextslide" and "previousslide" to this #274, they should go to 1.

So I suspect there may be 3 sessions here:

active media consumption session (1)
active video conferencing session (2)
active presentation session (1)

This would allow for presentation controls even for presentations done the old fashioned way (no video conference).

youennf · 2022-05-24T10:06:52Z

Having a concept of a video conference session and allowing a web application to declare itself as having a video conference session might have some benefit.

One use case I see is that on iOS we could set the AudioSession PlayAndRecord category when a web application declares having a VC session. Currently on WebKit, this is done when microphone capture starts, which has some drawbacks (system audio level might change while remote audio is already rendering).

> In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/mic and hangup to 2.
>
> To add to the complexity, more than one web origin may be involved in presentations (but not necessarily).

It is interesting to think both in terms of keyboard and PiP window UI.
Automatic UA routing is indeed one possibility we should explore and seems to work great for keyboard.
Another approach is to let sessions provide routing information to the UA, for instance by declaratively telling the UA to forward specific actions to the capturee. This seems well suited to the case where the PiP window UI is customised according to which actions are registered.
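A rough sketch of what such declarative forwarding could look like on the UA side follows. Nothing here exists in the spec: `SessionDecl`, its `forwardToCapturee` list, and `captureeId` are hypothetical names chosen purely to illustrate the idea.

```typescript
// Hypothetical description of what a session declares to the user agent.
interface SessionDecl {
  id: string;
  handled: string[];           // actions this session registered handlers for
  forwardToCapturee: string[]; // actions it asks the UA to forward (assumed declaration)
  captureeId?: string;         // session it is currently capturing, if any
}

// Decide which session receives an action that arrives at the active session.
function resolveTarget(
  action: string,
  active: SessionDecl,
  all: Record<string, SessionDecl>
): string | undefined {
  const wantsForward = active.forwardToCapturee.indexOf(action) !== -1;
  if (wantsForward && active.captureeId !== undefined) {
    const capturee = all[active.captureeId];
    // Forward only if the capturee actually handles the action.
    if (capturee !== undefined && capturee.handled.indexOf(action) !== -1) {
      return capturee.id;
    }
  }
  // Otherwise fall back to the active session's own handlers.
  return active.handled.indexOf(action) !== -1 ? active.id : undefined;
}
```

In this model, a conferencing session that captures a video tab can declare `pause` as forwarded while keeping `hangup` for itself, and a PiP UI could be built from the union of both sessions' handled actions.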

chrisn · 2022-05-30T17:19:01Z

> There's no mention of this category of actions' existence in the README, explainer, the spec itself, or use cases.

I have filed #281 to track updating the explainer.

chrisn · 2022-05-30T17:36:37Z

It does seem that adding a video conference session concept would be useful, which could then be used to clarify the routing definition in the spec - even if we don't end up adding previous/next slide actions to MediaSession.

jan-ivar · 2022-07-11T20:22:57Z

The current spec conflates routing with API: "The user agent MUST select at most one of the MediaSession objects to present to the user, which is called the active media session."

By this logic, adding an "active video conferencing session" implies we add a new VideoConferencingSession API #282, which has some advantages (e.g. no artist).

But this would also cause backwards compatibility issues since it would mean togglemicrophone, togglecamera, and hangup are on the wrong API today.

In theory at least, we could maybe solve routing and API separately, if we wanted to keep everything under MediaSession. I.e. the "active video conferencing session" and "active media session" could point to different MediaSession objects.

I don't have an opinion yet, just wanted to enumerate the options I see.

jan-ivar · 2022-07-13T15:53:56Z

From yesterday's meeting:

> We're already doing it in Chrome when we send actions to media sessions that are not the current active one.

I looked, and the spec doesn't seem to allow this. The handle media session action steps only say to: "Run the activation notification steps in the browsing context associated with the active media session."

If this discrepancy is limited to togglemicrophone, togglecamera and hangup, then (conservatively) defining a new "active media capture session" might be the answer. This could still be a MediaSession to decouple discussion of routing from API. This session might be guided by microphone focus instead of audio focus.

If this discrepancy is not limited to those, we should capture that in the spec as well somehow.
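To make the "microphone focus instead of audio focus" idea concrete, here is a minimal sketch of session selection under that assumption. The `SessionState` fields are hypothetical bookkeeping, "microphone focus" is not a defined term anywhere yet, and picking the first match is an arbitrary simplification.

```typescript
// Hypothetical per-session state tracked by the user agent.
interface SessionState {
  id: string;
  hasAudioFocus: boolean;
  hasMicrophoneFocus: boolean; // assumed notion, not defined by the spec
}

// The actions this comment proposes routing to an "active media capture session".
const CAPTURE_ACTIONS: string[] = ["togglemicrophone", "togglecamera", "hangup"];

// Pick the session for an action: capture actions follow microphone focus,
// everything else follows audio focus as in the current routing text.
function selectSession(action: string, sessions: SessionState[]): SessionState | undefined {
  const isCapture = CAPTURE_ACTIONS.indexOf(action) !== -1;
  const matches = sessions.filter((s) =>
    isCapture ? s.hasMicrophoneFocus : s.hasAudioFocus
  );
  return matches[0];
}
```

Both selected sessions could still be plain MediaSession objects, which keeps the routing question separate from the API question.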

youennf · 2024-09-27T00:45:54Z

I think that the solution here is to require that togglemicrophone, togglecamera and togglescreenshare have a target. Ditto probably for hangup.

In that case, selection of the session will always be target.

For capture actions, we can piggyback on documents whose mediaDevices have relevant media sources.

I am less sure about hangup.
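One way to read the "piggyback on media sources" idea is sketched below: each capture action maps to a kind of source, and the candidate targets are the documents currently capturing from that kind. The `DocumentInfo` shape and the `SourceKind` names are hypothetical, and `hangup` is deliberately left unmapped because the comment leaves it open.

```typescript
// Hypothetical kinds of capture source a document may be using.
type SourceKind = "microphone" | "camera" | "display";

// Hypothetical UA bookkeeping: which source kinds each document is capturing.
interface DocumentInfo {
  id: string;
  liveSources: SourceKind[];
}

// Assumed mapping from capture action to the source kind it controls.
function sourceFor(action: string): SourceKind | undefined {
  switch (action) {
    case "togglemicrophone":
      return "microphone";
    case "togglecamera":
      return "camera";
    case "togglescreenshare":
      return "display";
    default:
      return undefined; // e.g. hangup: no obvious source to key off
  }
}

// Ids of documents that could be the target of the given action.
function candidateTargets(action: string, docs: DocumentInfo[]): string[] {
  const kind = sourceFor(action);
  if (kind === undefined) {
    return [];
  }
  return docs.filter((d) => d.liveSources.indexOf(kind) !== -1).map((d) => d.id);
}
```

If more than one document captures from the same kind of source, this returns several candidates, so some further tie-breaking rule would still be needed.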

jan-ivar mentioned this issue Jul 11, 2022

Dedicated video conference session API? #282

Open

jan-ivar mentioned this issue Jul 13, 2022

Should we add slide presentation specific actions? #274

Closed

steimelchrome added P1 mediacontrol editorial labels Mar 3, 2023

youennf added this to the V1 milestone Mar 14, 2023

chrisn mentioned this issue Aug 10, 2023

Top issues for TPAC 2023 #297

Closed

youennf mentioned this issue Jan 30, 2024

Avoid circular definition of muted. w3c/mediacapture-main#982

Open

chrisn added the TPAC2024 Topic for discussion at TPAC 2024 label Aug 1, 2024
