Definition of active media session and the togglemicrophone, togglecamera, hangup actions #278

Open
chrisn opened this issue May 23, 2022 · 8 comments
Labels
editorial mediacontrol P1 TPAC2024 Topic for discussion at TPAC 2024
Milestone

Comments

@chrisn
Member

chrisn commented May 23, 2022

In section 5.2 Routing, the spec currently defines active media session as something that is up to the user agent to select, and recommended to be selected based on audio focus, i.e., if it is currently playing audio or the user expects to control the media in it.
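To make the selection rule concrete, here is a minimal sketch of how a user agent might pick at most one active media session by audio focus. The `SessionModel` shape, the `lastInteraction` tie-breaker, and the function name are assumptions made for illustration; the spec leaves the actual choice to the user agent.

```typescript
// Hypothetical model of one page's media session; these fields are
// illustrative assumptions, not part of the Media Session API.
interface SessionModel {
  label: string;
  hasAudioFocus: boolean;   // e.g. the page is currently playing audio
  lastInteraction: number;  // assumed tie-breaker: time of last user interaction
}

// Returns at most one session. Sessions with audio focus are preferred;
// among equals, the most recently interacted-with one wins.
function selectActiveSession(candidates: SessionModel[]): SessionModel | null {
  const focused = candidates.filter((s) => s.hasAudioFocus);
  const pool = focused.length > 0 ? focused : candidates;
  return pool.reduce<SessionModel | null>(
    (best, s) => (best === null || s.lastInteraction > best.lastInteraction ? s : best),
    null,
  );
}
```

Nothing in this model says whether a page that is capturing the microphone but not playing audio should win, which is the gap this issue asks about.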

Following discussion in the 23 May 2022 Media WG / WebRTC WG joint meeting: what does this mean in relation to the togglemicrophone, togglecamera, and hangup actions? Do the same routing criteria apply to these actions? Is any change to the definition of active media session needed?

@jan-ivar
Member

We should define "video conference". Its sole mention is "EXAMPLE 6: Using video conferencing actions:" (added in #269).
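For readers unfamiliar with these actions, the following is a rough sketch of how a page might register them. It is not the spec's EXAMPLE 6. `ConferenceSessionLike` is a hand-written structural subset of `MediaSession` (so the sketch runs outside a browser), and `CallState` is a hypothetical stand-in for application state; in a browser that implements these actions, `navigator.mediaSession` would be passed as `session`.

```typescript
type ConferenceAction = "togglemicrophone" | "togglecamera" | "hangup";

// Structural subset of MediaSession used by this sketch (an assumption made
// so the code does not depend on browser typings).
interface ConferenceSessionLike {
  setActionHandler(action: ConferenceAction, handler: (() => void) | null): void;
  setMicrophoneActive(active: boolean): void;
  setCameraActive(active: boolean): void;
}

// Hypothetical application state for an ongoing call.
interface CallState {
  micOn: boolean;
  cameraOn: boolean;
  connected: boolean;
}

function registerConferenceHandlers(session: ConferenceSessionLike, call: CallState): void {
  session.setActionHandler("togglemicrophone", () => {
    call.micOn = !call.micOn;
    // Report the new state back so the user agent's UI stays in sync.
    session.setMicrophoneActive(call.micOn);
  });
  session.setActionHandler("togglecamera", () => {
    call.cameraOn = !call.cameraOn;
    session.setCameraActive(call.cameraOn);
  });
  session.setActionHandler("hangup", () => {
    call.connected = false;
  });
}
```

Note that nothing here involves title, artist, or album metadata, which is part of why these actions sit awkwardly next to the media consumption narrative.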

There's no mention of this category of actions' existence in the README, explainer, the spec itself, or use cases. Nor does their existence flow naturally from those narratives, which center on control over consumption of "media" (with metadata such as title, artist, album and album art).

I think the best I can say is that extending "media keys" to control media the user also produces makes some sense.

I would expect the spec and explainer to start with the specific use cases to be solved, and define the concepts needed to help implementers route actions to the correct place. This would also help the WG define better criteria for what's in or out.

On concepts

The assumption seems to be that an active media session may be a video conferencing session. That seems a bit of a stretch already (who's the artist?), but for most of us (and, let's face it, most meetings), treating the corporate online video meeting and AURORA - Runaway the same seems appealing, at least for the consumption part.

Except, it's not.

If I "pause" AURORA, she picks up singing where she left off. I can't "pause" my boss. I miss out.

I can't pause, seek and skip ads in a live video conference. Users might wish to "mute" from their lock screen, but there's no media session action for that. "Pause" is not "mute". A true "pause" would start buffering, taking the viewer out of "realtime" mode and into some semi-live streaming session.

So the overlap seems poor. Maybe things are either a media session or a video conferencing session? Then again, today's apps are neatly one or the other (media streaming vs WebRTC), but tomorrow's apps may blur this distinction more.

Also, sometimes there may be several sessions in play: If I'm presenting and playing a YouTube video to the audience, then there are at least two media sessions:

  1. the YouTube video I'm presenting, and
  2. the video conference session

In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/mic and hangup to 2.
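The split described in this example can be sketched as a lookup from action to session role. The role names, the `roleForAction` table, and `routeByRole` are all hypothetical (the spec defines no session roles today), and "seekto" stands in for the "seek" mentioned above.

```typescript
type RoutedAction =
  | "pause" | "seekto" | "skipad"
  | "togglemicrophone" | "togglecamera" | "hangup";

// Hypothetical session roles; the spec has no such concept today.
type SessionRole = "consumption" | "conferencing";

// Assumed mapping from each action to the kind of session that should get it.
const roleForAction: Record<RoutedAction, SessionRole> = {
  pause: "consumption",
  seekto: "consumption",
  skipad: "consumption",
  togglemicrophone: "conferencing",
  togglecamera: "conferencing",
  hangup: "conferencing",
};

interface RoledSession {
  label: string;
  role: SessionRole;
}

// Picks the first live session whose role matches the action, if any.
function routeByRole(action: RoutedAction, live: RoledSession[]): RoledSession | undefined {
  const wanted = roleForAction[action];
  return live.filter((s) => s.role === wanted)[0];
}
```

With the two sessions from the list above, `routeByRole("pause", ...)` would return session 1 and `routeByRole("hangup", ...)` session 2. The sketch sidesteps the hard part, which is how a user agent learns each session's role in the first place.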

To add to the complexity, more than one web origin may be involved in presentations (but not necessarily).

@jan-ivar
Member

jan-ivar commented May 23, 2022

> In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/mic and hangup to 2.

Interestingly, if we add "nextslide" and "previousslide" to this #274, they should go to 1.

So I suspect there may be 3 sessions here:

  • active media consumption session (1)
  • active video conferencing session (2)
  • active presentation session (1)

This would allow for presentation controls even for presentations done the old-fashioned way (no video conference).

@youennf
Contributor

youennf commented May 24, 2022

Having a concept of a video conference session and allowing a web application to declare itself as having a video conference session might have some benefit.

One use case I see is that on iOS we could set the AudioSession PlayAndRecord category when a web application declares that it has a VC session. Currently in WebKit, this is done when microphone capture starts, which has some drawbacks (system audio level might change while remote audio is already rendering).

> In this example, it would make sense for pause, seek and skipad to go to 1, whereas togglecamera/mic and hangup to 2.
>
> To add to the complexity, more than one web origin may be involved in presentations (but not necessarily).

It is interesting to think both in terms of keyboard and PiP window UI.
Automatic UA routing is indeed one possibility we should explore and seems to work great for keyboard.
Another approach is to let sessions provide routing information to the UA, for instance by declaratively telling the UA to forward specific actions to the capturee. This seems well suited to the case where the PiP window UI is customised according to which actions are registered.
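One way to picture the declarative approach is the sketch below. Everything in it is hypothetical: the Media Session API has no `forwardToCapturee` or `capturee` today, and the function names are invented. It only shows how a forwarding declaration could drive both routing and the set of actions a PiP window offers.

```typescript
// Hypothetical model: neither `forwardToCapturee` nor `capturee` exists in
// the Media Session API; they stand in for the declarative idea.
interface ForwardingSession {
  label: string;
  handles: string[];            // actions with a registered handler
  forwardToCapturee: string[];  // actions the session asks the UA to forward
  capturee?: ForwardingSession; // session of the captured page, if any
}

function hasAction(list: string[], action: string): boolean {
  return list.indexOf(action) !== -1;
}

// Decides which session receives `action` when `session` is the capturer.
function resolveRecipient(action: string, session: ForwardingSession): ForwardingSession | null {
  const target = session.capturee;
  if (target && hasAction(session.forwardToCapturee, action) && hasAction(target.handles, action)) {
    return target;
  }
  return hasAction(session.handles, action) ? session : null;
}

// Actions a customised PiP window could offer for `session`: its own
// handlers plus any forwarded actions the capturee actually handles.
function pipActions(session: ForwardingSession): string[] {
  const target = session.capturee;
  const forwarded = target
    ? session.forwardToCapturee.filter((a) => hasAction(target.handles, a))
    : [];
  const merged = session.handles.concat(forwarded);
  return merged.filter((a, i) => merged.indexOf(a) === i); // drop duplicates
}
```

In this model the capturer opts in per action, so an action the capturee handles but the capturer did not declare (for example a seek) is not forwarded.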

@chrisn
Member Author

chrisn commented May 30, 2022

> There's no mention of this category of actions' existence in the README, explainer, the spec itself, or use cases.

I have filed #281 to track updating the explainer.

@chrisn
Member Author

chrisn commented May 30, 2022

It does seem that adding a video conference session concept would be useful, which could then be used to clarify the routing definition in the spec - even if we don't end up adding previous/next slide actions to MediaSession.

@jan-ivar
Member

The current spec conflates routing with API: "The user agent MUST select at most one of the MediaSession objects to present to the user, which is called the active media session."

By this logic, adding an "active video conferencing session" implies we add a new VideoConferencingSession API #282, which has some advantages (e.g. no artist).

But this would also cause backwards compatibility issues since it would mean togglemicrophone, togglecamera, and hangup are on the wrong API today.

In theory at least, we could maybe solve routing and API separately, if we wanted to keep everything under MediaSession. I.e. the "active video conferencing session" and "active media session" could point to different MediaSession objects.

I don't have an opinion yet, just wanted to enumerate the options I see.

@jan-ivar
Member

From yesterday's meeting:

> We're already doing it in Chrome when we send actions to media sessions that are not the current active one.

I looked, and the spec doesn't seem to allow this. The handle media session action steps only say to: "Run the activation notification steps in the browsing context associated with the active media session."
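A simplified model of that literal reading, to show where an action gets lost. The `HandlerSession` shape and the function name are assumptions for illustration and deliberately omit most of the spec's actual steps.

```typescript
// Hypothetical model of a session and its registered action handlers.
interface HandlerSession {
  label: string;
  handlers: { [action: string]: (() => void) | undefined };
}

// Simplified reading of the quoted step: only the active media session is
// consulted. Returns true if a handler ran, false if the action was dropped.
function handleActionLiteralReading(action: string, active: HandlerSession | null): boolean {
  if (active === null) return false;
  const handler = active.handlers[action];
  if (handler === undefined) return false;
  handler();
  return true;
}
```

Under this reading, if a music page is the active media session and a call runs in another tab, a togglemicrophone action reaches only the music page and is dropped there, even though the call page registered a handler for it.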

If this discrepancy is limited to togglemicrophone, togglecamera and hangup, then (conservatively) defining a new "active media capture session" might be the answer. This could still be a MediaSession to decouple discussion of routing from API. This session might be guided by microphone focus instead of audio focus.

If this discrepancy is not limited to those, we should capture that in the spec as well somehow.

@youennf
Contributor

youennf commented Sep 27, 2024

I think that the solution here is to require that togglemicrophone, togglecamera and togglescreenshare have a target. Ditto probably for hangup.

In that case, session selection will always be determined by the target.

For capture actions, we can piggyback on documents whose mediaDevices have relevant media sources.

I am less sure about hangup.
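A minimal sketch of the target idea for capture actions, assuming the user agent can tell which documents currently have live sources of each kind. `DocumentModel`, `sourceForAction`, and `eligibleTargets` are hypothetical names, and hangup is left out because it has no obvious media source to key on.

```typescript
type CaptureAction = "togglemicrophone" | "togglecamera" | "togglescreenshare";
type SourceKind = "microphone" | "camera" | "display";

// Assumed mapping from each capture action to the media source it controls.
const sourceForAction: Record<CaptureAction, SourceKind> = {
  togglemicrophone: "microphone",
  togglecamera: "camera",
  togglescreenshare: "display",
};

// Hypothetical view of a document and the sources its mediaDevices has live.
interface DocumentModel {
  url: string;
  liveSources: SourceKind[];
}

// Documents that could be the target of `action`: those currently holding
// a live source of the matching kind.
function eligibleTargets(action: CaptureAction, docs: DocumentModel[]): DocumentModel[] {
  const kind = sourceForAction[action];
  return docs.filter((d) => d.liveSources.indexOf(kind) !== -1);
}
```

When exactly one document qualifies, the target is unambiguous; how to choose among several capturing documents is left open in this sketch.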
