Do we need Group Order for Subscriptions ? #607

Open
suhasHere opened this issue Nov 6, 2024 · 20 comments · May be fixed by #636
Assignees
Labels
Needs PR Subscribe Related to SUBSCRIBE message and subscription handling

Comments

@suhasHere
Collaborator

This came up during priority discussions at IETF 121, and I was wondering what purpose group order serves, especially when we have a delivery timeout that causes things to expire under sustained congestion. Sustained congestion was one reason groupOrder was proposed, IIRC? Maybe I am wrong here.

Can someone provide more context?

@afrind
Collaborator

afrind commented Nov 6, 2024

To clarify - is your proposal to remove groupOrder from SUBSCRIBE and SUBSCRIBE_OK, and always use Ascending order to break priority ties (eg: to choose between two objects from different groups in the same track and priority level)?

My recollection is that order=DSC was proposed for realtime use cases before we added delivery timeout (eg: if data from a new group arrives, send that and starve older data). If Ascending+Delivery Timeout is the preferred mechanism for realtime cases, maybe we can remove it.

@martinduke
Contributor

IIRC Descending order is also for rewind, which is now a FETCH case.

@suhasHere
Collaborator Author

Ascending+Delivery Timeout should address the temporary congestion flows.

@ianswett
Collaborator

I went through all my ideas for when I might use descending and I don't think any of them are compelling anymore.

One use case for descending was to ensure you always got the most recent data, but now that priorities are clarified (#518), I think the correct thing will happen with ascending and delivery timeout. You'll always get the base layer, then as much of the enhancement layers as bandwidth allows. You wouldn't want to skip over the last object or two of the previous group to start sending the first object(s) of the next group.

> IIRC Descending order is also for rewind, which is now a FETCH case.

Can you elaborate more on what you mean by rewind?

@ianswett added the Subscribe and Needs Discussion labels Nov 10, 2024
@gmarzot

gmarzot commented Nov 10, 2024

Likely referring to the player action of moving backward on the timeline, aka scrubbing; in this case, scrubbing backwards. FETCH seems the right place for that IMO. I defer to the issue author for a definitive response.

@englishm
Collaborator

englishm commented Nov 11, 2024

> You wouldn't want to skip over the last object or two of the previous group to start sending the first object(s) of the next group.

This is behavior that has been explicitly described as desirable in the past. Prioritization equivalent to descending group order was present even in the earliest WARP drafts because Twitch's player preferred to optimize in favor of low latency by skipping ahead and dropping frames at the tail of a GoP, providing a way for clients to catch-up to as close to the live edge as possible.

This tail drop behavior isn't desirable for all use cases, but assuming we still want to support this Twitch-like skip-to-live behavior, I don't see how we can do it with priorities and timeouts alone.

SubGroup priorities are still bounded to groups and publisher/subscriber priorities apply to the track level.

Due to temporary congestion on the last mile hop, a relay may have objects ready to send:

  • {Group: 3, SubGroup: 0, Object: 58}
  • {Group: 3, SubGroup: 0, Object: 59}
  • {Group: 3, SubGroup: 0, Object: 60}
  • {Group: 4, SubGroup: 0, Object: 0}

If we can apply group order descending, then that relay will put {Group: 4, SubGroup: 0, Object: 0} on the wire first and the player can skip ahead, catching up as close as possible to the live edge. Then the relay sends {Group: 3, SubGroup: 0, Object: 58}, and possibly also objects 59 and 60 if the delivery timeouts permit and the subscriber has not sent QUIC STOP_SENDING for the stream carrying objects of {Group: 3, SubGroup: 0}. It's possible that more objects from Group 4 may be ready to send before the remaining objects from Group 3 are sent, so they may be interleaved or eventually starved off up to a delivery timeout or QUIC STOP_SENDING from the subscriber. Importantly, if the temporary congestion clears and there's room to send those objects in time, they can still be sent.
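
To make the ordering concrete, here is a minimal sketch of how a relay might pick among the queued objects above. It assumes a single track at a single priority level; `PendingObject` and `send_order` are hypothetical names for illustration, not anything defined by the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingObject:
    group: int
    subgroup: int
    object_id: int


def send_order(queue, descending):
    """Return the queue in the order a relay might put objects on the wire.

    Groups are compared in ascending or descending order; within a group,
    objects always go out in ascending subgroup and object ID order.
    """
    sign = -1 if descending else 1
    return sorted(queue, key=lambda o: (sign * o.group, o.subgroup, o.object_id))


queue = [
    PendingObject(3, 0, 58),
    PendingObject(3, 0, 59),
    PendingObject(3, 0, 60),
    PendingObject(4, 0, 0),
]

asc = [(o.group, o.object_id) for o in send_order(queue, descending=False)]
desc = [(o.group, o.object_id) for o in send_order(queue, descending=True)]
```

With descending order the head of Group 4 goes first and the Group 3 tail follows; with ascending order the Group 3 tail is drained before Group 4 starts.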

If we were to try to force the relay to send {Group: 4, SubGroup: 0, Object: 0} using only priorities and timeouts, we would need to set a delivery timeout for the subscription short enough to expire {Group: 3, SubGroup: 0, Object: 58}, {Group: 3, SubGroup: 0, Object: 59}, and {Group: 3, SubGroup: 0, Object: 60} as soon as {Group: 4, SubGroup: 0, Object: 0} is available.

If our objects are frames being produced approximately every 33ms, we could set a very tight delivery timeout matching that rate and get {Group: 4, SubGroup: 0, Object: 0} to be sent in this scenario rather than any more objects from Group 3, but we would then preclude the possibility of ever sending the remaining objects from Group 3 when conditions improve.

I think delivery timeouts are more useful as a backstop to prevent unnecessary queuing when congestion is present for more prolonged periods of time. Maybe our application has a buffer a handful of frames in length to account for brief periods of congestion, but eventually we do want to skip ahead to a new group. The delivery timeout might be something like 165ms, but we'd still want group order descending to make sure that a keyframe from a new group is sent as soon as it's available.

If we instead had a very tight delivery timeout, we'd risk creating worse problems because we'd end up more likely to expire the heads (object 0) of new groups when encountering congestion, essentially starving end subscribers of objects that can be decoded and played back.
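
A rough sketch of that trade-off, under toy assumptions: frames reach the relay every 33ms, an object expires once its age reaches the timeout, and `sendable` is a hypothetical helper rather than anything from the draft.

```python
def sendable(arrivals_ms, now_ms, timeout_ms):
    """Return labels of objects whose age is still below the delivery timeout."""
    return [label for label, t in arrivals_ms if now_ms - t < timeout_ms]


# (label, arrival time at the relay in ms); G4/0 is the head of the new group.
arrivals = [("G3/58", 0), ("G3/59", 33), ("G3/60", 66), ("G4/0", 99)]

# The link becomes writable just as G4/0 arrives.
tight = sendable(arrivals, now_ms=99, timeout_ms=33)
backstop = sendable(arrivals, now_ms=99, timeout_ms=165)

# Congestion lasts 40ms longer: the tight timeout now expires G4/0 as well.
tight_late = sendable(arrivals, now_ms=139, timeout_ms=33)
backstop_late = sendable(arrivals, now_ms=139, timeout_ms=165)
```

The tight timeout gets G4/0 out first only by discarding the Group 3 tail, and a slightly longer stall discards G4/0 too. The 165ms backstop keeps everything eligible, which is why something else (group order, in this argument) has to decide what goes first.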

It's a subtle difference, but I do think there's some meaningful behavior we would be precluding by dropping group order descending with our current priority scheme.

@englishm
Collaborator

Seeing #606, maybe a different understanding of how delivery timeout works motivated this issue?

@fluffy
Contributor

fluffy commented Nov 12, 2024

So clearly we need order for fetch, but for subscribe, it has never made much sense to me. So, a few things.

First thing: if some people want this, I'm happy to keep it, even though I don't think it will be used, because I don't think it adds much complexity.

That said... First, keep in mind that if the order is descending but there is no congestion, all the data will arrive in ascending order. I've played with implementations of this, and what happens is that if the subscriber is operating near the limit of available bandwidth (which is the optimal situation), the last frame of group X nearly always gets delayed until after the first frame (or more) of group X+1. This is because the first frame of a group is typically larger than the other frames. This results in the last frame of group X having huge jitter. The jitter buffer compensates for this by increasing its depth.

So in summary, in playing with implementations of this, I find that descending increases jitter and latency over ascending but does not increase quality of overall user experience.
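
The effect can be reproduced with a toy model. This is only a sketch, not the implementation referred to above: it assumes a preemptive sender on a fixed-rate link, one frame every 33ms, a 3000-byte first frame and 500-byte later frames, and a link rate just under the average media rate. All of these numbers and the `simulate` function are made up for illustration.

```python
def simulate(descending, rate=30, interval=33, group_len=5, groups=2,
             key_size=3000, delta_size=500):
    """Return {(group, obj): latency_ms} on a fixed-rate, priority-ordered link.

    Each 1ms step the sender gives `rate` bytes to the best ready frame:
    lowest group first (ascending) or highest group first (descending),
    always lowest object ID first within a group.
    """
    frames = []  # [available_ms, group, obj, remaining_bytes]
    for g in range(groups):
        for o in range(group_len):
            size = key_size if o == 0 else delta_size
            frames.append([(g * group_len + o) * interval, g, o, size])

    sign = -1 if descending else 1
    latency = {}
    t = 0
    while len(latency) < len(frames):
        ready = [f for f in frames if f[0] <= t and f[3] > 0]
        if ready:
            f = min(ready, key=lambda f: (sign * f[1], f[2]))
            f[3] -= rate
            if f[3] <= 0:
                latency[(f[1], f[2])] = t + 1 - f[0]
        t += 1
    return latency


asc = simulate(descending=False)
desc = simulate(descending=True)
last_of_group0 = (0, 4)
```

In this model the last frame of group 0 is still partly queued when the first frame of group 1 becomes available. Ascending finishes it first; descending preempts it, so `desc[last_of_group0]` comes out much larger than `asc[last_of_group0]`, while the first frame of group 1 gains only a little.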

We clearly need order for Fetch, but for subscribe, I don't think we need order, but if other people want it, I am perfectly fine with keeping it.

@ianswett
Collaborator

@fluffy I thought we needed GroupOrder for FETCH as well, but I haven't been able to come up with something that justifies it at the moment. Can you provide a motivating example? If we could get rid of it from both, that would simplify things more.

@suhasHere
Collaborator Author

+1 to @ianswett on the significance of group order for fetch flows.

I tried to write a few use cases even for Fetch, but couldn't convince myself satisfactorily.

Given that fetch is on a single stream, the only plausible implementation I can see is something like the following:

  • the player fetches 10 groups in DSC order, each of 5 seconds (50 seconds of buffer)
  • then, as the groups start coming in, the player may choose to start showing the most recent 3 groups, say 8, 9, 10, from its buffer
  • then it cancels the fetch because the player decided to move ahead

but this is a concocted example and a better implementation will fetch in smaller ranges instead.
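
A minimal sketch of the "smaller ranges" alternative, where the loop lives in the player. `descending_fetch_ranges` is a hypothetical helper; it only computes which group ranges to request and does not model any actual FETCH message.

```python
def descending_fetch_ranges(first_group, last_group, chunk):
    """Yield inclusive (start, end) group ranges, newest range first."""
    end = last_group
    while end >= first_group:
        start = max(first_group, end - chunk + 1)
        yield (start, end)
        end = start - 1


# Groups 1..10, requested three groups at a time starting from the newest.
ranges = list(descending_fetch_ranges(1, 10, 3))
```

The player would issue one ascending fetch per range and simply stop iterating once it has enough buffer, instead of cancelling one large descending fetch.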

Maybe I am missing something here. Happy to be convinced otherwise.

@afrind
Collaborator

afrind commented Nov 13, 2024

Individual Comment:

I think your example for descending FETCH is right Suhas. The question is where do we want the loop -- at the publisher or the subscriber? It doesn't seem terribly complicated to have it at the publisher, and it saves the subscriber making multiple requests, which might incur additional unnecessary latency and processing overhead.

@ianswett
Collaborator

Agreed, well written @suhasHere

But I think the example isn't how one actually implements a player. A player that wants a certain jitter buffer for 'low latency live', ie: sports, typically has a very specific target of 5 or 10 or X seconds/GOPs to start. Then if the connection turns out to be unreliable and a rebuffer occurs, it will commonly bump up the target jitter buffer size. As such, I think a player will prioritize the FETCH higher than the SUBSCRIBE and likely start playing well before the FETCH completes and it hits the target jitter buffer, because join latency matters as well.

@suhasHere
Collaborator Author

> Individual Comment:
>
> I think your example for descending FETCH is right Suhas. The question is where do we want the loop -- at the publisher or the subscriber? It doesn't seem terribly complicated to have it at the publisher, and it saves the subscriber making multiple requests, which might incur additional unnecessary latency and processing overhead.

My thinking on this is that the player is in control of its buffer depth and of the experiences it wants to show under different circumstances. Thus I lean towards client side rather than server side.

@gwendalsimon

gwendalsimon commented Nov 13, 2024

The motivation for Descending order in FETCH is seeking backwards for visual scrubbing. Scrubbing (quickly navigating through a video by dragging a slider or timeline bar back and forth) with sub-clip retrieval is an important use case in live TV streaming. In Will's slides from the Boston interim meeting, it was Example #4.

@wilaw
Contributor

wilaw commented Nov 13, 2024

+1 to the FETCH use case of visual scrubbing in DESC order, especially when combined with a filter to only pull objects containing I-frames.

While we definitely need GROUP ORDER for FETCH, I don't think we need GROUP order for SUBSCRIBE. @englishm gives the best motivation for why using delivery timeouts alone would be difficult to produce the head-of-line skipping for newer content. However even Luke, who was the originator of this approach, backed off citing poor performance when testing. An alternate strategy using priorities would be to put all the I-Frames (Object 0) in their own subgroup with a high priority and then the p-frames in a separate subgroup with a lower priority. This would mean that under congestion, your I-Frames are delivered ahead of your p-frames, which is the skip-ahead visual behavior that you want. I'd prefer to remove GROUP order for SUBSCRIBE now and then add it back in later if experimental evidence shows that it really is the path to better real-time performance under congestion.
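
A sketch of that alternate strategy, under stated assumptions: Object 0 of each group is the I-frame, a lower priority number is sent first, and the subgroup priority is compared across groups before group order breaks ties. `subgroup_for`, `SUBGROUP_PRIORITY` and `priority_send_order` are hypothetical names.

```python
def subgroup_for(object_id):
    """Assumed layout: subgroup 0 carries the I-frame, subgroup 1 the P-frames."""
    return 0 if object_id == 0 else 1


# Assumed convention: a lower number is sent first.
SUBGROUP_PRIORITY = {0: 0, 1: 1}


def priority_send_order(pending):
    """Order (group, object_id) pairs by subgroup priority, then ascending group."""
    return sorted(
        pending,
        key=lambda p: (SUBGROUP_PRIORITY[subgroup_for(p[1])], p[0], p[1]),
    )


pending = [(3, 58), (3, 59), (3, 60), (4, 0)]
order = priority_send_order(pending)
```

For this queue the result matches what descending group order would send, but the tie-break among P-frames stays ascending, so older P-frames are not starved by newer ones.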

@TimEvens

+1 to remove any ordering relating to subscribe. Fetch is a different use case. Subscribes receive data in the order that the publisher publishes them. If the order violates the protocol spec, it's the job of the ingress relay to enforce that. If the order passes the ingress relay, then the order is maintained to all active subscribers, which includes via interconnected relays. If we have to support anything else, it will result in playback from cache for subscribes, which is overly complicated and will introduce delays with real-time delivery.

@afrind added the Needs PR label and removed the Needs Discussion label Nov 13, 2024
@afrind
Collaborator

afrind commented Nov 13, 2024

Chair Comment:

Seems like most are in favor of removing Descending group order from SUBSCRIBE and this is ready for a PR as such. It sounds like several folks have use cases for Descending order FETCH, so let's leave that in for now, or open a separate issue if someone feels strongly it needs to be removed now.

@vasilvv
Collaborator

vasilvv commented Nov 13, 2024

I am skeptical of the removal. I agree with Cullen that DESC order is likely not useful in real-time use cases due to the "tail deprioritization" effects, but it's not immediately clear to me that this applies to scenarios with bigger buffers. In simulations I ran for live use cases, DESC without timeouts was clearly advantageous against ASC without timeouts, so there's that.

> If the order passes the ingress relay, then the order is maintained to all active subscribers, which includes via interconnected relays

I don't believe this is true: groups are delivered independently, and the network provides no ordering guarantees.

@suhasHere
Collaborator Author

+1 on removing group order for Subscribe/SubscribeOk and leaving it for Fetch if it addresses real use cases, or revisiting it later.

> In simulations I ran for live use cases, DESC without timeouts was clearly advantageous against ASC without timeouts, so there's that.

@vasilvv in my tests with a jitter buffer of 5 seconds, ASC with timeout gave a better experience, especially when the link was temporarily congested. I haven't tried without timeouts though. Can you share more about your experiments? Thanks.

@TimEvens

TimEvens commented Nov 14, 2024

@vasilvv,

> I don't believe this is true: groups are delivered independently, and the network provides no ordering guarantees.

We are agreeing on the same thing. Interconnecting relays via the network does not provide ordering, hence objects are transmitted in the order received, based on the original publish transmission order. In other words, objects are delivered in the order they were received. The relays should not be changing the order to ensure ASC or DESC for live/real-time subscribes. Again, fetch is an exception. Just my two cents.

@suhasHere linked a pull request Dec 4, 2024 that will close this issue