Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SUBSCRIBE REQUEST ambiguity #204

Closed
afrind opened this issue May 30, 2023 · 18 comments · Fixed by #277
Closed

SUBSCRIBE REQUEST ambiguity #204

afrind opened this issue May 30, 2023 · 18 comments · Fixed by #277
Assignees
Labels
Needs PR Subscribe Related to SUBSCRIBE message and subscription handling

Comments

@afrind
Copy link
Collaborator

afrind commented May 30, 2023

Assume a setup with two publishers, a single relay and one subscriber.

P1 ANNOUNCE's track namespace fooba
P2 ANNOUNCE's track namespace foo

S1 sends a SUBSCRIBE REQUEST for Full Track Name foobar

How does the relay know whether foobar is Track Namespace: foo and Track Name: bar, or Track Namespace: fooba and Track Name r, and hence which publisher to route the subscribe to?

I see a few possibilities:

  1. The relay should not allow this scenario (via namespace construction and/or authorization). If so the draft should include text explaining that this is disallowed (normatively?) or "bad things(tm)" happen.

  2. Change SUBSCRIBE REQUEST to be able to disambiguate the Track Namespace from the Track Name. This could be done by sending name and namespace in separate message fields, or having stricter rules for the structure of Full Track Name such that a relay can extract the constituent parts.

  3. Explicitly state that Full Track Names are matched to namespaces via "longest prefix match".

@VMatrix1900
Copy link

Option 3 looks interesting. P1 can use a longer track namespace to steer the sub request from P2. Like a traffic engeering for content.

@kixelated
Copy link
Collaborator

kixelated commented Jun 9, 2023

I like option 2.

I don't agree with treating track namespace as merely a string prefix. Even if you go with option 3, there's still a race condition where a late ANNOUNCE will cause a SUBSCRIBE to an unintended track.

I want to give the track namespace more properties so it's treated more like a broadcast identifier. A few folks are headed down this path without realizing it, such as proposing that there's a single catalog per track namespace. I don't see how that would even work if ANNOUNCE is used purely for routing like a BGP announcement.

@afrind
Copy link
Collaborator Author

afrind commented Jun 9, 2023

@suhasHere : are you saying you believe that the relay resolves the described ambiguity by using 'longest prefix' matched (eg: option 3)?

all the Relays needs to do is to check if there is one or more matches and send it to the subscribers.

Another option I hadn't considered based on this comment is that the relay would send OBJECTs from BOTH publishers since they both match the Full Track Name. This will break in the current design, since each publisher has their own Object Sequence and Group Sequence and when merged the consumer will be left with garbage. It also violates the logic described that says a Full Track Name can be used as a cache key.

@kixelated
Copy link
Collaborator

kixelated commented Jun 10, 2023

Yeah that seems quite problematic @afrind.

Let's say a relay have two connected publishers who send IN ORDER:

publisher1: ANNOUNCE webex.com
publisher2: ANNOUNCE webex.com/meeting123

A subscriber connects and issues:

SUBSCRIBE webex.com/meeting123/john-cena

What happens next?

  1. The relay forwards the SUBSCRIBE to publisher 1.
  2. The relay returns SUBSCRIBE ERROR not_found
  3. The relay forwards the SUBSCRIBE to publisher 2.
  4. The relay forwards the SUBSCRIBE to publisher 1&2.

These are in the order of Alan's options. I added option 4 based on Suhas' comment.

  1. Would occur if the relay rejects publisher2's ANNOUNCE because of thr shared prefix.
  2. Would occur because the track namespace is an ID, not a prefix, and must fully match an ANNOUNCE.
  3. Would occur because publisher2 has the longer path prefix.
  4. Would occur because both publishers could produce the requested track.

And to throw a wrench in things, a third publisher joins.

publisher3: ANNOUNCE webex.com/meeting123/john-cena

@afrind
Copy link
Collaborator Author

afrind commented Jun 10, 2023

There's also the case where the relay replies to the subscribe requests on behalf of the publisher(s), so it doesn't have to make the routing decision at subscription time. But it does need to decide where to forward OBJECTs when they arrive.

@kixelated
Copy link
Collaborator

kixelated commented Jun 14, 2023

Implementing this now so I have some more thoughts.

We should explicitly support UNANNOUNCE and UNSUBSCRIBE, which implicitly occur when the connection is dropped.

Option 1

namespaces: RadixTree<String, Namespace>
tracks: RadixTree<String, Track> 
  • On ANNOUNCE, the relay MUST ensure that the namespace not a subset/superset of an existing ANNOUNCE.
  • On SUBSCRIBE, the relay looks up the track based on the prefix and appends itself to a list of subscribers. If not found, it looks up the namespace and creates a track entry if valid.
  • On UNSUBSCRIBE, the relay looks up the track based on the full name and removes itself from a list of subscribers.
  • On UNANNOUNCE, the relay deletes any namespaces and tracks matching the prefix.

A radix tree is used for both so the relay can iterate through any values that match the given prefix. This is a global radix tree, similar to how QUIC connections are identified when variable length connection IDs are supported.

Option 2

upstreams: HashMap<String, Namespace>
tracks: HashMap<Namespace, Track>
  • On ANNOUNCE, the relay inserts the upstream into the upstreams hash map and errors on a duplicate.
  • On SUBSCRIBE, the relay looks up the namespace and then the track, creating an entry if not found.
  • On UNSUBSCRIBE, the relay looks up the namespace and then the track, deleting an entry if found.
  • On UNANNOUNCE, the relay deletes from the namespace and track hash map.

This is a strictly better than option 1. It provides the same functionality but faster, and without the restriction on namespace prefixes.

Option 3

Runs into issues with ANNOUNCE. example:

publisher1: ANNOUNCE  webex.com
consumer1:  SUBSCRIBE webex.com/meeting123/hd
publisher2: ANNOUNCE  webex.com/meeting123
consumer2:  SUBSCRIBE webex.com/meeting123/hd

consumer1 and consumer2 are subscribed to different publishers, despite using THE SAME NAME. This invalidates the claim that the track name is the cache key. We can't migrate consumer1 to the new longer prefix from publisher2 unless there's a guarantee they are producing the same content.

A similar issue occurs when UNANNOUNCE is fired, when happens implicitly when the connection is dropped:

publisher1: ANNOUNCE  webex.com
publisher2: ANNOUNCE  webex.com/meeting123
consumer1:  SUBSCRIBE webex.com/meeting123/hd
publisher2: UNANNOUNCE webex.com/meeting123

Each subscription needs to keep reference to the track when the SUBSCRIBE was received, instead of doing a lookup based on full track name when each OBJECT is received. This will cause havok when resubscriptions occur, for example with ABR, as switching away from and back to a track MAY result in different content.

Option 4

It's not viable. The subscription would intermingle objects from separate publishers that happen share a prefix, all while using the same track_id. I really don't want to support subscribing based on wildcards or prefixes anyway.

This is also the slowest option, as there needs to be a global RadixTree that maps tracks to subscriptions on receipt of each object. All other options listed above would keep an array of subscribers per track, resolved using a (shardable) HashMap.

@suhasHere
Copy link
Collaborator

suhasHere commented Jul 24, 2023

  1. The relay should not allow this scenario (via namespace construction and/or authorization). If so the draft should include text explaining that this is disallowed (normatively?) or "bad things(tm)" happen.

I think this seems like the right approach. Having 2 producers with conflicting namespaces is a bad idea, unless the authz policy and application is somehow OK with the consequences.

I have added #225 to clarify the thinking further

@fluffy
Copy link
Contributor

fluffy commented Jul 24, 2023

I'm of the view of two things 1) apps should not do this if it is a problem for that app usage of relay 2) relay should use the latest "announce" to override any earlier announces that are more specific ( or the same )

@afrind
Copy link
Collaborator Author

afrind commented Jul 27, 2023

Discussed at IETF 117:

Support in the room for changing "Full Track Name" from a single string to a tuple of Namespace and Track Name, and matching on Namespace exactly.

@suhasHere
Copy link
Collaborator

suhasHere commented Jul 27, 2023

IIRC, my recollection was not the same, also I don't think we concluded that the namespace match exactly. The discussion went over what the authz model defines and how that impacts the flow.
I wonder how much the transport draft can mandate how a given relay application should implement the business rules. The transport draft must just indicate relay behavior and the course of actions to take for what happens when " if match succeeds vs match fails ", which is what is in the draft today already

@fluffy
Copy link
Contributor

fluffy commented Jul 27, 2023

Uh, my understanding of what we talking about what not at all removing Full Track Name - it was about we did not need to define a separator for concatenating things in a tuple. I don't agree there was support for chaning Full Track Name to Namesapce and Track Name tuple and only doing full matches. That is not going to work for many of the use cases and not what we are discussing.

@fluffy
Copy link
Contributor

fluffy commented Jul 27, 2023

I thought what came out of this discussion is

  1. we do not know how duplicate ANNOUNCE work even when the names are identical ( do we keep first, last , error etc )

  2. once we figure that out, we can start to deal with the more complicated issue of partial match or overlapping

Perhaps I have this recollection wrong but if so point me at right place in recording for that.

@kixelated
Copy link
Collaborator

kixelated commented Jul 27, 2023

My motivation for ANNOUNCE was for the contribution case. A client like OBS will connect to the server, ANNOUNCE a new catalog track, and then the server will issue a corresponding SUBSCRIBE to get the data flowing.

-> ANNOUNCE webex.com/meeting123/suhas
<- SUBSCRIBE webex.com/meeting123/suhas/catalog

This is modeled after the QUICR PUBLISH_INTENT message, except that data doesn't flow until the other side issues a SUBSCRIBE.

However, announcing a namespace prefix subverts this behavior:

-> ANNOUNCE webex.com
<- SUBSCRIBE webex.com/catalog (ERROR)

The relay doesn't know if an ANNOUNCE is a prefix for routing, or a single broadcast being published. The relay effectively needs a business agreement with each client to determine.

I think we should split this ANNOUNCE functionality into separate messages if just for the sake of discussion:

  1. PLS_SUBSCRIBE namespace track
  2. PLS_ROUTE namespace_prefix

@kixelated
Copy link
Collaborator

kixelated commented Jul 27, 2023

In the case of PLS_SUBSCRIBE, a tuple makes perfect sense as it mirrors the arguments to SUBSCRIBE. This message would be primarily used for contribution to "push" a new broadcast. It also makes sense as a reconnect signal where the last message wins.

In the case of PLS_ROUTE, matching the longest prefix might make sense instead. I know very little about BGP but maybe there's some inspiration there. This message would primarily be used for inter-relay routing.

@vasilvv
Copy link
Collaborator

vasilvv commented Jul 27, 2023

I don't agree there was support for chaning Full Track Name to Namesapce and Track Name tuple and only doing full matches. That is not going to work for many of the use cases and not what we are discussing.

I assumed that full track name was a tuple already, and "full track name as a concatenation" was a notational convenience. Now that I read the draft more carefully, it does send full track name in messages, rather than a tuple of track name and track NS. As far as I'm concerned, this is a mistake -- I don't think this actually works, and I have no idea how one is supposed to implement that.

we do not know how duplicate ANNOUNCE work even when the names are identical ( do we keep first, last , error etc )

I believe "error" is actually equivalent to "keep first" here.

I think it's a safety vs practicality trade-off. On one hand, if two different programs announce the same track, it will not work, since they will publish conflicting objects under the same name; if we disallow re-announcing, we will prevent that. On the other hand, re-announcing makes reconnects more practical.

@afrind
Copy link
Collaborator Author

afrind commented Jul 27, 2023

@fluffy Sorry if I misread the room and/or the author's meeting. Let's continue discussing.

we do not know how duplicate ANNOUNCE work even when the names are identical ( do we keep first, last , error etc )

That is being discussed in #225 / #227, let's keep this issue focused on the SUBSCRIBE REQUEST ambiguity?

@afrind
Copy link
Collaborator Author

afrind commented Aug 1, 2023

Individual comment:

Just to illustrate the problem further here, I wrote a short sketch of a generic relay (python syntax):

def on_announce(self, session, announce):
    self.announced_namespaces[announce.track_namespace] = session
    return AnnounceOk(announce.track_namespace)

def on_subscribe(self, session, sub):
    full_trackname = sub.full_trackname
    subscribers = self.subscribers_by_trackname[full_trackname]
    if subscribers is None:
        # first subscriber

        #### How do I find the session to serve the requested track? ####
        upstream_session = self.session_for_trackname(full_trackname)
        if upstream_session is None:
            # no such namespace has been announced
            return SubscribeError(sub.full_trackname, 404, "no such namespace")
        upstream_track_id = upstream_session.subscribe(sub)
        if !upstream_track_id:
            return SubscribeError(sub.full_trackname, 502, "subscribe failed")
        subscribers = []
        self.subscribers_by_track_id[(upstream_session, upstream_track_id)] = \
            subscribers
        self.subscribers_by_trackname[full_trackname] = subscribers

    # Add to subscribers list
    track_id = session.next_track_id()
    subscribers.append((session, track_id))
    return SubscribeOk(sub.full_trackname, track_id)

def on_object(self, session, object_header, object_payload):
    subscribers = self.subscribers_by_track_id[
        (session, object_header.track_id)]
    if subscribers is None:
        # Oops, no subscribers - unsubscribe??
        pass

    for downstream_session, downstream_track_id in subscribers:
        object_header.track_id = downstream_track_id
        downstream_session.publish(object_header, object_payload)

So the concrete proposal I have is to change SUBSCRIBE_REQUEST to have two fields, track_namespace and track_name.

Then the first bit would change to:

def on_subscribe(self, session, sub):
    full_trackname = sub.track_namespace + sub.track_name
    subscribers = self.subscribers_by_trackname[full_trackname]
    if subscribers is None:
        # first subscriber

        upstream_session = self.announced_namespaces[sub.track_namespace]

This is not to say the every relay has to do matches in this way. By squishing the namespace and name together it's not possible to implement a relay that does exact matching. If we put the tuple in SUBSCRIBE_REQUEST, a relay that wants non-exact matching rules based on full track name can always construct it by concatenating namespace and track_name together.

@fluffy Can you say more about the use cases that would break with a change like this, or suggest an alternate way to implement an exact matching relay?

@afrind
Copy link
Collaborator Author

afrind commented Sep 27, 2023

In a recent editor's call there was some discussion, which split into three separate issues - capturing it here:

  1. SUBSCRIBE REQUEST should have a way to denote which part of the Full Track Name is the namespace. Either a Track Namespace Length field or just transmitting Track Namespace and Track Name in two fields. There was consensus to add this.

  2. Does relay matching behavior (eg: exact match vs longest prefix vs ??) need to be negotiated in band? (Does relay matching behavior need to be negotiated in-band? #252)

  3. Is there a way to subscribe to more than one track at a time (eg: wildcard subscribe, Wildcard SUBSCRIBE #253).

I'd like to keep this issue focused on 1.

@afrind afrind added the Subscribe Related to SUBSCRIBE message and subscription handling label Oct 3, 2023
@ianswett ianswett self-assigned this Oct 9, 2023
ianswett added a commit that referenced this issue Oct 16, 2023
Fixes #204 by explicitly specifying the namespace.
ianswett added a commit that referenced this issue Oct 16, 2023
Fixes #204 by explicitly specifying the namespace.

Also consistently changes to the (b) notation to indicate a variable
length sequence of bytes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs PR Subscribe Related to SUBSCRIBE message and subscription handling
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants