
Rollback replaceable events #1731

Open
v0l opened this issue Jan 31, 2025 · 24 comments

Comments

@v0l (Member) commented Jan 31, 2025

I think it would be wise to simply revert relay code which deals with replaceable events to treat them as regular events.

Clients already MUST handle multiple versions from different relays so the replacing logic can be done on the clients.

I don't see why we would ever need to handle this on the server.

@fiatjaf (Member) commented Jan 31, 2025

I'd say this is not a completely absurd idea, only 90%.

@v0l (Member, Author) commented Jan 31, 2025

It's less absurd than that; it's pretty much a backward-compatible change to the protocol.

@fiatjaf (Member) commented Jan 31, 2025

I don't think the protocol mandates that replaceable events must be deleted when you accept a new one, so we don't have to change anything? I would be happy to see some relays out there keeping track of multiple versions of stuff like follow lists and articles. But on practical grounds I think it would be madness if all relays did it, unless they were careful about it.

For example, if a relay gets a REQ for a replaceable kind of a single author and limit:10 it would be good to have it return 10 results; but if they get a REQ with no limit and 30 authors then it would be better if they just returned 1 result per author.
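fiatjaf's heuristic can be sketched as a small dispatch function. This is a hypothetical helper, not part of any NIP; the only rules encoded are the two cases stated above:

```python
def plan_replaceable_query(filt):
    """Decide whether a REQ for a replaceable kind should return
    multiple stored versions or only the latest event per author.
    `filt` is a plain NIP-01 filter dict; this helper is hypothetical."""
    authors = filt.get("authors", [])
    limit = filt.get("limit")
    # A single author asking for an explicit window of history: versions.
    if len(authors) == 1 and limit is not None and limit > 1:
        return "versions"
    # A broad fan-out query: deduplicate to the latest per author.
    return "latest-per-author"
```

So `{"authors": [a], "limit": 10}` yields versions, while a 30-author filter with no limit collapses to one event per author.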

@v0l (Member, Author) commented Jan 31, 2025

I don't think the protocol mandates that replaceable events must be deleted when you accept a new one

From NIP-01:
only the latest event MUST be stored by relays, older versions MAY be discarded.

So why do we call them replaceable at all? It's mostly up to the client to deal with it already.

For example, if a relay gets a REQ for a replaceable kind of a single author and limit:10 it would be good to have it return 10 results; but if they get a REQ with no limit and 30 authors then it would be better if they just returned 1 result per author.

So with a little tweak to some wording we could just forget about the idea that there are replaceable events? Sounds like a win to me.

@vitorpamplona (Collaborator)

As long as the filter by them only returns one item, I don't care what relays do.

@v0l (Member, Author) commented Jan 31, 2025

As long as the filter by them only returns one item, I don't care what relays do.

Can't clients just add limit: 1 though?

no limit and 30 authors then it would be better if they just returned 1 result per author.

The only problem is this kind of query.

I mean, it's pretty much a myth that events are "replaceable" in the first place because there is no global state; I think the wording alone causes so many issues.

@vitorpamplona (Collaborator)

I don't disagree on the wording and the limit:1 filter. The issue is that we do a lot of filter combinations because relays are extremely restrictive on their limits.

We only need the concept of "replaceables" because Nostr's query language is too restrictive.

And because event IDs and pubkeys are so big, turning a combined filter into multiple limit:1 queries explodes the size of the filter in the amount of chars. If we are loading 100s of follows as I usually do, the amount of duplicated strings becomes massive.

@v0l (Member, Author) commented Jan 31, 2025

I don't disagree on the wording and the limit:1 filter. The issue is that we do a lot of filter combinations because relays are extremely restrictive on their limits.

We only need the concept of "replaceables" because Nostr's query language is too restrictive.

And because event IDs and pubkeys are so big, turning a combined filter into multiple limit:1 queries explodes the size of the filter in the amount of chars. If we are loading 100s of follows as I usually do, the amount of duplicated strings becomes massive.

Maybe we should introduce the concept of a "query key", which is used to apply a limit to a given filter via another term like qlimit.

A "query key" is just the key used to identify the latest version of an event, so qlimit applies to each distinct key, and the qkey is id / kind:pubkey / kind:pubkey:dtag, as is already the case in a lot of nostr code when you have to deal with these kind ranges.
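The qkey can be computed mechanically from the NIP-01 kind ranges; this function name is illustrative, but the ranges (0, 3, 10000–19999 replaceable; 30000–39999 addressable with a `d` tag) are the ones NIP-01 defines:

```python
def query_key(event):
    """Return the "query key" for an event: kind:pubkey for replaceable
    kinds, kind:pubkey:dtag for addressable kinds, otherwise the id."""
    kind = event["kind"]
    if kind in (0, 3) or 10000 <= kind < 20000:
        return f"{kind}:{event['pubkey']}"
    if 30000 <= kind < 40000:
        # First "d" tag value, empty string if absent, per NIP-01.
        dtag = next((t[1] for t in event.get("tags", []) if t[0] == "d"), "")
        return f"{kind}:{event['pubkey']}:{dtag}"
    return event["id"]
```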

I also agree that relay limits are extremely low, and how some clients and relays deal with relay limits is seriously lacking and causes a bad experience for everybody, because they don't do any queuing or simply reject the REQ silently, making it look like there are no results.

@fiatjaf (Member) commented Jan 31, 2025

strfry has already increased default limits by 10x.

I'm not sure it's much better to have one REQ with 30 authors than have 30 REQs with 1 author each.

@vitorpamplona (Collaborator) commented Jan 31, 2025

To give you an idea, I have a single sub that downloads all the basic info, lists, settings, etc for all logged in users on Amethyst. This sub is active at all times, in all connected relays, because I want to get any new changes to those things as soon as possible.

{
  kinds = [
    MetadataEvent.KIND,
    ContactListEvent.KIND,
    AdvertisedRelayListEvent.KIND,
    ChatMessageRelayListEvent.KIND,
    SearchRelayListEvent.KIND,
    FileServersEvent.KIND,
    BlossomServersEvent.KIND,
    MuteListEvent.KIND,
    PeopleListEvent.KIND,
    BookmarkListEvent.KIND, 
    BadgeProfilesEvent.KIND, 
    EmojiPackSelectionEvent.KIND,
    AppSpecificDataEvent.KIND
  ],
  authors = loggedInUserList
}

Those are all replaceable. If I had to break it down to a limit for each kind and author pair, the filter would be HUGE.
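A rough back-of-the-envelope check of that blow-up, with placeholder kind numbers and pubkeys (13 kinds × 100 authors, as in the scenario above):

```python
import json

kinds = list(range(10000, 10013))         # 13 replaceable kinds (placeholders)
authors = ["f" * 64 for _ in range(100)]  # 100 hex pubkeys (placeholders)

# One combined filter vs. one limit:1 filter per (kind, author) pair.
combined = json.dumps({"kinds": kinds, "authors": authors})
exploded = json.dumps([
    {"kinds": [k], "authors": [a], "limit": 1}
    for k in kinds for a in authors
])
print(len(combined), len(exploded))  # the exploded form is roughly 20x larger
```

The duplication comes almost entirely from repeating the 64-character pubkeys once per pair instead of once per filter.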

@vitorpamplona (Collaborator) commented Jan 31, 2025

Maybe filters can have a "latestBy": ["<kind,author, #tag>", "<kind,author, #tag>"] property. Then we could also have "latestBy": ["kind", "author", "#d"] and a "latestBy": ["kind", "#p"] to get the latest notification per kind

@fiatjaf (Member) commented Jan 31, 2025

Maybe filters can have a "latestBy": ["<kind,author, #tag>", "<kind,author, #tag>"] property. Then we could also have "latestBy": ["kind", "author", "#d"] and a "latestBy": ["kind", "#p"] to get the latest notification per kind

This would kill relay performance forever and ever.

@fiatjaf (Member) commented Jan 31, 2025

To give you an idea, I have a single sub that downloads all the basic info, lists, settings, etc for all logged in users on Amethyst. This sub is active at all times, in all connected relays, because I want to get any new changes to those things as soon as possible.

{
  kinds = [
    MetadataEvent.KIND,
    ContactListEvent.KIND,
    AdvertisedRelayListEvent.KIND,
    ChatMessageRelayListEvent.KIND,
    SearchRelayListEvent.KIND,
    FileServersEvent.KIND,
    BlossomServersEvent.KIND,
    MuteListEvent.KIND,
    PeopleListEvent.KIND,
    BookmarkListEvent.KIND,
    BadgeProfilesEvent.KIND,
    EmojiPackSelectionEvent.KIND,
    AppSpecificDataEvent.KIND
  ],
  authors = loggedInUserList
}
Those are all replaceable. If I had to break it down to a limit for each kind and author pair, the filter would be HUGE.

Couldn't these be 13 subscriptions instead of 1 subscription with 13 kinds? Sounds ok to me.


But that's not even what I'm advocating for, I think the least breaking change is what I suggested above:

For example, if a relay gets a REQ for a replaceable kind of a single author and limit:10 it would be good to have it return 10 results; but if they get a REQ with no limit and 30 authors then it would be better if they just returned 1 result per author.

Because in practice you never want multiple versions for multiple kind/addresses (and if you do you just make multiple REQs), what you want is either the latest for a bunch of kinds or multiple versions for a single kind/address.

@vitorpamplona (Collaborator)

Yep, that's the key for replaceables: No one ever wants the versions. Not even if it is a mixed filter.

In NIPs where versions are desirable, we just use regular events.

In fact, I don't think it makes any sense for relays to keep old versions of any replaceable. I would just delete them all. Race conditions are not the relay's problem.

@staab (Member) commented Jan 31, 2025

I'm not sure it's much better to have one REQ with 30 authors than have 30 REQs with 1 author each.

I kind of think this is crazy, but it's also sort of the logical consequence of outbox, since queries should be split up and sent to different relays anyway. limit: 1 queries are already such a pain that I don't really think leaning into them is the best idea. I also don't think adding to filters is the best idea, since relay implementations currently don't follow Postel's law.

Reservations aside, something like what @v0l is suggesting, or more explicitly some kind of latest_by: ['pubkey', 'kind'], would do the trick, and simplify other use cases (like asking for the most recent kind 1 for each pubkey, which I actually want to do to power something like keypub.coracle.social).
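The latest_by deduplication staab describes could look like this on either side of the wire (a sketch; the field tuple is the grouping key, and the function name is made up):

```python
def dedup_latest_by(events, keys=("pubkey", "kind")):
    """Keep only the most recent event per distinct combination of the
    given fields, newest first. Hypothetical `latest_by` semantics."""
    latest = {}
    for ev in events:
        k = tuple(ev[f] for f in keys)
        if k not in latest or ev["created_at"] > latest[k]["created_at"]:
            latest[k] = ev
    return sorted(latest.values(), key=lambda e: -e["created_at"])
```

With keys=("pubkey",) on kind-1 events this gives exactly the "most recent note per pubkey" query mentioned above.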

@vitorpamplona (Collaborator)

Just dropping this here...

[
  "NQL",
  "Sub Id",
  {
    "where": {
      "kinds": [1, 20],
      "authors": [
        "user1",    
        "user2",
        {
          "select": "#p", 
          "where": { "authors": ["user1", "user2"], "kinds": [3] },
          "latest_by": [ "author" ]
        }
      ]
    },
    "limit": 2000
  }
]

Spec:

["NQL", "<sub id>",  <NQL>]

NQL:

{
  "select": <"id"|"kind"|"author"|"#key"|"created_at"|"content"|"sig">, 
  "where": { 
    "ids": [<event ids, NQL>, ...],
    "authors": [<pubkeys, NQL>, ...],
    "kinds": [<kinds, NQL>, ...],
    "#<tag key>": [<tag values, NQL>, ...],
    "since": <timestamp, NQL>,
    "until": <timestamp, NQL>
  },
  "latest": [ <"kind"|"author"|"#key"> ],
  "limit": <number>
}

@fiatjaf (Member) commented Jan 31, 2025

I kind of think this is crazy

Why? From the relay perspective it is running 30 index queries anyway, doesn't matter if they come in grouped or not. In fact I think having the 30 be strictly separated is easier (by which I mean faster) because the relay doesn't have to check one against the other for duplicate results or do a global sort on all the results by timestamp.

(This is considering the way strfry works and also my eventstore LMDB and Badger implementations.)

like asking for the most recent kind 1 for each pubkey, which I actually want to do to power something like keypub.coracle.social

Likewise, it shouldn't be a problem at all to do this with hundreds of REQs all in the same connection, especially if you close them immediately after EOSE.
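The one-REQ-per-author pattern fiatjaf suggests looks roughly like this on the wire. This sketch only builds the client→relay messages (the websocket transport and EOSE wait are elided; the subscription-id scheme is made up):

```python
import json

def one_shot_reqs(pubkeys, kind):
    """Yield one REQ per author plus the CLOSE that would follow its
    EOSE, all multiplexed on a single connection."""
    for i, pk in enumerate(pubkeys):
        sub = f"latest-{i}"
        yield json.dumps(
            ["REQ", sub, {"kinds": [kind], "authors": [pk], "limit": 1}]
        )
        # ...read EVENT messages until ["EOSE", sub] arrives, then:
        yield json.dumps(["CLOSE", sub])
```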

@staab (Member) commented Jan 31, 2025

Why?

If you s/30/2000/, you're wasting a lot of traffic on message envelopes. And there are many other minor aesthetic concerns.

"select": <"id"|"kind"|"author"|"#key"|"created_at"|"content"|"sig">,

We shouldn't encourage people to skip validating signatures

@vitorpamplona (Collaborator)

We shouldn't encourage people to skip validating signatures

That's fine, we can get rid of content then. Sometimes we just need to know which ids, authors, kinds, created_ats and tags are available for a query.

@fiatjaf (Member) commented Jan 31, 2025

a lot of traffic on message envelopes

You mean 20 bytes.

If you're querying for a thousand pubkeys that's 20kb, which is smaller than the smallest CSS file among the 190 you have loaded in this GitHub page.

Not to mention the fact that you can save on bandwidth if you keep track of the latest timestamp for each user locally and send a since in your REQs such that most of them won't return anything -- while if you try to group them you'll necessarily redownload some events you already have locally.
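The bandwidth-saving idea above, sketched as a filter builder (a hypothetical helper; it assumes the client keeps a pubkey → newest created_at map for events it already has):

```python
def incremental_filters(latest_seen, kind):
    """Build one per-author filter whose `since` starts just past the
    newest event we already hold, so most REQs return nothing."""
    return [
        {"kinds": [kind], "authors": [pk], "since": ts + 1, "limit": 1}
        for pk, ts in latest_seen.items()
    ]
```

Grouped filters can't do this, because a single shared `since` has to be the minimum across all authors, which re-downloads events for everyone else.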

@alexgleason (Member)

I think semantically it's probably fine to change NIP-01, but in my SQL relay implementation there's a huge performance difference between:

select * from events where kind = 0 and pubkey in ('a', 'b', 'c');

and

select * from events where kind = 0 and pubkey in ('a', 'b', 'c') order by created_at desc;

The first one can utilize a UNIQUE index on (kind, pubkey), making it about 100x faster. So I would not change my implementation.
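The unique-index upsert pattern behind that fast path can be demonstrated with SQLite (the schema is illustrative, not alexgleason's actual one): the UNIQUE(kind, pubkey) constraint guarantees at most one row per replaceable address, so the unordered lookup never needs a sort.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    id TEXT, pubkey TEXT, kind INTEGER, created_at INTEGER,
    UNIQUE (kind, pubkey))""")

# Upsert: only replace the stored row when the new event is newer.
upsert = """INSERT INTO events VALUES (?, ?, ?, ?)
    ON CONFLICT (kind, pubkey) DO UPDATE SET
        id = excluded.id, created_at = excluded.created_at
    WHERE excluded.created_at > events.created_at"""
db.execute(upsert, ("e1", "alice", 0, 100))
db.execute(upsert, ("e2", "alice", 0, 200))

print(db.execute("SELECT id FROM events WHERE kind = 0").fetchall())  # [('e2',)]
```

Requires SQLite >= 3.24 for the ON CONFLICT ... DO UPDATE syntax, which every recent Python ships with.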

@alexgleason (Member)

Oh right, actually the above query would be ruined in any case, because you could get a bunch of kind 0 events from a single author and none from the other authors depending on your limit... so maybe this is not even good semantically.

@alexgleason (Member)

I think this proposal only makes sense if the authors field of filters also changes to become a single value instead of an array.

So eg:

{ kinds: [0], authors: [a, b] }

must become

[{ kinds: [0], pubkey: a }, { kinds: [0], pubkey: b }]

making this issue oddly related to #1645
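The transformation alexgleason describes is mechanical (a sketch; a single-value `pubkey` field is his proposed change, not current NIP-01):

```python
def split_by_author(filt):
    """Turn one multi-author filter into a list of single-pubkey
    filters, preserving every other filter field."""
    rest = {k: v for k, v in filt.items() if k != "authors"}
    return [{**rest, "pubkey": a} for a in filt.get("authors", [])]
```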

@mikedilger (Contributor)

Yep, that's the key for replaceables: No one ever wants the versions. Not even if it is a mixed filter.

In NIPs where versions are desirable, we just use regular events.

In fact, I don't think it makes any sense for relays to keep old versions of any replaceable. I would just delete them all. Race conditions are not the relay's problem.

Yes, I think replaceable events as defined and as in use cannot be repurposed as versioned events. Versioned events will need something different, maybe a new kind range, partly because some relays delete old versions and partly because queries are all written to expect only one event per address.

Otherwise we'd have to migrate a lot of code to use limit: 1, including Vitor breaking up a lot of filters... even though that end state is natural and elegant (just ask with limit: 1; results come newest first, so you always get the latest), we can't get there from here without breaking a lot of things.
