algorithms on nostr that have a chance of working #522

huumn opened this issue May 12, 2023 · 33 comments

Comments

@huumn
Contributor

huumn commented May 12, 2023

Nostr already has an "algo" all relays use to order events. The algo is "sort by created_at in descending order." What I want is a way to ask relays for events using a variety of algos such that the algorithms

  1. are provided in a censorship-resistant way
  2. are transparent
  3. compete

I'd like to get some initial feedback before drafting a NIP.

I'm unsure of the nostr idioms but we need a way for clients to specify an algorithm and its arguments.

What I'm thinking:

{
    kinds: [1],
    algo: [ALGO_NAME, ALGO_ARG1_NAME, ALGO_ARG1_VALUE, ..., ALGO_ARGN_NAME, ALGO_ARGN_VALUE]
}

Where ALGO_NAME is an algorithm, and the rest of the array is argument name-value pairs flattened into an array of strings. Algorithms should be rigorously specified as their own NIPs, and clients may check for ALGO_NAME support and request events using them. All normal REQ arguments should be respected as usual and operate on the algo's input or output set (where the REQ args are applied, if relevant, should probably be specified in the algo's NIP).

e.g.

{
    kinds: [1],
    algo: ["TrustRank", "degree", "3"]
}
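
For illustration, here is a minimal sketch of how a relay might unpack the proposed `algo` array into an algorithm name and a dictionary of arguments. The function name `parse_algo` and the error handling are assumptions; the proposal only fixes the array layout (name first, then name-value pairs as strings).

```python
from typing import Dict, List, Tuple


def parse_algo(algo: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split ["NAME", "k1", "v1", ...] into (name, {"k1": "v1", ...})."""
    if not algo:
        raise ValueError("algo array must start with an algorithm name")
    name, rest = algo[0], algo[1:]
    if len(rest) % 2 != 0:
        raise ValueError("algo arguments must come in name-value pairs")
    # Even positions hold argument names, odd positions hold their values.
    args = dict(zip(rest[0::2], rest[1::2]))
    return name, args


name, args = parse_algo(["TrustRank", "degree", "3"])
print(name, args)
```

Values stay strings here, as in the proposal; converting "3" to a number would be up to each algorithm's NIP.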

Anyway, that's the gist of what I'd like to propose. There's more to consider, like how clients reconcile different results from relays, etc., but I'd like to gauge if there's interest in this approach to algorithms first.

Is this the right idiom? Is there prior work I should be using instead/considering?

Alternatively, a new verb, much like COUNT, could be proposed. wdyt

@alexgleason
Member

alexgleason commented May 12, 2023

I've been thinking about this too, since seeing bluesky's feed generator: https://github.com/bluesky-social/feed-generator

Instead of every relay implementing support for an algo filter, I was thinking it makes sense to create purpose-built relays, and then clients should add the ability to view a specific relay's feed. It doesn't even have to be a full relay, it could just proxy events from other relays (like an aggregator) and display specific content in a specific order.

Coracle already lets you view the feed of individual relays: https://coracle.social/relays/relay.stoner.com

@staab
Member

staab commented May 12, 2023

I'd like to see this too, my preference would be a new verb that asks for recommendations so that relays can implement it if they want to (mostly so the NIP can be extended with more functionality over time), although the above algo property on REQ could also work.

I'll also just take this opportunity to plug my theory on how to manage advanced relay features like this, that most relays may not want to implement: #259. The basic idea is that relays should recommend extensions/relays that support the desired filter to avoid centralizing pressure exerted by second-layer services.

@huumn
Contributor Author

huumn commented May 12, 2023

every relay implementing support for an algo filter

I know it's what you meant, but just to be extra clear for casual reading, it's every relay implementation implementing support ... which is less onerous than all relay operators implementing it.

AFAICT this argument tends to be recurring on nostr. Do we have relays implement counts/search and advertise their optional support, or do we have an unspecified heterogeneous second relay layer? It seems like the "relays implement it" approach has won to date, right?

I think if we want another proxy/cache layer instead, then that should probably be specified and standardized. I suspect it hasn't yet because it's not clear what actually divides the layers if each layer depends on having the events.

The last conversation I was in about this was related to counts, and the argument that won AFAICT was "this is a nip, relays can implement it and other nips, and proxies/caches can only implement counts/search if they want." I was convinced.

Without relays implementing things users want, it seems like they'll seek a centralized party (eg a specific relay with an algo they like) which is antithetical to the part of nostr's mission that excites me.

@staab
Member

staab commented May 12, 2023

Related:

It's also worth mentioning that in my opinion recommendations may be doable by using custom relays with no explicit update to the protocol. IOW you would send a filter to a relay that interprets it differently than standard relays in order to retrieve recommendations that relay promises to provide. I think explicit protocol support would be superior, but still.

@huumn
Contributor Author

huumn commented May 12, 2023

I'll also just take this opportunity to plug my theory on how to manage advanced relay features like this, that most relays may not want to implement: #259. The basic idea is that relays should recommend extensions/relays that support the desired filter to avoid centralizing pressure exerted by second-layer services.

I tend to agree that it's worth throwing the kitchen sink at relay discovery, including involving relays in the effort, as nostr's censorship resistance depends on it. I don't want to derail this thread though so I'll add some of my thoughts to that issue instead.

@fiatjaf
Member

fiatjaf commented May 12, 2023

I don't see why not just use relays as feed providers as @alexgleason is saying above. Relay URLs are transparent and it's immediately clear who you're trusting as your source of information in each case, relays have identity and personality, you can cycle through them and do all sorts of things.

Meanwhile hidden algorithms in arcane request filters are completely beyond user control and much more limited.

@huumn
Contributor Author

huumn commented May 12, 2023

relays have identity and personality

identity = domain names and "personality" = opaque/bespoke ordering of events?

you can cycle through them and do all sorts of things

Things like what? Is there a client doing this?

Meanwhile hidden algorithms in arcane request filters are completely beyond user control and much more limited.

Hidden in a NIP with a human readable name, transparent logic, and args that could be exposed to users. I fail to see how this is "hidden" compared to the alternative of "trust the opaque responses of this one relay."

I think I understand where you're coming from, but there isn't a booming marketplace for relays, let alone relay "personalities."

Also, how interoperable are relay personalities? How interchangeable? How replaceable? If most users want a relay with Dave Chappelle's personality, how decentralized is that network if Dave Chappelle isn't specified in a NIP?

@staab
Member

staab commented May 12, 2023

I think I understand where you're coming from, but there isn't a booming marketplace for relays, let alone relay "personalities."

This can be solved, but yeah, you're right that relays are currently pretty much interchangeable. Mazin is doing great work on this.

This question recently came up on #457 (comment). I think bare relays are underrated as repositories for "personality" — either as providing additional functionality, or as a source of moderation/content recommendations.

@alexgleason
Member

If most users want a relay with Dave Chappelle's personality, how decentralized is that network

Not to get too philosophical, but... well, actually this is a GitHub issue, so it's the perfect place to get philosophical.

First of all, if Dave Chappelle was a Nostr relay, it would be awesome. Everyone would want to be on that relay (except the people that hate Dave Chappelle), and if we lost that relay it would be a tragedy. Fortunately you can also be on a dozen or so other relays at the same time you're on the Dave Chappelle relay.

Making relays a prominent part of the Nostr UX actually furthers decentralization, because it makes people aware of the fact that relays exist and helps them understand why they matter. This is the thing we need to gamify most.

there isn't a booming marketplace for relays

Exactly, which is why it's an opportunity to do something impactful on the Nostr network. Serving algorithms from relays is a perfect use-case to push relays as part of the main Nostr UX instead of as some technical detail that happens to make this whole thing work.

This is making me want to build a bestnostr.top relay, with URLs like wss://bestnostr.top/popular, wss://bestnostr.top/controversial, wss://bestnostr.top/new. You could build a whole framework dedicated just to the idea of creating relays that serve algorithms.

@huumn
Contributor Author

huumn commented May 12, 2023

This question recently came up on #457 (comment). I think bare relays are underrated as repositories for "personality" — either as providing additional functionality, or as a source of moderation/content recommendations.

These ideas aren't mutually exclusive IMO. The moderation can happen in addition to an algo being run over it.

My concern is that without a way to communicate and share algorithms at the protocol layer, nostr will have a closed-source Google/Facebook mega-relay with a winning "personality," they'll vertically integrate into a client, hide nostr/relays from users, everyone will use them, and we are back to where we started, aren't we?

@staab
Member

staab commented May 12, 2023

a way to communicate and share algorithms at the protocol layer

Completely agree, again, #259

But yes, it would be cool to come up with a NIP that also allows for recommending/labeling relays using events, maybe using the same mechanism as content labeling from this PR.

@fiatjaf
Member

fiatjaf commented May 12, 2023

There is no such thing as "protocol layer", there are just ways of doing things. Some ways are simpler and more open than others. No one wants to run a relay that is just a dumb repository of human garbage. People want to run servers to make a difference, either because they believe in something or because they like something or because they have a community or because they have some value to provide and get some profit in return.

By the way, if all relays are just dumb replicated instances of some "implementation" without any personality, then the expected trend is that everybody just starts using the fastest one with the most content, while smaller ones get disincentivized, and then we'll get to the Google/Facebook world much faster.

@fiatjaf
Member

fiatjaf commented May 12, 2023

I'm not sure, but I think what happens in the fediverse is that people run servers with personality. Except there their possibilities are much more limited, since they basically have to pick Mastodon or Pleroma, choose some colors and names, and that's it. It's even worse than that, because even if you run a niche Mastodon server, it will not attract anyone else because everybody already has accounts on other servers.

Running a Nostr relay has the potential to become a much more interesting experience and attract all these fediverse people.

@huumn
Contributor Author

huumn commented May 12, 2023

By the way, if all relays are just dumb replicated instances of some "implementation" without any personality, then the expected trend is that everybody just starts using the fastest one with the most content, while smaller ones get disincentivized, and then we'll get to the Google/Facebook world much faster.

We got the Google world not because it was faster or had more content than Yahoo. We got the Google world because its "personality" was patented, closed source, and way better than Yahoo's. Enough people preferred that personality that it starved all others.

"Personalities" can be objectively better, just like things can be faster and have more content, and sometimes they are even Google magnitudes better.

I think the difference in our POV might be that you think users differ significantly in their "personality" preference, so much so that personality will be the decentralizing force that nostr needs. I don't think it is. There were plenty of quirky search engines out there when Google arrived.

@huumn
Contributor Author

huumn commented May 12, 2023

Perhaps if the quirky search engines could've pooled their efforts together somehow, they could've competed with Google ... but that wasn't really possible until nostr.

@fiatjaf
Member

fiatjaf commented May 13, 2023

I think you got the wrong idea of what I mean by "personalities". It wasn't like Google was better at finding popular music and it killed search engines that were good at finding classical music, Google was objectively immensely better at the same task all the others were trying to do. Google had more data, was faster, and yielded better matches at the exact same task. I think the analogy doesn't fit here at all.

Even if I'm wrong, though, I don't see how writing some names of algorithms on a NIP will help. Google can still come and invent an algorithm called "google" that only works on their relay and boom, we die.

@huumn
Contributor Author

huumn commented May 13, 2023

I think you got the wrong idea of what I mean by "personalities". It wasn't like Google was better at finding popular music and it killed search engines that were good at finding classical music, Google was objectively immensely better at the same task all the others were trying to do.

Isn't everyone building social clients on nostr for the most part? It seems like having a social feed and ranking events is something nearly every app does these days - even non-social apps.

Even if I'm wrong, though, I don't see how writing some names of algorithms on a NIP will help.

The point of specifying algorithms is the same as specifying anything else - interop and a consistent default relay experience so clients know what to expect. Ranking algorithms aren't special. They're fundamental parts of nearly every modern application we use.

Google can still come and invent an algorithm called "google" that only works on their relay and boom, we die.

I don't think this is necessarily the case if we can pool dev efforts together on algorithm specs and compete against them. Afaict, it's much more likely to happen if we don't make any effort to create an alternative path.

@huumn
Contributor Author

huumn commented May 13, 2023

Mr. Jaf asked on SN:

Can you give some examples of algorithms?

I put my response in a gist here: https://gist.github.com/huumn/437925d2861007f12723c72f355959b1

@fiatjaf
Member

fiatjaf commented May 13, 2023

Thank you very much, that gives me a better idea of what you want and this idea makes sense as a NIP now. I still don't think we need such a thing and I still disagree with everything you said here, but at least now it is not completely absurd.

@fiatjaf
Member

fiatjaf commented May 13, 2023

One question that has occurred to me: what is the expected behavior for when someone asks for an algorithm but the relay doesn't have that implemented?

@huumn
Contributor Author

huumn commented May 13, 2023

I still disagree with everything you said here

I still don't understand what your POV is on algorithms. It could be "we don't need them because ..." or "they aren't fundamental/universal enough to be a nip because ..."

You've said some stuff about personalities implying that by specifying algorithms it leads to some kind of reductive uniformity and that being able to replace one relay with another that's faster is centralizing. Even if we ignore that people already do this or assume we'll educate them into not doing this, I don't follow how algorithms change this.

Honestly, I'll happily make a mega relay to provide the UX people want like everyone else if this is what you think is best for the protocol. I'm proposing algorithms because I think mega relays are bad for the protocol.

One question that has occurred to me: what is the expected behavior for when someone asks for an algorithm but the relay doesn't have that implemented?

The same behavior when someone asks for a NIP but the relay doesn't have it implemented.

@fiatjaf
Member

fiatjaf commented May 13, 2023

Even if we ignore that people already do this or assume we'll educate them into not doing this, I don't follow how algorithms change this.

Two comments about this:

  1. Whatever people are doing today is irrelevant. We can't shape Nostr by what the current people think or what is working now for them. For Nostr to grow as much as it needs to grow things must necessarily change.
  2. I don't think a NIP for specifying algorithms makes these things worse, no, I just think it's not the best approach to the problem.

Honestly, I'll happily make a mega relay to provide the UX people want like everyone else if this is what you think is best for the protocol.

If you want to make a megarelay I really think you should make a megarelay. That will probably make the experience better and allow Nostr to grow more, and teach us a lot in the process, and it's probably safe because you're well-intentioned. Also if it happens that you become evil and try to capture the protocol in the process, then I think you will not succeed, but if you do succeed that's also good, saves us from wasting more time on a protocol that can be captured so easily.

The same behavior when someone asks for a NIP but the relay doesn't have it implemented.

What is this behavior? I don't think this has ever happened. What NIP can a client ask today and the relay not have it implemented?

@huumn
Contributor Author

huumn commented May 14, 2023

What is this behavior? I don't think this has ever happened. What NIP can a client ask today and the relay not have it implemented?

I think we are misunderstanding each other. What happens if I send a COUNT request to a relay that doesn't implement it?

As said, I might not have the idiom right. The idea was that every new algorithm would be its own NIP and relays that support an algorithm put it in supported_nips.
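
For illustration, a client-side check along these lines might look like the sketch below. It assumes the relay information document (the JSON that carries `supported_nips`) has already been fetched, and it uses a made-up NIP number, `9999`, for a hypothetical algorithm NIP; neither the number nor the helper name `supports_nip` comes from this thread.

```python
import json

# Hypothetical NIP number for an algorithm such as "TrustRank".
# No such NIP exists; it is only a placeholder for this sketch.
HYPOTHETICAL_ALGO_NIP = 9999


def supports_nip(relay_info_json: str, nip: int) -> bool:
    """Return True if the relay information document lists the given NIP."""
    info = json.loads(relay_info_json)
    return nip in info.get("supported_nips", [])


# Example relay information document (contents invented for the sketch).
relay_info = '{"name": "example relay", "supported_nips": [1, 11, 45, 9999]}'

if supports_nip(relay_info, HYPOTHETICAL_ALGO_NIP):
    print("relay advertises the algorithm; send the algo request")
else:
    print("relay does not advertise it; fall back to created_at ordering")
```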

I don't think a NIP for specifying algorithms makes these things worse, no, I just think it's not the best approach to the problem.

Could you share a little more detail on what you think is a better approach? I'm asking earnestly.

If you want to make a megarelay I really think you should make a megarelay.

I don't want to build a megarelay. What I want is to build something like SN on nostr. If I build something centralized on nostr it kind of defeats the purpose yet an SN on nostr needs algorithms ... so I'm trying to figure out how to get decentralized algorithms on nostr. Unless there's a better way to do this.

@fiatjaf
Member

fiatjaf commented May 14, 2023

What happens if I send a COUNT request to a relay that doesn't implement it?

The relay will just not reply. But that is different than sending a REQ with a filter that the relay doesn't implement. The relay will just ignore that filter and act on the other ones. I guess in this case specifically it isn't so bad.
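
A rough sketch of that REQ behavior, assuming a relay that silently skips filter fields it does not recognize. The helper names `matches` and `run_req` are invented for this example, and tag filters are left out to keep it short.

```python
from typing import Any, Dict, List, Tuple

# Filter fields this sketch understands; anything else (e.g. "algo") is ignored.
KNOWN_FIELDS = {"ids", "authors", "kinds", "since", "until", "limit"}


def matches(event: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Apply only the recognized filter fields to one event."""
    if "ids" in flt and event["id"] not in flt["ids"]:
        return False
    if "authors" in flt and event["pubkey"] not in flt["authors"]:
        return False
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    if "since" in flt and event["created_at"] < flt["since"]:
        return False
    if "until" in flt and event["created_at"] > flt["until"]:
        return False
    return True


def run_req(events: List[Dict[str, Any]], flt: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return matching events, newest first, plus the field names that were ignored."""
    ignored = sorted(set(flt) - KNOWN_FIELDS)
    hits = [e for e in events if matches(e, flt)]
    hits.sort(key=lambda e: e["created_at"], reverse=True)
    limit = flt.get("limit")
    return (hits if limit is None else hits[:limit]), ignored


events = [
    {"id": "a", "pubkey": "p1", "kind": 1, "created_at": 100},
    {"id": "b", "pubkey": "p2", "kind": 1, "created_at": 200},
    {"id": "c", "pubkey": "p1", "kind": 7, "created_at": 300},
]
results, ignored = run_req(events, {"kinds": [1], "algo": ["TrustRank", "degree", "3"]})
print([e["id"] for e in results], ignored)
```

The client gets plain created_at ordering back and has no way to tell, from the response alone, that the `algo` field was dropped.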

Could you share a little more detail on what you think is a better approach?

OK. I'll do my homework, think better about this and write something.

I don't want to build a megarelay.

But you said you would make one happily!

What I want is to build something like SN on nostr. If I build something centralized on nostr it kind of defeats the purpose yet an SN on nostr needs algorithms ... so I'm trying to figure out how to get decentralized algorithms on nostr. Unless there's a better way to do this.

I see, now it makes more sense. You want that something like SN displays events ordered by zap totals + timestamp + comments and whatnot, right? Some magic calculation, I suppose. For this specific case why can't the ordering be done on the client side?
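
For concreteness, client-side ordering of that kind could be sketched as below. The scoring formula, the weights, and the field names `zap_total` and `comments` are all invented for this example; it also assumes the client already holds per-event zap and comment totals, which it would first have to download and aggregate.

```python
import math
from typing import Any, Dict, List

NOW = 1_700_000_000  # fixed "current time" so the example is reproducible


def score(item: Dict[str, Any], now: int, zap_weight: float = 1.0,
          comment_weight: float = 0.5, half_life_hours: float = 12.0) -> float:
    """Hypothetical ranking: engagement damped by an exponential time decay."""
    age_hours = max(0.0, (now - item["created_at"]) / 3600)
    engagement = zap_weight * math.log1p(item["zap_total"]) + comment_weight * item["comments"]
    decay = 0.5 ** (age_hours / half_life_hours)
    return engagement * decay


def rank(items: List[Dict[str, Any]], now: int) -> List[Dict[str, Any]]:
    """Sort items by descending score."""
    return sorted(items, key=lambda it: score(it, now), reverse=True)


items = [
    {"id": "a", "created_at": NOW - 3600, "zap_total": 1000, "comments": 2},
    {"id": "b", "created_at": NOW - 48 * 3600, "zap_total": 100000, "comments": 50},
    {"id": "c", "created_at": NOW - 7200, "zap_total": 0, "comments": 0},
]
print([it["id"] for it in rank(items, NOW)])
```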

@huumn
Contributor Author

huumn commented May 14, 2023

But you said you would make one happily!

AFAICT you seemed ambivalent, mostly said it didn’t matter either way, and were happy with me wasting my time however I wanted.

For this specific case why can't the ordering be done on the client side?

Clients have a hard time decoding json for a normal number of events, right? Think about it a little. I wouldn’t only need 100 events and profile metadata. I would need 100-1000x the normal events and metaevents, then I’d need to run an algorithm over it. It’d be at least as slow as on the server to run the algorithm (probably slower by an order of magnitude on the average device), but with the additional time of encoding, transferring, decoding, and validating the data.

@ekzyis

ekzyis commented May 16, 2023

Also if it happens that you become evil and try to capture the protocol in the process, then I think you will not succeed, but if you do succeed that's also good, saves us from wasting more time on a protocol that can be captured so easily.

I get where this is coming from. I feel the same about bitcoin now.
But I think nostr is nowhere near where we can just think like that. I think we are still at an early stage where we have the initiative and have to be vigilant about centralizing forces.

You can compare this with Satoshi withholding code for GPU mining to "defend the network" in case someone else starts to mine using GPUs while bitcoin was still in its infancy. We should do the same with proposing NIPs (like this) in advance.

You want that something like SN displays events ordered by zap totals + timestamp + comments and whatnot, right? Some magic calculation, I suppose. For this specific case why can't the ordering be done on the client side?

Adding to what @huumn mentioned, relays can create database indices, which can improve performance by orders of magnitude. Clients cannot create indices since they don't have access to the full data. And clients should not need to have access to the full data.

To be honest, I don't know what you mean by "for this specific case". Isn't this in fact a good example of what should be done on the relay side?
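
As a small sketch of the relay-side indexing idea, using SQLite from Python's standard library: the table layout and the precomputed `score` column are assumptions made for this example, not how any particular relay stores events.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
# Hypothetical storage: one row per event with a precomputed ranking score.
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, created_at INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", 100, 1.0), ("b", 200, 9.5), ("c", 300, 4.2)],
)
# With this index the relay can read rows in score order instead of
# sorting the whole table on every request.
conn.execute("CREATE INDEX idx_events_score ON events (score DESC)")


def top_events(db: sqlite3.Connection, limit: int) -> List[str]:
    """Return the ids of the highest-scored events."""
    rows = db.execute("SELECT id FROM events ORDER BY score DESC LIMIT ?", (limit,))
    return [row[0] for row in rows]


# The query plan shows whether SQLite chose the index for this query.
for row in conn.execute("EXPLAIN QUERY PLAN SELECT id FROM events ORDER BY score DESC LIMIT 10"):
    print(row)
print(top_events(conn, 2))
```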

I don't see how writing some names of algorithms on a NIP will help. Google can still come and invent an algorithm called "google" that only works on their relay and boom, we die.

I think we shouldn't argue about what Google would or wouldn't do. So yes, I agree. Google could still come in and create the "super-algorithm" which only works on their relay. We have no control over what Google would do; this or any other NIP doesn't change this nor ever will.

So I think we should focus on the things we can indeed control. Imho, a NIP about algorithms could help. It would give us more chances to create a more diverse ecosystem using open standards such that "decentralized nostr" could (at least try to) compete with a walled garden.

Regarding the discussion about relay feeds and "super-relays": I think this is a valid point specific to this NIP. Most NIPs don't require full access to data and/or potentially a lot of computational resources, and thus can be more easily implemented by relays. So I think the idea of relay feeds, where a client just connects to a specific relay to be served the feed implemented by that relay, is understandable.

However, imho, proposing potentially opaque relay feeds like this as an alternative to a NIP about algorithms is indeed a centralization vector. I haven't thought much about this yet, but isn't database sharding the keyword here, so other relays can cooperate to serve a competitive experience? Not sure how feasible that is on nostr, but I thought I'd just mention this keyword. Maybe someone else with more experience can pick this idea up and actually do something with it, haha

@arthurfranca
Contributor

The idea was that every new algorithm would be its own NIP and relays that support an algorithm put it in supported_nips.

@huumn I think you are right, for this to avoid centralization, the nostr dev community should work towards algo commoditization. Meaning that algo logic should be described in a NIP in a way that can be easily applied to relays.

In practical terms, for relays I think it could be as simple as using a nipXX_score field in place of the created_at one for sorting when nipXX algo sorting gets requested along with regular filters (the higher an event's score, the higher its position in the response to the client, as happens today with created_at, because a newer date has a larger timestamp value). "since" and "until" would be used as usual for paginating, but applied to the score field instead of the date one. Easy peasy.
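
A minimal sketch of that idea, assuming integer scores and an in-memory list of events. The `query` helper and its `sort_field` parameter are invented for this example; `nipXX_score` is the placeholder field name from the paragraph above.

```python
from typing import Any, Dict, List, Optional


def query(events: List[Dict[str, Any]], sort_field: str = "created_at",
          since: Optional[int] = None, until: Optional[int] = None,
          limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Filter by since/until on sort_field, then return events highest-first."""
    hits = [
        e for e in events
        if (since is None or e[sort_field] >= since)
        and (until is None or e[sort_field] <= until)
    ]
    hits.sort(key=lambda e: e[sort_field], reverse=True)
    return hits if limit is None else hits[:limit]


events = [
    {"id": "a", "created_at": 400, "nipXX_score": 50},
    {"id": "b", "created_at": 100, "nipXX_score": 30},
    {"id": "c", "created_at": 300, "nipXX_score": 20},
    {"id": "d", "created_at": 200, "nipXX_score": 10},
]

# First page by score, then the next page using "until" just below the last score seen.
page1 = query(events, sort_field="nipXX_score", limit=2)
page2 = query(events, sort_field="nipXX_score", until=page1[-1]["nipXX_score"] - 1, limit=2)
print([e["id"] for e in page1], [e["id"] for e in page2])
```

Events that tie on score at a page boundary would be skipped by this pagination, so a real design would need a tiebreaker, and scores that change between requests complicate paging further.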

Now, the most important and difficult step is for a good soul(s) to start creating these algo related NIPs that would explain how to apply math that leads to the score. For instance, the nipXX_score field of an event may be updated in a specific way when one or more of these situations happen:

  • the event is first received by the relay
  • the event is liked
  • the event is commented
  • ...
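
As a rough sketch of such update rules: the point values below are made up, reactions are taken to be kind 7 events and comments kind 1 replies, and the target is read from the last "e" tag as a simplification. The helper names are invented for this example.

```python
from typing import Any, Dict, Optional

# Hypothetical point values; a real algo NIP would have to specify these.
BASE_SCORE, LIKE_POINTS, COMMENT_POINTS = 1, 2, 3


def referenced_event(event: Dict[str, Any]) -> Optional[str]:
    """Return the id in the last "e" tag, or None if the event has none."""
    ids = [t[1] for t in event.get("tags", []) if len(t) > 1 and t[0] == "e"]
    return ids[-1] if ids else None


def apply_event(scores: Dict[str, int], event: Dict[str, Any]) -> None:
    """Update scores in place when the relay receives an event."""
    target = referenced_event(event)
    if target in scores:
        if event["kind"] == 7:      # the referenced event is liked
            scores[target] += LIKE_POINTS
        elif event["kind"] == 1:    # the referenced event is commented on
            scores[target] += COMMENT_POINTS
    if event["kind"] == 1:          # a note is first received by the relay
        scores.setdefault(event["id"], BASE_SCORE)


scores: Dict[str, int] = {}
apply_event(scores, {"id": "n1", "kind": 1, "tags": []})
apply_event(scores, {"id": "r1", "kind": 7, "tags": [["e", "n1"]]})
apply_event(scores, {"id": "n2", "kind": 1, "tags": [["e", "n1"]]})
print(scores)
```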

Besides this, it may take time for complex algos to be made available as NIPs, or it may even be impossible to set them up with this fixed score field approach, which is the same for anyone requesting, e.g. algos that take into account which user is requesting events, like what the user prefers based on who he follows.

So for scored global-feed-like algos it's doable as I described. But for these harder algos I think we already have a good proposal with NIP-97 that recommends closed-source algo implementations.

@huumn
Contributor Author

huumn commented May 16, 2023

In practical terms, for relays I think it could be as simple as using a nipXX_score field in place of the created_at one for sorting when nipXX algo sorting gets requested along with regular filters (the higher an event's score, the higher its position in the response to the client, as happens today with created_at, because a newer date has a larger timestamp value).

I agree generally that there needs to be a way to communicate scores to clients. I don't know what the right pattern is. I wouldn't imagine it involves adding or overloading fields of events but I also haven't proposed an alternative.

Now, the most important and difficult step is for a good soul(s) to start creating these algo related NIPs that would explain how to apply math that leads to the score.

For sure. It's nontrivial but feasible imo.

But for these harder algos I think we have already a good proposal with #493 that recommends closed-source algo implementations.

If it's nostr's view that megarelays are a healthy part of the protocol, this seems like a good way to communicate with them and share preferences. It's consistent with a lot of the relay indirection elsewhere in the protocol.

@TimA314

TimA314 commented Jun 9, 2023

I realize this topic is heavily debated. But I think algorithms would be best handled by the client. Perhaps sorting algorithms could be placed in nostr-tools or something similar? I feel like it is important for clients to have more control than relays. Sure, relays can spit out what they want, but clients should determine how to display content (including sorting).

Right now events are usually pulled from many relays. If, for example, only one or a few support algorithms, then clients will start to query only those relays, reducing decentralization.

@ekzyis

ekzyis commented Jun 9, 2023

But I think algorithms would be best handled by the client. Perhaps sorting algorithms could be placed in nostr-tools or something similar?

How do you want to sort content if you don't have access to the full content?

@huumn
Contributor Author

huumn commented Jun 9, 2023

I think algorithms would be best handled by the client.

Are you aware of any clients doing this? Admittedly, I haven't looked super hard but I haven't found one.

I agree with the fat client ideal. However, one of the most naive algorithms I listed is technically infeasible to perform on a client as is, e.g. oldest events first. A case can be made for not needing many of those algorithms and that perhaps many of the algorithms people actually want can, in fact, be conveniently provided by clients.

But, to steelman myself, some algorithms people want do require a relay-sized (or even network-sized) amount of data, e.g. search. Assuming clients don't want to download a relay's worth of data, what should we do instead?

I thought running search on relays using a standardized search algo which clients "combine" or "reduce" was the next best decentralizing thing. If I haven't misunderstood this thread, the more experienced nostratis disagree:

  1. standardization on relays (more than that which already exists) is centralizing
  2. "mega-relays," relays that attempt to store and index all of the network's data, will fail or otherwise be neutral in their centralizing effect
  3. relay diversity on things like search once it emerges (or gains more adoption if it has already emerged) will be maximally decentralizing
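
For reference, the client-side "combine"/"reduce" step mentioned above could be sketched like this. It assumes each relay returns (event id, score) pairs computed by the same standardized algorithm; how scores would actually be communicated to clients is not settled in this thread, and the relay URLs and the `combine` helper are invented for the example.

```python
from typing import Dict, List, Tuple


def combine(per_relay: Dict[str, List[Tuple[str, float]]], limit: int) -> List[str]:
    """Merge per-relay results: dedupe by event id, keep the best score, rank."""
    best: Dict[str, float] = {}
    for results in per_relay.values():
        for event_id, event_score in results:
            if event_id not in best or event_score > best[event_id]:
                best[event_id] = event_score
    # Highest score first; ties broken by event id so the order is deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [event_id for event_id, _ in ranked[:limit]]


responses = {
    "wss://relay-one.example": [("x", 9.0), ("y", 5.0)],
    "wss://relay-two.example": [("y", 7.0), ("z", 6.0)],
}
print(combine(responses, limit=2))
```

Keeping the maximum score per event is only one possible reduction; summing or averaging across relays would be equally plausible choices.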

@huumn
Contributor Author

huumn commented Jun 27, 2023

Based on the conversation at #579, a better way to do this might be to use standardized paths for specified algorithms.

e.g. wss://some.relay.com/.algo/zaprank?name0=arg0&name1=arg1

There seems to be limited interest in algo arguments (re: relays shouldn't be that smart), but this would allow named arguments via search params.
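
A minimal sketch of how such a URL could be split into an algorithm name and named arguments, using Python's standard `urllib.parse`. The `/.algo/` prefix and the example URL come from the suggestion above; the `parse_algo_url` helper and its validation rules are assumptions.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlsplit

ALGO_PREFIX = "/.algo/"


def parse_algo_url(url: str) -> Tuple[str, Dict[str, str]]:
    """Extract (algorithm name, arguments) from a relay URL with a /.algo/ path."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError("expected a ws:// or wss:// relay URL")
    if not parts.path.startswith(ALGO_PREFIX):
        raise ValueError("URL does not use the /.algo/ path prefix")
    name = parts.path[len(ALGO_PREFIX):]
    if not name or "/" in name:
        raise ValueError("expected exactly one algorithm name after /.algo/")
    # Search params become the algorithm's named arguments.
    return name, dict(parse_qsl(parts.query))


name, args = parse_algo_url("wss://some.relay.com/.algo/zaprank?name0=arg0&name1=arg1")
print(name, args)
```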

@fiatjaf
Member

fiatjaf commented Jun 28, 2023

wss://some.relay.com/.algo/zaprank?name0=arg0&name1=arg1

I'll vote for this.


8 participants
@fiatjaf @staab @arthurfranca @alexgleason @ekzyis @huumn @TimA314 and others