
Content popularity #3649

Closed
xoriole opened this issue May 28, 2018 · 7 comments

xoriole (Contributor) commented May 28, 2018

Content Popularity

We have channels and torrents, which together represent the content within Tribler. However, we do not yet have a mechanism to check whether content is popular, alive, or even dead. This ticket is the home for tracking development on content popularity.
Parent issue: #2783

The basic idea is to get a simple implementation operational first and build on it incrementally, starting with torrent popularity.

Torrent popularity
Check a set of torrents (max: 25) using the torrent checker and gossip the popular results to a set of connected peers (max: 25) at a regular, fixed interval; see the sketch below.
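A minimal sketch of that cycle, assuming nothing about Tribler's actual internals: `check_torrent_health` and `send_to_peer` are hypothetical stand-ins for the torrent checker and the gossip layer, and the interval is an arbitrary placeholder.

```python
import asyncio
import random

MAX_TORRENTS_PER_CYCLE = 25  # "max: 25" from the proposal above
MAX_GOSSIP_PEERS = 25
CHECK_INTERVAL = 300         # seconds; the issue does not fix a value

async def check_torrent_health(infohash):
    """Hypothetical stand-in for the torrent checker (tracker/DHT scrape)."""
    await asyncio.sleep(0)  # pretend network I/O
    return {"infohash": infohash,
            "seeders": random.randint(0, 100),
            "leechers": random.randint(0, 100)}

def send_to_peer(peer, records):
    """Hypothetical stand-in for the gossip layer."""
    print(f"gossip -> {peer}: {len(records)} health records")

async def popularity_cycle(known_torrents, connected_peers):
    # 1. Check a bounded sample of torrents.
    sample = random.sample(known_torrents,
                           min(MAX_TORRENTS_PER_CYCLE, len(known_torrents)))
    results = [await check_torrent_health(ih) for ih in sample]
    # 2. Keep the alive ones, ranked by swarm size.
    popular = sorted((r for r in results if r["seeders"] > 0),
                     key=lambda r: (r["seeders"], r["leechers"]),
                     reverse=True)
    # 3. Gossip the popular results to a bounded set of connected peers.
    for peer in connected_peers[:MAX_GOSSIP_PEERS]:
        send_to_peer(peer, popular)

async def main():
    torrents = [bytes([i]) * 20 for i in range(100)]  # fake 20-byte infohashes
    peers = [f"peer{i}" for i in range(30)]
    while True:  # runs forever, like the background task would
        await popularity_cycle(torrents, peers)
        await asyncio.sleep(CHECK_INTERVAL)

asyncio.run(main())
```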

Channel popularity
What are the indicators of a popular channel?

  • Swarm size of the torrents it contains
  • Votes on the channel
  • ...

These parameters can be used to derive a popularity score (very simple initially), which can be used to rank the channels and disseminate the most popular ones to peers at a regular interval; a toy scoring example follows.
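As a rough illustration only, here is one way such a score could be computed; the weights, the helper name, and the numbers are all invented for the example, not taken from Tribler:

```python
def channel_popularity_score(torrent_seeders, votes,
                             w_swarm=1.0, w_votes=10.0):
    """torrent_seeders: seeder counts of the channel's torrents.
    The weights are arbitrary placeholders, not Tribler values."""
    return w_swarm * sum(torrent_seeders) + w_votes * votes

channels = {
    "channel_a": channel_popularity_score([120, 40, 3], votes=15),  # 313.0
    "channel_b": channel_popularity_score([5, 2], votes=80),        # 807.0
}
ranking = sorted(channels, key=channels.get, reverse=True)
print(ranking)  # ['channel_b', 'channel_a']
```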

Issue to address: how do we handle conflicting responses about the same torrent or channel from multiple peers?

xoriole added this to the V7.1: The token micro-economy milestone May 28, 2018
xoriole self-assigned this May 28, 2018
ichorid (Contributor) commented May 30, 2018

One way to handle the problem of conflicting responses is to do away with absolute measures of popularity and use relative measures instead.
For example:
Node1 measures popularity for torrents A, B, C: A>B>C.
Node2 does the same for torrents I, J, K: I>J>K.
Node2 asks Node1 to provide some data on torrent popularity, and Node1 answers with "A>B>C".
Node2 now has two partial orderings, "I>J>K" and "A>B>C". It does not know how A relates to K, so it does a minimal check itself, querying the number of seeds for both K and A. If "K>A", that establishes the complete ordering "I>J>K>A>B>C"; if "A>K", Node2 can fall back on an efficient sorting algorithm that minimizes the number of comparisons.
This mechanism allows Node2 to "glue" the orderings together. Moreover, one could do this lazily, only for the subset of torrents the user queried in the current view (e.g. search filter, channel view, etc.).
Conflicts like "A>B vs B>A" can be resolved the same way: check the conflicting info immediately and decide for ourselves what is true.

To put it another way, "content popularity" essentially means sorting torrents according to some function. There are sorting algorithms that perform very well when the array is composed of big chunks of already-sorted data. We treat data from other nodes as this "half-sorted" data and finish the sort ourselves, doing as few checks of our own as possible.

Essentially, we treat discrepancies in the data provided by other peers as errors in sort order, to be fixed by applying a sorting algorithm, with real-world data getting the last word.
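A minimal sketch of this "gluing", assuming a hypothetical `get_seeders` live check: the merge is a standard two-way merge of the two orderings, and the cache ensures each torrent is scraped at most once, so we pay only for the comparisons the merge actually needs.

```python
from functools import lru_cache

# Fake "real world" swarm sizes, for the demo only.
LIVE_SEEDERS = {"I": 90, "J": 70, "K": 50, "A": 60, "B": 30, "C": 10}

@lru_cache(maxsize=None)
def get_seeders(torrent):
    """Hypothetical live check (tracker/DHT scrape); cached so each
    torrent is queried at most once."""
    return LIVE_SEEDERS[torrent]

def merge_orderings(ours, theirs):
    """Merge two orderings (most -> least popular) into one."""
    merged, i, j = [], 0, 0
    while i < len(ours) and j < len(theirs):
        if get_seeders(ours[i]) >= get_seeders(theirs[j]):
            merged.append(ours[i]); i += 1
        else:
            merged.append(theirs[j]); j += 1
    merged.extend(ours[i:])
    merged.extend(theirs[j:])
    return merged

print(merge_orderings(["I", "J", "K"], ["A", "B", "C"]))
# ['I', 'J', 'A', 'K', 'B', 'C']; only five torrents ever get scraped,
# and 'C' is placed without any live check at all.
```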

xoriole (Contributor, Author) commented May 30, 2018

Thank you for the suggestion. It applies nicely to sorting multiple responses for multiple torrents. However, the current issue is different in that we are talking about multiple health responses (numbers of seeders and leechers) for a single torrent, obtained from multiple peers.

The real question is whose response we trust more when deciding whether or not to update our local database. One approach currently under consideration is to use 1) the freshness of the response and 2) the sending peer's trust score from TrustChain to make the decision; a sketch of such a rule follows.
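A sketch of one such decision rule, with invented thresholds; how exactly freshness and the TrustChain score should be weighted is precisely the open question here.

```python
import time

def should_update(local_record, remote_record, peer_trust,
                  max_age=3600.0, min_trust=0.1):
    """Decide whether a received health record replaces the local one.

    local_record/remote_record: {'seeders': ..., 'leechers': ..., 'timestamp': ...}
    peer_trust: the sender's TrustChain-derived score, assumed in [0, 1].
    max_age and min_trust are arbitrary placeholders.
    """
    age = time.time() - remote_record["timestamp"]
    if age > max_age or peer_trust < min_trust:
        return False                      # too stale, or sender too untrusted
    if local_record is None:
        return True                       # nothing local yet: accept
    return remote_record["timestamp"] > local_record["timestamp"]

# Example: a fresh record from a reasonably trusted peer wins over an old one.
old = {"seeders": 10, "leechers": 2, "timestamp": time.time() - 3000}
new = {"seeders": 42, "leechers": 7, "timestamp": time.time() - 60}
print(should_update(old, new, peer_trust=0.8))  # True
```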

ichorid (Contributor) commented May 30, 2018

@xoriole, my suggestion solves this problem.
You don't want a peer's opinion on a single torrent; you always get it for multiple torrents, in the form of an ordering. You don't ask for the number of seeders; you only get a relative ranking.
Then you add it to your database, doing some sanity checks yourself, e.g. checking the relation between the first and last torrents in the received ordering (by getting the seed counts yourself).

Whenever you see a conflict between orderings, you check it yourself. The only alternative is to use a trust score. But even with a trust score, you will have to do the check at some point, to catch cheaters and to add your part of the knowledge to the "swarm mind". The most conflicted opinions would get the most attention and re-checking.
BTW, we can't establish the trust score precisely anyway, because of the high variance in seed/tracker visibility. So it's more of an "opinion reliability" score.
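One way to decide what to re-check first, sketched with invented helpers: count pairwise disagreements across the received orderings and put the most contested torrents at the head of the re-check queue.

```python
from collections import Counter

def conflict_counts(orderings):
    """orderings: lists of torrents, each ordered most -> least popular."""
    seen, conflicts = set(), Counter()
    for order in orderings:
        for pos, hi in enumerate(order):
            for lo in order[pos + 1:]:
                if (lo, hi) in seen:   # an earlier peer claimed the opposite
                    conflicts[hi] += 1
                    conflicts[lo] += 1
                seen.add((hi, lo))
    return conflicts

orders = [["A", "B", "C"], ["B", "A", "C"]]  # "A>B" vs "B>A"
recheck = [t for t, _ in conflict_counts(orders).most_common()]
print(recheck)  # ['B', 'A']: both sides of the contested pair come first
```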

ichorid (Contributor) commented May 30, 2018

It's just like random spot checks in a supermarket or on public transport.

ichorid (Contributor) commented Jun 1, 2018

synctext (Member) commented:

> You don't ask for the number of seeders, you only get relative ranking.

It is usually better to ask for factual and verifiable data. Relative rankings are difficult to verify, which makes it hard to catch a spammer in a lie. Exact swarm data is better in this case, I believe.

arvidn commented Jun 29, 2018

I spent some time on this (or a similar) problem at BitTorrent many years ago. We eventually gave up once we realized how hard the problem was. Specifically, we tried to pass around, via gossip, which swarms were the most popular; since the full set of torrents is too large to pass around, we ended up with feedback loops, because the swarms considered popular early on got disproportionate reach.

Anyway, one interesting aspect we were aiming for was a "weighted" popularity, based on what your peers in the swarms you participated in thought was popular. In a sense, "what is popular in your cohort".
