
Content popularity #3649

Closed
xoriole opened this issue May 28, 2018 · 7 comments

xoriole (Contributor) commented May 28, 2018

Content Popularity

We have channels and torrents, which together represent the content within Tribler. However, we do not yet have a mechanism to check whether content is popular, alive, or even dead. This ticket is the home for tracking development on content popularity.
Parent issue: #2783

The basic idea is to get a simple implementation operational first and build on it incrementally, starting with torrent popularity.

Torrent popularity
Check a set of torrents (max: 25) using the torrent checker and gossip the popular results to a set of connected peers (max: 25) at a regular, fixed interval; see the sketch below.
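A minimal sketch of that cycle, assuming nothing about Tribler's actual internals: `check_torrent_health` and `send_to_peer` are hypothetical stand-ins for the torrent checker and the gossip layer, and the interval is an arbitrary placeholder.

```python
import asyncio
import random

MAX_TORRENTS_PER_CYCLE = 25  # "max: 25" from the proposal above
MAX_GOSSIP_PEERS = 25
CHECK_INTERVAL = 300         # seconds; the issue does not fix a value

async def check_torrent_health(infohash):
    """Hypothetical stand-in for the torrent checker (tracker/DHT scrape)."""
    await asyncio.sleep(0)  # pretend network I/O
    return {"infohash": infohash,
            "seeders": random.randint(0, 100),
            "leechers": random.randint(0, 100)}

def send_to_peer(peer, records):
    """Hypothetical stand-in for the gossip layer."""
    print(f"gossip -> {peer}: {len(records)} health records")

async def popularity_cycle(known_torrents, connected_peers):
    # 1. Check a bounded sample of torrents.
    sample = random.sample(known_torrents,
                           min(MAX_TORRENTS_PER_CYCLE, len(known_torrents)))
    results = [await check_torrent_health(ih) for ih in sample]
    # 2. Keep the alive ones, ranked by swarm size.
    popular = sorted((r for r in results if r["seeders"] > 0),
                     key=lambda r: (r["seeders"], r["leechers"]),
                     reverse=True)
    # 3. Gossip the popular results to a bounded set of connected peers.
    for peer in connected_peers[:MAX_GOSSIP_PEERS]:
        send_to_peer(peer, popular)

async def main():
    torrents = [bytes([i]) * 20 for i in range(100)]  # fake 20-byte infohashes
    peers = [f"peer{i}" for i in range(30)]
    while True:  # runs forever, like the background task would
        await popularity_cycle(torrents, peers)
        await asyncio.sleep(CHECK_INTERVAL)

asyncio.run(main())
```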

Channel popularity
What are the indicators of a popular channel?

  • Swarm size of the torrents it contains
  • Votes on the channel
  • ...

These parameters can be used to derive a popularity score (very simple initially), which can be used to rank the channels and disseminate the most popular ones to peers at a regular interval; a toy scoring example follows.
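As a rough illustration only, here is one way such a score could be computed; the weights, the helper name, and the numbers are all invented for the example, not taken from Tribler:

```python
def channel_popularity_score(torrent_seeders, votes,
                             w_swarm=1.0, w_votes=10.0):
    """torrent_seeders: seeder counts of the channel's torrents.
    The weights are arbitrary placeholders, not Tribler values."""
    return w_swarm * sum(torrent_seeders) + w_votes * votes

channels = {
    "channel_a": channel_popularity_score([120, 40, 3], votes=15),  # 313.0
    "channel_b": channel_popularity_score([5, 2], votes=80),        # 807.0
}
ranking = sorted(channels, key=channels.get, reverse=True)
print(ranking)  # ['channel_b', 'channel_a']
```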

Issue to address: how do we handle conflicting responses about the same torrent or channel from multiple peers?

xoriole added this to the V7.1: The token micro-economy milestone May 28, 2018
xoriole self-assigned this May 28, 2018
ichorid (Contributor) commented May 30, 2018

One way to handle the problem of conflicting responses is to do away with absolute measures of popularity and use relative measures instead.
For example:
Node1 measures popularity for torrents A, B, C: A>B>C.
Node2 does the same for torrents I, J, K: I>J>K.
Node2 asks Node1 to provide some data on torrent popularity, and Node1 answers with "A>B>C".
Node2 now has two partial orderings, "I>J>K" and "A>B>C". It does not know how A relates to K, so it does a minimal check itself, querying the number of seeds for both K and A. If "K>A", that establishes the complete ordering "I>J>K>A>B>C"; if "A>K", Node2 can fall back on an efficient sorting algorithm that minimizes the number of comparisons.
This mechanism allows Node2 to "glue" the orderings together. Moreover, one could do this lazily, only for the subset of torrents the user queried in the current view (e.g. search filter, channel view, etc.).
Conflicts like "A>B vs B>A" can be resolved the same way: check the conflicting info immediately and decide for ourselves what is true.

To put it another way, "content popularity" essentially means sorting torrents according to some function. There are sorting algorithms that perform very well when the array is composed of big chunks of already-sorted data. We treat data from other nodes as this "half-sorted" data and finish the sort ourselves, doing as few checks of our own as possible.

Essentially, we treat discrepancies in the data provided by other peers as errors in sort order, to be fixed by applying a sorting algorithm, with real-world data getting the last word.
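A minimal sketch of this "gluing", assuming a hypothetical `get_seeders` live check: the merge is a standard two-way merge of the two orderings, and the cache ensures each torrent is scraped at most once, so we pay only for the comparisons the merge actually needs.

```python
from functools import lru_cache

# Fake "real world" swarm sizes, for the demo only.
LIVE_SEEDERS = {"I": 90, "J": 70, "K": 50, "A": 60, "B": 30, "C": 10}

@lru_cache(maxsize=None)
def get_seeders(torrent):
    """Hypothetical live check (tracker/DHT scrape); cached so each
    torrent is queried at most once."""
    return LIVE_SEEDERS[torrent]

def merge_orderings(ours, theirs):
    """Merge two orderings (most -> least popular) into one."""
    merged, i, j = [], 0, 0
    while i < len(ours) and j < len(theirs):
        if get_seeders(ours[i]) >= get_seeders(theirs[j]):
            merged.append(ours[i]); i += 1
        else:
            merged.append(theirs[j]); j += 1
    merged.extend(ours[i:])
    merged.extend(theirs[j:])
    return merged

print(merge_orderings(["I", "J", "K"], ["A", "B", "C"]))
# ['I', 'J', 'A', 'K', 'B', 'C']; only five torrents ever get scraped,
# and 'C' is placed without any live check at all.
```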

xoriole (Contributor, Author) commented May 30, 2018

Thank you for the suggestion. It applies nicely to sorting multiple responses for multiple torrents. However, the current issue is different in that we are talking about multiple health responses (numbers of seeders and leechers) for a single torrent, obtained from multiple peers.

The real question is whose response we trust more when deciding whether or not to update our local database. One approach currently under consideration is to use 1) the freshness of the response and 2) the sending peer's trust score from TrustChain to make the decision; a sketch of such a rule follows.
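A sketch of one such decision rule, with invented thresholds; how exactly freshness and the TrustChain score should be weighted is precisely the open question here.

```python
import time

def should_update(local_record, remote_record, peer_trust,
                  max_age=3600.0, min_trust=0.1):
    """Decide whether a received health record replaces the local one.

    local_record/remote_record: {'seeders': ..., 'leechers': ..., 'timestamp': ...}
    peer_trust: the sender's TrustChain-derived score, assumed in [0, 1].
    max_age and min_trust are arbitrary placeholders.
    """
    age = time.time() - remote_record["timestamp"]
    if age > max_age or peer_trust < min_trust:
        return False                      # too stale, or sender too untrusted
    if local_record is None:
        return True                       # nothing local yet: accept
    return remote_record["timestamp"] > local_record["timestamp"]

# Example: a fresh record from a reasonably trusted peer wins over an old one.
old = {"seeders": 10, "leechers": 2, "timestamp": time.time() - 3000}
new = {"seeders": 42, "leechers": 7, "timestamp": time.time() - 60}
print(should_update(old, new, peer_trust=0.8))  # True
```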

ichorid (Contributor) commented May 30, 2018

@xoriole, my suggestion solves this problem.
You don't want a peer's opinion on a single torrent; you always get it for multiple torrents, in the form of an ordering. You don't ask for the number of seeders; you only get a relative ranking.
Then you add it to your database, doing some sanity checks yourself, e.g. checking the relation between the first and last torrents in the received ordering (by getting the seed counts yourself).

Whenever you see a conflict between orderings, you check it yourself. The only alternative is to use a trust score. But even with a trust score, you will have to do the check at some point, to catch cheaters and to add your part of the knowledge to the "swarm mind". The most conflicted opinions would get the most attention and re-checking.
BTW, we can't establish the trust score precisely anyway, because of the high variance in seed/tracker visibility. So it's more of an "opinion reliability" score.
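One way to decide what to re-check first, sketched with invented helpers: count pairwise disagreements across the received orderings and put the most contested torrents at the head of the re-check queue.

```python
from collections import Counter

def conflict_counts(orderings):
    """orderings: lists of torrents, each ordered most -> least popular."""
    seen, conflicts = set(), Counter()
    for order in orderings:
        for pos, hi in enumerate(order):
            for lo in order[pos + 1:]:
                if (lo, hi) in seen:   # an earlier peer claimed the opposite
                    conflicts[hi] += 1
                    conflicts[lo] += 1
                seen.add((hi, lo))
    return conflicts

orders = [["A", "B", "C"], ["B", "A", "C"]]  # "A>B" vs "B>A"
recheck = [t for t, _ in conflict_counts(orders).most_common()]
print(recheck)  # ['B', 'A']: both sides of the contested pair come first
```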

ichorid (Contributor) commented May 30, 2018

It's just like random spot checks in a supermarket or on public transport.

ichorid (Contributor) commented Jun 1, 2018

synctext (Member) commented:

> You don't ask for the number of seeders, you only get relative ranking.

It is usually better to ask for factual and verifiable data. Relative rankings are difficult to verify, which makes it hard to catch a spammer in a lie. Exact swarm data is better in this case, I believe.

arvidn commented Jun 29, 2018

I spent some time on this (or a similar) problem at BitTorrent many years ago. We eventually gave up once we realized how hard the problem was. Specifically, we tried to pass around, via gossip, which swarms were the most popular; since the full set of torrents is too large to pass around, we ended up with feedback loops, because the swarms considered popular early on got disproportionate reach.

Anyway, one interesting aspect we were aiming for was a "weighted" popularity, based on what your peers in the swarms you participated in thought was popular. In a sense, "what is popular in your cohort".
