
Revise the PopularityCommunity metadata retrieval protocol. #7632

Closed
drew2a opened this issue Oct 16, 2023 · 6 comments

@drew2a (Contributor)

drew2a commented Oct 16, 2023

Despite the protocol's apparent simplicity, PopularityCommunity is quite complex as it derives logic from RemoteQueryCommunity:

```python
class PopularityCommunity(RemoteQueryCommunity, VersionCommunityMixin):
```

This inheritance was implemented in #5736

The current algorithm for metadata retrieval is as follows:
#7398 (comment)

If a peer receives torrent health info for a torrent whose metadata is missing, the Popularity Community subsequently requests the missing metadata.

```python
async def on_torrents_health(self, peer, payload):
    self.logger.debug(f"Received torrent health information for "
                      f"{len(payload.torrents_checked)} popular torrents and"
                      f" {len(payload.random_torrents)} random torrents")
    health_tuples = payload.random_torrents + payload.torrents_checked
    health_list = [HealthInfo(infohash, last_check=last_check, seeders=seeders, leechers=leechers)
                   for infohash, seeders, leechers, last_check in health_tuples]
    for infohash in await run_threaded(self.mds.db, self.process_torrents_health, health_list):
        # Get a single result per infohash to avoid duplicates
        self.send_remote_select(peer=peer, infohash=infohash, last=1)
```

As RemoteQueryCommunity is going to be removed in 8.0.0, we have to replace the algorithm for metadata retrieval.

@drew2a drew2a added this to the 8.0.0 milestone Oct 16, 2023
@drew2a (Contributor Author)

drew2a commented Oct 16, 2023

As a starting point for discussion, I propose the following algorithm:

Popularity Community operates in this manner:

  1. Upon introduction requests, send information about popular torrents.
  2. Every 5 seconds, choose a random peer and send a torrent health request.
  3. The chosen peer responds with a list of health information.

(Note: Steps 1, 2, and 3 remain unchanged from the current algorithm)
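As a rough illustration of the unchanged steps 1-3, the periodic health-request loop could look like the sketch below. The class name, the peer representation, and the recording of sent requests are illustrative assumptions, not the actual Tribler API; the 5-second interval is shortened here so the example runs quickly.

```python
import asyncio
import random


class HealthGossipLoop:
    """Minimal sketch of the periodic torrent-health request loop (step 2)."""

    def __init__(self, peers, request_interval=5.0):
        self.peers = peers                  # known peers (step 1 fills this)
        self.request_interval = request_interval
        self.sent_requests = []             # recorded for illustration only

    def send_health_request(self, peer):
        # In the real community this would serialize and send an IPv8 message;
        # the chosen peer then responds with a list of health info (step 3).
        self.sent_requests.append(peer)

    async def run(self, rounds):
        for _ in range(rounds):
            if self.peers:
                # Step 2: pick a random peer and ask for torrent health info.
                self.send_health_request(random.choice(self.peers))
            await asyncio.sleep(self.request_interval)


loop = HealthGossipLoop(peers=["peer_a", "peer_b"], request_interval=0.01)
asyncio.run(loop.run(rounds=3))
print(len(loop.sent_requests))  # 3, one request per round
```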

  4. The requester identifies torrents for which knowledge is missing and sends a series of messages requesting this knowledge:

```python
@dataclass
class RequestKnowledgeMessage:
    infohash: str
```

  5. The chosen peer responds with a series of messages containing the required knowledge:

```python
@dataclass(msg_id=STATEMENT_OPERATION_MESSAGE_ID)
class StatementOperationMessage:
    operation: StatementOperation
    signature: StatementOperationSignature
```
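A self-contained sketch of the proposed request/response exchange (steps 4 and 5). The message classes and the in-memory dictionaries below are simplified stand-ins for the real IPv8 payloads and the knowledge database, introduced only for illustration:

```python
from dataclasses import dataclass


@dataclass
class RequestKnowledgeMessage:
    infohash: str


@dataclass
class StatementOperationMessage:
    infohash: str
    statement: str  # stand-in for the StatementOperation + signature pair


def missing_infohashes(received_health, known_metadata):
    # Step 4: torrents we received health info for but have no knowledge about.
    return [ih for ih in received_health if ih not in known_metadata]


def answer_requests(requests, knowledge_db):
    # Step 5: the chosen peer answers each request it can satisfy.
    return [StatementOperationMessage(r.infohash, knowledge_db[r.infohash])
            for r in requests if r.infohash in knowledge_db]


received_health = ["aa", "bb", "cc"]
known_metadata = {"bb"}
requests = [RequestKnowledgeMessage(ih)
            for ih in missing_infohashes(received_health, known_metadata)]
knowledge_db = {"aa": "tag:ubuntu", "cc": "tag:debian"}
responses = answer_requests(requests, knowledge_db)
print([(m.infohash, m.statement) for m in responses])
# [('aa', 'tag:ubuntu'), ('cc', 'tag:debian')]
```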

@synctext (Member)

synctext commented Oct 25, 2023

My idea is to first focus on stability by removing Gigachannels and keeping tags, and only then radically alter the architecture. Let's not try to fix things that aren't currently broken 🤔

  • Remove 2 out of 3 methods for content discovery
    • promote PopularityCommunity as the only way to discover novel hashes
    • remove channel sampling/pre-view and free-for-all channel mechanism
    • Remove (or hide) these in the GUI and core at some point
  • Release a stable release with this code
  • Add a new message inside the PopularityCommunity
    • backwards compatible with older peers
    • new feature of shadow keys and Libtorrent ground truth on swarm size
    • Query, swarm-clicked, swarm-not-clicked, swarm-clicked-size-as-seen-by-Libtorrent, date, shadow-signature
  • crawl new info
    • Web-of-trust: the rendezvous peers will also start producing limited crawl data
    • New privacy-protected ClickLog-based discovery
  • New release which starts to utilise the new "ContentDiscovery" community and one-struct-to-rule-them-all
  • Further releases (mixing 4 things all into 1 Tribler hopefully 🙏 )
    • collecting further data for the Machine Learning Science part
    • collecting further data for web-of-trust
    • collecting further data for tag-based metadata enrichment (content,trust, and queries)
    • end of gigachannels
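The `Query, swarm-clicked, swarm-not-clicked, swarm-clicked-size-as-seen-by-Libtorrent, date, shadow-signature` tuple above could be modelled roughly as below. The field types and the HMAC-based "shadow signature" are illustrative assumptions, not a specification; a real shadow identity would presumably use asymmetric keys.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class ClickLogRecord:
    query: str
    swarm_clicked: str              # infohash the user chose
    swarm_not_clicked: list = field(default_factory=list)
    swarm_size_libtorrent: int = 0  # swarm size ground truth from libtorrent
    date: str = ""
    shadow_signature: bytes = b""

    def _payload(self) -> bytes:
        return f"{self.query}|{self.swarm_clicked}|{self.date}".encode()

    def sign(self, shadow_key: bytes):
        # Illustrative only: HMAC stands in for a shadow-key signature.
        self.shadow_signature = hmac.new(shadow_key, self._payload(),
                                         hashlib.sha256).digest()
        return self

    def verify(self, shadow_key: bytes) -> bool:
        expected = hmac.new(shadow_key, self._payload(),
                            hashlib.sha256).digest()
        return hmac.compare_digest(self.shadow_signature, expected)


record = ClickLogRecord("ubuntu iso", "aa" * 20, ["bb" * 20],
                        1500, "2023-10-25").sign(b"shadow-key")
print(record.verify(b"shadow-key"))  # True
print(record.verify(b"wrong-key"))   # False
```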

@drew2a (Contributor Author)

drew2a commented Oct 30, 2023

Removing 2 out of 3 Content Discovery Methods

During my effort on Friday to eliminate channel sampling/pre-view and the free-for-all channel mechanism, I encountered some obstacles. Even though my initial attempt wasn't successful, I've gained insights into how this can be achieved and can now provide more detailed estimations.

The removal process should begin on the GUI side. This involves:

  • Removing visual elements associated with these features.
  • Deleting the corresponding models and widgets.

Once the GUI components are addressed:

  • The Gigachannel Manager can be deleted.
  • The Remote Query Community should be detached from the Metadata Store, along with its metadata.db part.
  • Any parts not related to the search can then be stripped from the Gigachannel Community.
  • Finally, the Gigachannel Community can be renamed to Search Community, reflecting its sole focus on search.

Following these changes, most of the Channels code will be eliminated. Any remnants can either be adapted or removed in future refactoring stages.

From my current understanding, these steps could take about one week.

@qstokkink (Contributor)

> * Add a new message inside the PopularityCommunity
>   * backwards compatible with older peers
>   * new feature of shadow keys and Libtorrent ground truth on swarm size
>   * `Query, swarm-clicked, swarm-not-clicked, swarm-clicked-size-as-seen-by-Libtorrent, date, shadow-signature`

For this little part of the master plan I have the following implementation in mind:

  • [GUI] Store the last search query in the search widget and its associated top-X (top-10? -> needs to fit in a UDP packet) results by infohash.
  • [GUI] Whenever a user starts downloading a new torrent with an infohash in the previously-stored results, wait until the download is at least 50% completed and send the info tuple (see quoted reply above) to the core.
  • [CORE] Store the tuples in a/the database and also store a reverse mapping for each torrent (i.e., for a given infohash X, store the infohash Y that is likely preferable, or X itself).
  • [CORE] Instead of gossiping random torrents, gossip the more preferable torrent Y for a randomly sampled torrent X (using the O(1) reverse mapping in the db). Here we use the "shadow identity" instead of a user's real identity, and/or pick a received record signed by another shadow identity and gossip that.

Once all that works, the last remaining step is to update the search results to also make use of the preference relation instead of pure db-based text search.
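A minimal sketch of the O(1) reverse mapping described in the [CORE] bullets above; the class name, storage layer, and tuple shape are assumptions for illustration:

```python
class PreferenceStore:
    """Maps each seen infohash X to the infohash Y likely preferable to it."""

    def __init__(self):
        self.preferred_for = {}  # infohash -> preferable infohash (or itself)

    def record_click(self, clicked, not_clicked):
        # The clicked torrent is preferable to every shown-but-ignored torrent.
        self.preferred_for.setdefault(clicked, clicked)
        for infohash in not_clicked:
            self.preferred_for[infohash] = clicked

    def preferable(self, infohash):
        # O(1) lookup; unknown torrents map to themselves, so gossip can
        # always fall back to the sampled torrent itself.
        return self.preferred_for.get(infohash, infohash)


store = PreferenceStore()
store.record_click("Y", not_clicked=["X1", "X2"])
print(store.preferable("X1"))  # 'Y'
print(store.preferable("Z"))   # 'Z' (unknown, maps to itself)
```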

@qstokkink qstokkink mentioned this issue Dec 11, 2023
13 tasks
@drew2a drew2a removed their assignment Dec 11, 2023
@drew2a drew2a removed this from the 8.0.0 milestone Dec 11, 2023
@qstokkink (Contributor)

A more detailed design (green blocks include the code to add), capturing some insights since my last post:

[architecture diagram]

Changes:

  • Because of our endpoint structure, we don't need to touch GUI code, just the endpoints.
  • Because this new functionality interfaces with different components, it needs to be in a component itself (named UserActivityComponent above).
  • It is easier to start with the torrent_finished_alert in the first version, instead of waiting for a 50% threshold.

Disclaimer: not a single line of code has been written yet; the design may change as I implement it.
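As a rough sketch of the first-version trigger described above, reacting to a finished-download notification instead of waiting for a 50% threshold; the notifier interface, topic name, and component shape are assumptions, not the actual Tribler code:

```python
class Notifier:
    """Tiny stand-in for a core notifier with topic-based callbacks."""

    def __init__(self):
        self.observers = {}

    def add_observer(self, topic, callback):
        self.observers.setdefault(topic, []).append(callback)

    def notify(self, topic, **kwargs):
        for callback in self.observers.get(topic, []):
            callback(**kwargs)


class UserActivityComponent:
    """Reacts to finished downloads (torrent_finished) in the first version."""

    def __init__(self, notifier):
        self.finished = []
        notifier.add_observer("torrent_finished", self.on_torrent_finished)

    def on_torrent_finished(self, infohash, name):
        # First version: record the finished download so it can later be
        # matched against stored search results and written to the database.
        self.finished.append((infohash, name))


notifier = Notifier()
component = UserActivityComponent(notifier)
notifier.notify("torrent_finished", infohash="aa" * 20, name="ubuntu.iso")
print(component.finished)
```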

@qstokkink (Contributor)

This has now been implemented.
