Improve tracker statistics importation #530

@josecelano josecelano commented Mar 12, 2024

Currently, the Index imports statistics for all torrents every hour (1 hour is the default value in the configuration). We need to import stats for all torrents because users can sort torrents by their stats (number of seeders and leechers). This PR makes the import process somewhat more efficient.

  • Add a new field (updated_at) to the table torrust_torrent_tracker_stats with the datetime when the stats were imported from the tracker. It is mainly for logging, for example to check that the cronjob is running correctly, but it also makes it possible to import torrents in batches.
  • Until now we got all torrents (get_all_torrents_compact) from the database, which can be a big array of infohashes. Instead, we can fetch the 50 records that have gone the longest without an update and run the importation every 100 milliseconds, so each request to the tracker API covers 50 torrents. Those values can be adjusted in the future.
  • A new filter was added to the tracker API to get statistics for a list of torrents with one request. We can use it instead of getting one torrent at a time.
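The batch selection described in the list above can be sketched in memory as follows. This is only an illustration of the selection rule, not the PR's actual database query: `StatsRow`, `next_batch`, and the use of Unix timestamps are hypothetical names and choices made for the example.

```rust
/// Hypothetical in-memory view of a row in `torrust_torrent_tracker_stats`.
#[derive(Debug, Clone, PartialEq)]
struct StatsRow {
    info_hash: String,
    /// Unix timestamp (seconds) of the last import; `None` if never imported.
    updated_at: Option<u64>,
}

/// Returns up to `limit` infohashes whose stats are older than `cutoff`,
/// least recently updated first (never-imported rows come first).
fn next_batch(rows: &[StatsRow], cutoff: u64, limit: usize) -> Vec<String> {
    let mut pending: Vec<&StatsRow> = rows
        .iter()
        .filter(|row| row.updated_at.map_or(true, |t| t < cutoff))
        .collect();
    // `None` sorts before `Some(_)`, so never-imported torrents go first.
    pending.sort_by_key(|row| row.updated_at);
    pending
        .into_iter()
        .take(limit)
        .map(|row| row.info_hash.clone())
        .collect()
}
```

With `limit` set to 50 and `cutoff` set to "now minus the update interval", each call would return the next batch to request from the tracker. In a real deployment the filtering and ordering would presumably be done by the database rather than in memory.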

Pros:

  • With millions of torrents we don't need to load all of them into memory.
  • The new field updated_at helps to monitor the importation process.
  • We get torrent stats for 50 torrents in one request instead of one request per torrent.

Cons:

  • Every 100 milliseconds we run a query to find the torrents whose stats are due for an update.


Add a new column to the table `torrust_torrent_tracker_stats` with the
datetime of the last update, that is, the date when the stats
(leechers and seeders) were synchronized from the tracker API.

Instead of importing all torrent statistics from the tracker every
hour, this change imports 50 torrents every 100 milliseconds.

Batches only include torrents that have not been updated in the last
hour (or whatever is the `torrent_info_update_interval` value in the
configuration).
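As a rough capacity check, the figures above (50 torrents per batch, one batch every 100 milliseconds, a one-hour update interval) imply an upper bound on how many torrents can be refreshed per interval. The helper below is a hypothetical sketch of that arithmetic; it assumes every batch is full and every request completes within the pause.

```rust
/// Upper bound on the torrents refreshed within one update interval,
/// assuming full batches and requests that finish within the pause.
/// `pause_ms` must be greater than zero.
fn torrents_per_interval(batch_size: u64, pause_ms: u64, interval_secs: u64) -> u64 {
    // Number of batches that fit into the interval.
    let batches = (interval_secs * 1000) / pause_ms;
    batches * batch_size
}
```

For the defaults mentioned here, `torrents_per_interval(50, 100, 3600)` gives 36,000 batches of 50, that is, 1,800,000 torrents per hour. An index with more torrents than that would need a larger batch, a shorter pause, or a longer interval.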

This change avoids loading the whole set of torrents in memory every time
the importation starts.

In the future, it could also make it possible to control the number of
requests per second to the tracker (statically, from a config value, or
dynamically, depending on the tracker load).

Although it saves memory, it runs more SQL queries to get the list of
torrents pending an update.
@josecelano josecelano force-pushed the 469-import-all-torrents-statistics-from-the-tracker-even-with-high-load-and-many-torrents branch from a1e8ae9 to 16cbea8 Compare March 13, 2024 12:47
…ker API endpoint

There is a new feature in the tracker API where you can get stats for a
list of infohashes:

<http://localhost:1212/api/v1/torrents?token=MyAccessToken&info_hash=f584cba7dd4008ecc026ac2dc0ce1ad179822f5f&info_hash=f59caeaf12e7bc8a289c39b698d085bc27eec1c2&info_hash=f6465cb6bd227e7c97d1de7cb426551af97eae41&info_hash=f655996ba112da8d0835463e6be4f47ff0bfef0c&info_hash=f68b7d6296d3e933b455c6107badf8dc6eeccadc&info_hash=f6a4eec77008786d91c344716ed2bb58570cdbd6&info_hash=f6ae9710af2d09faf8d337855e441087e2ff9286&info_hash=f6f89b0a54f3944f36027ff38ec950781e654836&info_hash=f77cfff1ab500a203c73141f98947acd7b5d0686&info_hash=f792faf15179d7c01fbf4647e96c28b155810f90&info_hash=f7c895711191b602211bc267fc0468c302f6974d&info_hash=f7d4589f96974ec030a798f943a82ecfdeb2f013&info_hash=f7ec8c6963cefcaf4c1b322358ac5d9edfb5b8b6&info_hash=f7fc543c48f1535692efa8e623e738bc67997eab&info_hash=f80ba0b3ad573a403e16d7b3d7c17863676f8f1f&info_hash=f871d2a6d41b30c4caa6255c653e1f02cd8996c1&info_hash=f894f06b6d0411f28d5906177103354db3f8340d&info_hash=f89b08ae4a4af5d1327b31bb1a6ed2f9b3d227b4&info_hash=f913667273b8562ec30366f8ba32e7e4a2f65742&info_hash=f9d5713cdf9539f1feffae05c04cfdbbcaea18a8&info_hash=f9eb982706d058dc855cc9a7528048631fff3d33&info_hash=f9fb61ad5aadf585dd86cb63e5bcc6dfed71f6fc&info_hash=faae957e9a3d7f9fd11074b3a49ce6dfd8d1c75b&info_hash=fb08e03e518fb7d5ae6ff73af3854d3e75a6b228&info_hash=fb25d2c0a0a109d90db1459547c926c8fd32f888&info_hash=fb6a3274e36bd2b4f5e3833308e57a0d7eb1cc27&info_hash=fb765107b4029569009003eeb4c87a5707612807&info_hash=fb994291a47627fa3b84849709965aa9bf781f58&info_hash=fba2da365997d3aab086cc2998274051f5a3cc8c&info_hash=fba992473ac2b1760fcda77b9877eeb4e48e4990&info_hash=fbdeb27908830e438eafd1a3f84a114ff0f428bf&info_hash=fbee94aeda72de1035ef8ee2dca861c722d5cf26&info_hash=fc82989b9f718f2fc3cb8487d1fe4ced411f9630&info_hash=fc89b80f119bc6ae91e7263b3f21db55b3fd16ad&info_hash=fcb85658c7ca1a82b5cc563af8165c4d20aa2d9e&info_hash=fcf40cb66b0bb72c9e478f07957a1ee9d140ce75&info_hash=fcf57886742d297d2017b2f83fa69ec8814a0d3d&info_hash=fd0bf9d869d2886a370f81838f978c2c26da4222&info_hash=fd3a4be495bd64a7e2ba4dc8b78eed1f8958f644&info_hash=fe0a1913ad2a1dfa8ddc93e02217c5d8ef384306&info_hash=fe28c9463c50d8febc1b7757553c05b725b42879&info_hash=fe7089ca13b7b218f4af8e98303cf1fbaacc90eb&info_hash=fea586402fccca172470715aa3558978d952799d&info_hash=fee8409338d889ee130dcee19bda84deee72da65&info_hash=ff01f0bf22e5f8483b8e82b2bc88c9c536a76bfa&info_hash=ff52b816d9bad366c2ed1232efd0711b3e262f92&info_hash=ff589afca896eb04bacf245e3c041e6feb54ab05&info_hash=ff6a1c9c60c16ff96115ee95a814017b5c1709a8&info_hash=ffe46c2247e844804adff54d770fe274d2d2e873&info_hash=fff0f6bb2eaae8b2e0e163d1acddd8ff2e4dec7e>

This way you can get stats for many torrents in one request.

This commit changes the statistics importer to use this new endpoint
feature.
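A request URL of the shape shown above could be assembled roughly as follows. The function name `build_stats_url` and its parameters are hypothetical and not taken from the importer's code. The sketch assumes the infohashes are hex strings and the token is URL-safe, so no percent-encoding is applied.

```rust
/// Builds a tracker API URL requesting stats for several torrents at once,
/// repeating the `info_hash` query parameter once per torrent.
/// Assumes `token` and every infohash are already URL-safe.
fn build_stats_url(base_url: &str, token: &str, info_hashes: &[&str]) -> String {
    let mut url = format!("{base_url}/api/v1/torrents?token={token}");
    for info_hash in info_hashes {
        url.push_str("&info_hash=");
        url.push_str(info_hash);
    }
    url
}
```

Calling it with `"http://localhost:1212"`, `"MyAccessToken"`, and a slice of 50 infohashes yields a single URL like the one in the commit message, replacing 50 separate requests.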
@josecelano
Member Author

ACK af7150b

@josecelano josecelano marked this pull request as ready for review March 13, 2024 16:38
@josecelano josecelano merged commit 1368045 into torrust:develop Mar 13, 2024
10 of 13 checks passed
Development

Successfully merging this pull request may close these issues.

Import all torrents statistics from the tracker (even with high load and many torrents)