RSS is a good example of a long tail. A few feeds update many times an hour, while many personal blogs update only a few times per month.
Miniflux today uses a round-robin scheduler: all feeds are fetched at the same frequency. This does not accommodate the long tail of RSS feeds.
I propose the following flow to let Miniflux fetch feeds based on their update frequency.
1. Introduce three config flags / environment variables:
POLLING_SCHEDULER ->
"ROUND_ROBIN": The default scheduler.
"INVERSE_COUNT": This scheduler sets the polling frequency based on the number of articles published in the previous week. It increases the polling frequency of more active feeds and decreases that of less active feeds. The maximum number of polls is still subject to "POLLING_FREQUENCY" and "BATCH_SIZE". If you have many feeds that do not update often, this scheduler decreases the total number of polls, at the cost of higher latency for less active feeds.
If no valid value is provided, the default scheduler "ROUND_ROBIN" is used.
SCHEDULER_INVERSE_COUNT_MIN_INTERVAL -> default 5 minutes
SCHEDULER_INVERSE_COUNT_MAX_INTERVAL -> default 24 hours
2. Update the database schema: add a new column "next_check_at" to the feed table, defaulting to now().
3. In "func (s *Storage) NewBatch(batchSize int) (jobs model.JobList, err error)":
Order the query by "next_check_at" instead of "last_checked_at",
and require "next_check_at" to be earlier than now(), i.e. it must have expired and not lie in the future.
The total number of feeds fetched is still subject to "POLLING_FREQUENCY" and "BATCH_SIZE".
4. In "func (h *Handler) RefreshFeed(userID, feedID int64)":
4.1. The call to h.store.FeedByID(userID, feedID) also returns the number of entries from the past 7 days, including "removed" items.
4.2. Compute the interval:
If using "ROUND_ROBIN", the interval is always POLLING_FREQUENCY.
If using "INVERSE_COUNT", calculate the average interval between two updates from the entries count and the flags in 1.
Set "next_check_at" to now()+interval.
4.3. When calling h.store.UpdateFeed(originalFeed), update "next_check_at".
I can work on this, but I would like to hear your opinion first.
shizunge changed the title from "heuristic scheduler" to "alternative scheduler" on Apr 22, 2020
For my own use, I added a hack to enforce a minimum duration between checks for rarely-updated feeds. pdewacht@1d8ed6d
I didn't submit a PR because it's just too ugly and ad hoc, but it works well for my purposes.