Prioritise resources with a state of NEW when dequeing (#31)
nevali committed Nov 23, 2016
1 parent d9f164e commit 5aa2054
Showing 1 changed file with 1 addition and 1 deletion.
crawler/queues/db.c: 2 changes (1 addition, 1 deletion)
@@ -633,7 +633,7 @@ db_next_txn(SQL *db, void *userdata)
" \"root\".\"hash\" = \"res\".\"root\" AND "
" \"root\".\"earliest_update\" < NOW() AND "
" \"res\".\"next_fetch\" < NOW() "
-    " ORDER BY \"root\".\"earliest_update\" ASC, \"res\".\"next_fetch\" ASC, \"root\".\"rate\" ASC",
+    " ORDER BY \"res\".\"state\" = 'NEW' DESC, \"root\".\"earliest_update\" ASC, \"res\".\"next_fetch\" ASC, \"root\".\"rate\" ASC",
me->ncrawlers, me->crawler_id);
if(!rs)
{

3 comments on commit 5aa2054

@rjpwork

Won't this always prioritise NEW resources? Is that the desired behaviour (as opposed to turning it on for N hours or some other limited scope)?

@rjpwork

It also conflicts with implementing #66.

@nevali
Member Author

@nevali nevali commented on 5aa2054 Nov 24, 2016


For the time being, always prioritising NEW resources is desirable. Implementing #66 then becomes a minor configuration-based conditional that adjusts the query, which is the preferred longer-term solution.

For now, we rarely care deeply about freshness: the vast majority of the data we're processing changes very rarely.
