Poller should fill all available slots #173

Closed
Tarliton opened this issue Aug 23, 2016 · 5 comments

Comments

@Tarliton

Today we can set "poll_interval", "max_proc" and "max_proc_per_cpu". "max_proc" and "max_proc_per_cpu" limit the maximum number of jobs that can run at once.

If "poll_interval" is high, that maximum is never reached, because only one job is started after each poll interval.

E.g.: with poll_interval = 30 and a spider that takes about 2 minutes to finish, at most 4 to 5 jobs run at once, no matter what the "max_proc" and "max_proc_per_cpu" values are.
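
The arithmetic behind that estimate can be sketched roughly as below. This is only an illustration: `max_concurrent_jobs` is a hypothetical helper, not part of scrapyd, and it assumes exactly one job starts per poll and every job runs for the same duration.

```python
import math


def max_concurrent_jobs(poll_interval, job_duration, max_proc):
    """Estimate steady-state concurrency when one job starts per poll.

    Hypothetical helper for illustration only. Assumes the queue never
    runs dry and every job takes `job_duration` seconds.
    """
    # A job started at time t holds a slot during [t, t + job_duration),
    # so the number of jobs alive at once is the number of polls that
    # fit into one job's lifetime.
    started_while_running = math.ceil(job_duration / poll_interval)
    # The configured process limit still applies as an upper bound.
    return min(started_while_running, max_proc)


print(max_concurrent_jobs(poll_interval=30, job_duration=120, max_proc=16))
print(max_concurrent_jobs(poll_interval=30, job_duration=130, max_proc=16))
```

With a 120 second job the estimate is 4, and a job running slightly longer than 2 minutes pushes it to 5, which matches the 4 to 5 range above.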

Maybe there should be a "jobs_to_poll" option. With that we could choose how many jobs to poll each time, instead of only one.

What do you think about that?

Thank you

@Digenis
Member

Digenis commented Aug 24, 2016

I'd expect developers in this case to write spiders
that don't close but stay open to crawl on demand
but I do think that your concern is valid.
However I'm not the appropriate person to look into it.
Someone more familiar with twisted can have a look.

@Digenis
Member

Digenis commented Nov 2, 2016

Actually it makes sense for the poller
to poll for exactly as many spiders as the amount of free slots.
This should be easier if the project queues are unified in a single db (and/or table)
as I suggest in #187

@Digenis
Member

Digenis commented Dec 7, 2016

By the way, you can set the poller to poll on a sub-second interval.
I added it in 10dfee0

@jpmckinney
Contributor

jpmckinney commented May 13, 2022

Closing as duplicate of #187 and/or #197

Edit: Also, in this scenario, why not just poll more frequently? The problem seems to be the unusually high poll interval.

@jpmckinney changed the title from "Jobs to poll" to "Poller should fill all available slots" on Jul 23, 2024
@jpmckinney
Contributor

jpmckinney commented Jul 23, 2024

Re-opening as this is actually separable from the other issues.

Poller.poll knows when all slots are full by checking dq.waiting, and it knows when it has no pending jobs to return by checking queue.count. So, we should be able to just add these checks into loops.
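
A rough, synchronous sketch of that looping idea follows. It is not scrapyd's code: `StubJobQueue` and `fill_slots` are hypothetical stand-ins, the real poller runs on Twisted, and the sketch assumes the number of free slots is available as a plain integer (for example, derived from dq.waiting).

```python
from collections import deque


class StubJobQueue:
    """Stand-in for a project's job queue (hypothetical, not scrapyd's class)."""

    def __init__(self, jobs):
        self._jobs = deque(jobs)

    def count(self):
        return len(self._jobs)

    def pop(self):
        # Return None when empty, e.g. if another consumer drained the queue.
        return self._jobs.popleft() if self._jobs else None


def fill_slots(free_slots, queues):
    """Pop jobs round-robin across projects until slots or jobs run out."""
    started = []
    while len(started) < free_slots:
        progressed = False
        for project, queue in queues.items():
            if len(started) >= free_slots:
                break  # every free slot is now taken
            if queue.count():
                job = queue.pop()
                if job is not None:
                    started.append((project, job))
                    progressed = True
        if not progressed:
            break  # no queue had a pending job on this pass
    return started


queues = {"a": StubJobQueue(["a1", "a2", "a3"]), "b": StubJobQueue(["b1"])}
print(fill_slots(3, queues))
```

Going round-robin over the projects, rather than draining one queue first, is a design choice in this sketch so that a single busy project does not take every free slot.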
