A question about the choice of thread count for Weibo crawling #22

Open

huntzhan opened this issue Jul 31, 2014 · 0 comments

huntzhan commented Jul 31, 2014

Hello, and thank you for providing this framework. It helps a lot.

I noticed that you set instances to 2 for the Weibo crawler. Because of the following code, only 2 threads in total are crawling Weibo:

# cola/worker/loader.py
if master is None:
    with StandaloneWorkerJobLoader(job, root, force=force) as job_loader:
        job_loader.run()

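When I built a similar crawler, I triggered Sina's anti-crawler mechanism, and after that every login required a CAPTCHA. I suspect the cause was too many concurrent crawling threads (16). So I would like to ask how you arrived at this thread count.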
