
check if a spider exists before schedule it (with sqlite cache) #17

Merged 5 commits on Jul 10, 2014

Conversation

xaqq
Contributor

@xaqq xaqq commented May 6, 2013

I pulled #8 into my local repo and added SQLite caching, as pablohoffman suggested. I ran some performance tests using ApacheBench, and it's OK.
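As a rough illustration of the approach described above, here is a minimal sketch of a SQLite-backed key/value cache for spider lists. This is a hypothetical reconstruction, not Scrapyd's actual `UtilsCache`/`JsonSqliteDict` implementation; the class and table names are assumptions.

```python
import json
import sqlite3


class SqliteSpiderCache:
    """Hypothetical sketch: cache JSON-serializable values (e.g. a
    project's spider list) in a SQLite table keyed by a string."""

    def __init__(self, database=":memory:", table="spider_cache"):
        self.conn = sqlite3.connect(database)
        self.table = table
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (key TEXT PRIMARY KEY, value BLOB)"
        )

    def __setitem__(self, key, value):
        # Serialize the value to JSON and upsert it.
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            f"SELECT value FROM {self.table} WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __delitem__(self, key):
        # Invalidate a cached entry, e.g. after a deploy changes the spiders.
        self.conn.execute(f"DELETE FROM {self.table} WHERE key = ?", (key,))
        self.conn.commit()
```

With an in-memory `:memory:` database the cache lives only as long as the process, which (as noted later in this thread) makes SQLite optional; a file path would make it persistent.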

@@ -7,6 +8,28 @@
from scrapy.utils.python import stringify_dict, unicode_to_str
from scrapyd.config import Config

class UtilsCache:
Contributor

Can you add a class level doc to explain what it does?

@jayzeng
Contributor

jayzeng commented Jul 4, 2014

looks good to me, @pablohoffman thoughts?

@jayzeng
Contributor

jayzeng commented Jul 10, 2014

I will go ahead and merge this pull request.

jayzeng added a commit that referenced this pull request Jul 10, 2014
check if a spider exists before schedule it (with sqlite cache)
@jayzeng jayzeng merged commit b9a38f6 into scrapy:master Jul 10, 2014
@pablohoffman
Member

OK, but we should probably move this cache so it stores its data in whatever database Scrapyd ends up using for persisting data.

jpmckinney added a commit that referenced this pull request Jul 22, 2024
UtilsCache.__init__ calls JsonSqliteDict(table="utils_cache_manager"), which uses ":memory:" as the database.

A comment in #17 suggests persisting this cache. However, there is no contract that egg storage must only be modified
by Scrapyd. (For example, users can happily store eggs in the egg directory before deploying Scrapyd.)

Without persistence, there is really no reason to use SQLite. We can therefore use a simpler approach.

This changes the get_spider_list function into a SpiderList class:

- Require the runner argument
- Remove the pythonpath argument (unused)
- Remove the config argument (see next commit)
- Use get(), set() and delete() methods, instead of having to invalidate the cache with calls to UtilsCache
- Evict only the specified version and the default version on delversion.json, instead of all versions
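The simpler, non-SQLite approach outlined above could look something like the following sketch: a plain in-memory dict keyed by (project, version), with explicit `get()`/`set()`/`delete()` methods. The class name matches the commit message, but the method signatures and eviction details here are assumptions, not Scrapyd's actual code.

```python
class SpiderList:
    """Hypothetical sketch of an in-memory spider-list cache, keyed by
    (project, version). version=None denotes the default (latest) version."""

    def __init__(self):
        self.cache = {}

    def set(self, project, spiders, version=None):
        self.cache[(project, version)] = spiders

    def get(self, project, version=None):
        # Returns None on a cache miss.
        return self.cache.get((project, version))

    def delete(self, project, version=None):
        # On delversion.json, evict only the specified version and the
        # default entry (whose contents may have changed), not all versions.
        self.cache.pop((project, version), None)
        self.cache.pop((project, None), None)
```

Since the cache was never persisted anyway (it used a `:memory:` database), a dict gives the same semantics without the SQLite dependency.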
4 participants