When checking whether a page is allowed, if an exception is thrown, a traceback is printed to the screen but the exception is not rethrown. The allowed() function simply returns False, making it hard to differentiate between disallowed sites and offline sites.
from reppy.cache import RobotsCache

rob = RobotsCache(capacity=100)

# localhost:1234 is not bound, so it's meant to error
# ConnectionException is thrown; it is printed but not rethrown
out = rob.allowed("http://localhost:1234", "bot")
print(out)  # prints False
This is similar to #110 as there is only a True or False output, but here an exception is not being thrown.
I would put in a pull request, but I am having issues understanding where/what to do to test the code I need. My suggestion is to have an optional rethrow=False param here:
Great question! And it's one I had to think back a bit to remember how we manage this. I'll take you on a quick tour, but there's an easy change to accomplish what it sounds like you want.
So, the Robots class, which can do all the fetching, parsing, etc., lets users pass through to fetch any arguments that they'd pass to requests, so a lot of behavior can be controlled there. The Cache classes have even more bells and whistles, primarily driven by policies in reppy.cache.policy. At the moment, there are two implemented: 1) one that returns a default object, and 2) one that reraises the exception. For RobotsCache, the default cache policy is the first kind: if an exception is raised, it returns a Robots object that returns False for every URL (https://github.com/seomoz/reppy/blob/master/reppy/cache/__init__.py#L79). That might be useful in situations where we just want to answer the question "are we for sure allowed to fetch this URL", and anything other than successfully sussing that out would mean 'no.'
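To make the difference between the two kinds of policy concrete, here is a toy model in plain Python. It does not use reppy at all: the names ReturnDefaultPolicy, ReraisePolicy, lookup, and unreachable are invented for illustration and should not be read as reppy's actual classes or signatures.

```python
# Toy model only: every name here is hypothetical and is not reppy's API.


class ReturnDefaultPolicy:
    """On a fetch error, hand back a fixed fallback answer."""

    def __init__(self, default):
        self.default = default

    def on_error(self, url, exc):
        return self.default


class ReraisePolicy:
    """On a fetch error, propagate the original exception to the caller."""

    def on_error(self, url, exc):
        raise exc


def lookup(url, fetch, policy):
    """Call fetch(url); on failure, let the policy decide the outcome."""
    try:
        return fetch(url)
    except Exception as exc:
        return policy.on_error(url, exc)


def unreachable(url):
    # Stand-in for a fetch against a host that is not listening.
    raise ConnectionError("cannot reach " + url)


# The default-object style hides the failure behind a plain False.
print(lookup("http://localhost:1234", unreachable, ReturnDefaultPolicy(False)))

# The reraising style surfaces the failure so the caller can react to it.
try:
    lookup("http://localhost:1234", unreachable, ReraisePolicy())
except ConnectionError as exc:
    print("fetch failed:", exc)
```

With the first policy the caller cannot tell "disallowed" from "could not fetch", which is the confusion described in the report. With the second, the caller sees the error.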
But if you have code that would like to deal with the nuance of different types of errors, you absolutely can! And you could extend the CachePolicyBase to implement whatever behavior you'd like if the built-in ones don't suit your needs. But if you are just looking to have exceptions raised:
from reppy.cache import RobotsCache
from reppy.cache.policy import ReraiseExceptionPolicy

rob = RobotsCache(capacity=100, cache_policy=ReraiseExceptionPolicy(60))

# localhost:1234 is not bound, so it's meant to error
# ConnectionException is actually thrown now
out = rob.allowed("http://localhost:1234", "bot")
print(out)  # name is not defined, because the call above raised before assigning out
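Once exceptions are raised, calling code can separate "disallowed" from "could not check". The following is a minimal sketch of one way to do that. Verdict and classify are hypothetical helper names, not part of reppy, and the sketch assumes that any exception from the check means the robots.txt could not be fetched or read. It takes the check as a plain callable so that it runs without reppy installed.

```python
from enum import Enum


class Verdict(Enum):
    ALLOWED = "allowed"
    DISALLOWED = "disallowed"
    UNREACHABLE = "unreachable"


def classify(allowed, url, agent):
    """Map a raising allowed(url, agent) callable onto three outcomes."""
    try:
        ok = allowed(url, agent)
    except Exception:
        # Assumption: any exception means the robots.txt could not be
        # fetched or read, so no allow/disallow answer is available.
        return Verdict.UNREACHABLE
    return Verdict.ALLOWED if ok else Verdict.DISALLOWED


def offline(url, agent):
    # Stand-in for a check against a host that is not listening.
    raise ConnectionError("cannot reach " + url)


print(classify(lambda url, agent: True, "http://example.com/", "bot"))
print(classify(lambda url, agent: False, "http://example.com/", "bot"))
print(classify(offline, "http://localhost:1234", "bot"))
```

With the reraising cache above, the call would presumably look like classify(rob.allowed, "http://localhost:1234", "bot"). Narrowing the except clause to the specific exception types you care about would be a sensible refinement.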
The rethrow=False parameter suggested in the original report would go in reppy/reppy/cache/__init__.py, line 81 in 2554d8d. This allows for backwards compatibility, with the ability to rethrow instead of returning if desired.