An EventMachine+Ruby library that fetches URLs while obeying robots.txt rules.
RDaneel is built on top of @igrigorik’s em-http-request.
Supports following redirects, honoring robots.txt for each host in the redirect chain.
Supports an external cache to store robots.txt files.
Compatible with all options defined in em-http-request.
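To make the first two features concrete, here is a minimal, self-contained sketch of the idea: every URL in a redirect chain is checked against the robots.txt of its own host, and fetched robots.txt bodies are kept in a cache. This is not RDaneel's implementation. The helper names `disallowed_prefixes` and `chain_allowed?`, the `fetch_robots` callable, and the `a.example`/`b.example` hosts are all hypothetical, and the parser is simplified to `Disallow` rules in `User-agent: *` groups.

```ruby
require 'uri'

# Hypothetical helper (not part of RDaneel): returns the Disallow path
# prefixes that apply to "User-agent: *" in a robots.txt body.
# Simplification: Allow rules and wildcards are ignored.
def disallowed_prefixes(robots_txt)
  prefixes = []
  applies = false
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip          # drop comments and whitespace
    next if line.empty?
    field, value = line.split(':', 2).map(&:strip)
    case field.downcase
    when 'user-agent'
      applies = (value == '*')
    when 'disallow'
      prefixes << value if applies && !value.to_s.empty?
    end
  end
  prefixes
end

# Hypothetical helper: true only if every URL in the redirect chain is
# allowed by the robots.txt of its own host. `cache` is any object that
# responds to [] and []= (a Hash here); `fetch_robots` is a callable that
# returns the robots.txt body for a host.
def chain_allowed?(urls, cache, fetch_robots)
  urls.all? do |url|
    uri = URI.parse(url)
    key = "#{uri.host}:#{uri.port}"
    cache[key] ||= fetch_robots.call(uri.host) # fetch once per host
    path = uri.path.empty? ? '/' : uri.path
    disallowed_prefixes(cache[key]).none? { |prefix| path.start_with?(prefix) }
  end
end

# Example with canned robots.txt bodies instead of network access.
robots = {
  'a.example' => "User-agent: *\nDisallow: /private\n",
  'b.example' => "User-agent: *\nDisallow:\n"
}
fetcher = ->(host) { robots.fetch(host, '') }
cache = {}

puts chain_allowed?(['http://a.example/start', 'http://b.example/landing'], cache, fetcher)
puts chain_allowed?(['http://b.example/start', 'http://a.example/private/page'], cache, fetcher)
```

Because the cache only needs `[]` and `[]=`, a plain Hash works for a single process, and a client for an external store with the same interface could be swapped in.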
$ gem install rdaneel
  require 'rdaneel'

  EM.run do
    r = RDaneel.new("")  # URL to fetch
    r.callback {
      puts r.http_client.response_header.status
      puts r.http_client.response[0,80]
      puts r.redirects
      puts r.uri
      EM.stop
    }
    r.errback {
      puts "should not happen"
      EM.stop
    }
    r.get(:redirects => 3)
  end

  => 200
  => <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  =>
  =>
  require 'rdaneel'

  EM.run do
    r = RDaneel.new("")  # URL disallowed by the host's robots.txt
    r.callback {
      puts "should not happen"
      EM.stop
    }
    r.errback {
      puts r.error
      EM.stop
    }
    r.get(:redirects => 3)
  end

  => robots denied
R. Daneel Olivaw is a fictional robot created by Isaac Asimov.
Thanks to Ilya Grigorik (@igrigorik) for the em-http-request library and for his support and advice.
Fork the project.
Make your feature addition or bug fix.
Add tests for it. This is important so I don’t break it in a future version unintentionally.
Commit your changes, but do not touch the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
Send me a pull request. Bonus points for topic branches.
Copyright © 2010 has_many :developers. See LICENSE for details.