Get international logistics tracking information
rabbitmq, casperjs, multiprocessing, requests, python
1. Distributed crawler
2. casperjs can fetch web pages that are loaded with JavaScript
3. multiprocessing can make use of a multi-core CPU
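As a rough illustration of point 3, here is a minimal Python 3 sketch of spreading tracking lookups over several worker processes. The names track_one and track_all are hypothetical and are not taken from do_track.py; the lookup itself is only a placeholder. The "fork" start method is named explicitly as an assumption, since the project is tested on Ubuntu.

```python
import multiprocessing


def track_one(number):
    """Placeholder for one tracking lookup (hypothetical, not from do_track.py)."""
    # A real worker would fetch and parse the carrier's page here.
    return {"number": number, "status": "unknown"}


def track_all(numbers, workers=None):
    """Look up many logistics numbers in parallel worker processes."""
    # Assumption: a Linux host, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers or multiprocessing.cpu_count()) as pool:
        # map() splits the numbers across the workers and keeps input order.
        return pool.map(track_one, numbers)


if __name__ == "__main__":
    for row in track_all(["RA000000001CN", "RA000000002CN"]):
        print(row)
```

Each number is handled independently, so the work divides cleanly across cores; the pool size defaults to the number of CPUs.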
The project was first used with OpenERP (database: PostgreSQL) through XML-RPC
For general use, change it to work with Excel
# tested on Ubuntu 14.04
# casperjs 1.1.0, phantomjs 1.9.0
sudo apt-get install phantomjs
$ git clone git://github.com/n1k0/casperjs.git
$ cd casperjs
$ ln -sf `pwd`/bin/casperjs /usr/local/bin/casperjs
sudo pip install xlrd xlwt requests
sudo apt-get install rabbitmq-server
1. python receive_data.py : rabbitmq consumer, receives and processes the data, can run on multiple computers
2. python send_data.py : rabbitmq producer, reads the logistics numbers and puts them into the list
3. python save_to_excel.py : reads success_data_***.txt and saves to track_result.xls
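The producer and consumer steps above could look roughly like the sketch below. It assumes the third-party pika client (pip install pika) and a RabbitMQ server on localhost. The queue name, the JSON message layout, and all function names are hypothetical; the real send_data.py and receive_data.py may work differently.

```python
import json

QUEUE = "track_numbers"  # hypothetical queue name


def encode_task(number):
    """Serialize one logistics number as a JSON message body."""
    return json.dumps({"number": number}).encode("utf-8")


def decode_task(body):
    """Turn a message body back into a dict."""
    return json.loads(body.decode("utf-8"))


def send_numbers(numbers, host="localhost"):
    """Producer: publish one message per logistics number."""
    import pika  # imported here so the helpers above work without pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    for number in numbers:
        channel.basic_publish(
            exchange="",
            routing_key=QUEUE,
            body=encode_task(number),
            properties=pika.BasicProperties(delivery_mode=2),  # persistent
        )
    connection.close()


def receive_numbers(handle, host="localhost"):
    """Consumer: call handle(task) for each message, then acknowledge it."""
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_qos(prefetch_count=1)  # one unacknowledged task per consumer

    def on_message(ch, method, properties, body):
        handle(decode_task(body))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
    channel.start_consuming()
```

Several computers can run receive_numbers against the same queue; with prefetch_count=1 the broker hands each consumer one task at a time, which is one way to get the distributed behaviour described in step 1.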
receive_data.py : rabbitmq consumer, receives and processes the data, can run on multiple computers
send_data.py : rabbitmq producer, reads the logistics numbers and puts them into the list
save_to_excel.py : reads success_data_***.txt and saves to track_result.xls
my_rabbitmq.py : main rabbitmq functions, sends and receives data
do_track.py : uses requests and casperjs to get the web page and parse it, uses multiprocessing
track_data.py : basic variables, reads and writes excel
my_process_pool.py : records multiprocessing exceptions
read_excel.py : reads and writes excel
track.js : casperjs uses it to get the web page
rpc_api.py : xmlrpc api for working with openerp
ftp_up.py : uploads json files to the php server over ftp
test_excel.py : tests the excel functions
some_code.py : miscellaneous code
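For reference, a minimal Python 3 sketch of how a script such as do_track.py might call casperjs with track.js. Passing the logistics number as a positional argument and reading the result from stdout are assumptions about track.js, and the function names are hypothetical.

```python
import subprocess


def build_command(number, script="track.js", casperjs="casperjs"):
    """Build the argument list for one casperjs run."""
    # Assumption: track.js reads the logistics number from casper.cli.args.
    return [casperjs, script, str(number)]


def fetch_page(number, timeout=120):
    """Run casperjs and return whatever the script prints to stdout."""
    result = subprocess.run(
        build_command(number),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # do not hang forever on a slow page
        check=True,       # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Building the command as a list avoids shell quoting problems; fetch_page needs casperjs on the PATH, as set up by the ln -sf step in the install notes.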
sample_import.xls : sample excel file of logistics numbers
error_data_***.txt : error messages
success_data_***.txt : success messages
track_***.log : track log
Can be used with a website and with databases such as redis, mongodb and so on
Import an excel file of logistics numbers and export an excel file of logistics details