This is a web scraper built as part of our application for Zoohackathon 2019 in Helsinki. You can find the full application and further information at https://devpost.com/software/li-l-sebastian.
- The JSON file input/plants.json contains the names and IDs of the CITES-listed plants we would like to find further information on
- The Python file crawler/final_crawler.py is my final version of the crawler, which I use for scraping
- Run it from inside the crawler folder by typing the following into the terminal (requirements are listed below):
python final_crawler.py
To help reduce the overall demand for trafficked plants, this tool identifies sellers and offers of CITES-listed plant species on popular e-commerce sites, reducing the amount of manual labour required of law enforcement officers and CITES staff.
We used a scraper to identify the relevant postings on Alibaba and wrote the results to a JSON file. That data was then used to populate the Progressive Web App (PWA) built with ReactJS.
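The flow from plant list to JSON output can be sketched roughly as follows. This is an illustrative sketch only, not the code in crawler/final_crawler.py: the layout of the plant list (a list of objects with `id` and `name` keys), the search URL, the record fields, and the helper names `build_search_url` and `make_record` are all assumptions made for the example. No network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical search endpoint; the real crawler targets Alibaba, whose URL
# format is not documented in this README.
SEARCH_BASE = "https://example.com/search"


def build_search_url(plant_name):
    """Build a search URL for one plant name (assumed query parameter: q)."""
    return SEARCH_BASE + "?" + urlencode({"q": plant_name})


def make_record(plant, url, titles):
    """Combine one plant entry with the listing titles found for it."""
    return {
        "id": plant["id"],
        "name": plant["name"],
        "search_url": url,
        "listings": titles,
    }


# Assumed shape of the plant list: a list of {"id": ..., "name": ...} objects.
# The entry below is a placeholder, not taken from input/plants.json.
plants = json.loads('[{"id": 1, "name": "Example plant"}]')

records = []
for plant in plants:
    url = build_search_url(plant["name"])
    # A real crawler would download `url` and parse the HTML (e.g. with lxml);
    # here the listing titles are left empty so the sketch runs offline.
    records.append(make_record(plant, url, titles=[]))

# Serialise the results to JSON, the format consumed by the web app.
output = json.dumps(records, indent=2)
print(output)
```

Keeping one record per plant, with the listings nested inside it, is just one possible layout; a flat list of listings that each carry the plant id would work equally well for a front end.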
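Requires Python 3. The required Python packages are listed below (urllib, json and pickle ship with the Python standard library; lxml and pandas must be installed separately):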
- urllib
- lxml
- pandas
- json
- pickle
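A quick way to check that these packages are available before running the crawler is sketched below. The helper name `missing_packages` is hypothetical and not part of this repository; it only looks each package up and does not import or install anything.

```python
import importlib.util

# Package names taken from the requirements list above.
REQUIRED = ["urllib", "lxml", "pandas", "json", "pickle"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Anything reported as missing can usually be installed with pip, for example `pip install lxml pandas`.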