GitHub - davidycliao/bisCrawler: An Automation Webcrawler for Extracting Central Bankers' Speeches

bisCrawler: An Automation Webcrawler for Extracting Central Bankers' Speeches 🛠️🧰

An automated web-crawling framework for extracting central bankers' speeches from the website of the Bank for International Settlements (https://www.bis.org), built on Selenium and the Chrome browser.

Environment Setup

Install Anaconda Navigator and Python>=3.9 beforehand. Then open a terminal and download the bisCrawler repository with git clone. For an introduction to git and GitHub, see this Tutorial for Beginners.

git clone git@github.com:davidycliao/bisCrawler.git

Copy the commands below and paste them into the terminal:

# Change into the `bisCrawler` directory once the repository has been downloaded.
cd bisCrawler

# Create a conda environment named `bisCrawler`.
conda create -n bisCrawler python=3.9

Instruction

Activate the environment created above. Alternatively, the bisCrawler environment can be opened via Anaconda Navigator.

conda activate bisCrawler

Install the dependencies from requirements.txt using pip.

pip install -r requirements.txt
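Once the install finishes, a quick check can confirm that the main dependencies are importable from the active environment. The sketch below assumes that `selenium` and `pandas` are among the packages listed in requirements.txt (this README mentions both, but the file contents are not shown here). `check_dependencies` is a hypothetical helper and not part of bisCrawler.

```python
from importlib.util import find_spec


def check_dependencies(packages=("selenium", "pandas")):
    """Return a dict mapping each package name to True if it can be imported."""
    # find_spec returns None when a top-level package is not installed.
    # The default package names are assumptions, not read from requirements.txt.
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, available in check_dependencies().items():
        print(f"{name}: {'ok' if available else 'missing'}")
```

If a package is reported as missing, the most likely cause is that pip was run outside the activated `bisCrawler` environment.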

Call bisCrawler Module

In the terminal:

# Note: run this in the terminal where you activated the environment.
python bisCrawler.py

In Jupyter Notebook:

from bisCrawler import scraper

scraper()

When Running bisCrawler

When bisCrawler runs, it asks which page you would like to scrape; type a single page number from 1 to the last page. bisCrawler then builds a pandas DataFrame that stores the bankers' speeches and the URLs of the corresponding text documents.
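To illustrate what a valid answer to the page prompt looks like, here is a minimal sketch of input validation. It is not taken from bisCrawler.py: `parse_page_number` and its `last_page` parameter are hypothetical names, and how the real crawler determines the last page is not described here.

```python
def parse_page_number(raw, last_page):
    """Convert typed input to a page number between 1 and last_page.

    Raises ValueError if the input is not a whole number in that range.
    Hypothetical helper for illustration; not part of bisCrawler.
    """
    text = raw.strip()
    # isdecimal() rejects empty strings, signs, and decimal points.
    if not text.isdecimal():
        raise ValueError(f"expected a whole number, got {raw!r}")
    page = int(text)
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}, got {page}")
    return page


# Example: page = parse_page_number(input("Page to scrape: "), last_page=10)
```

Rejecting out-of-range input before the browser starts avoids launching Chrome only to request a page that does not exist.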

What bisCrawler Scrapes

The crawler automatically scrapes central bankers' speeches from the official BIS website, recording for each speech the central banker's name, the date, the title, and the URL of the corresponding text document.
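The four pieces of information listed above can be pictured as one record per speech. The sketch below models such a record and serializes it to CSV text using only the standard library. The field names (`speaker`, `date`, `title`, `url`) are assumptions chosen for illustration, the sample values are placeholders, and the actual crawler uses pandas with column names that may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Speech:
    # Field names are assumptions; the real output columns may differ.
    speaker: str
    date: str
    title: str
    url: str


def speeches_to_csv(speeches):
    """Serialize Speech records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Speech)],
        lineterminator="\n",
    )
    writer.writeheader()
    for speech in speeches:
        writer.writerow(asdict(speech))
    return buffer.getvalue()


# Placeholder record, not real scraped data.
example = Speech("Jane Doe", "2021-01-01", "Example title", "https://example.org/speech.pdf")
print(speeches_to_csv([example]))
```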

Webscraped Data

The scraped dataframe will be stored as central_bank_speeches.csv in the bisCrawler folder.
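After a run, the CSV file can be inspected with a few lines of Python. The sketch below counts rows grouped by one column using the standard library; `count_rows_by_column` is a hypothetical helper, and the column name is passed in by the caller because the actual header of central_bank_speeches.csv is not documented here.

```python
import csv
from collections import Counter


def count_rows_by_column(path, column):
    """Count rows in a CSV file grouped by the value of one column."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early with a readable message if the column is absent.
        if column not in (reader.fieldnames or []):
            raise KeyError(
                f"column {column!r} not found; available: {reader.fieldnames}"
            )
        return Counter(row[column] for row in reader)


# Example (the column name is an assumption; check the file header first):
# counts = count_rows_by_column("central_bank_speeches.csv", "name")
```

Since the project already depends on pandas, `pandas.read_csv("central_bank_speeches.csv")` is an equally reasonable way to load the file.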

Cite

Please cite this page if you use this toolkit for your research.

For example, with BibTeX:

@misc{bisCrawler,
    howpublished = {\url{https://github.com/davidycliao/bisCrawler}},
    title = {bisCrawler: An Automation Webcrawler for Extracting Central Bankers' Speeches},
    author = {David Yen-Chieh Liao and Li Tang},
    publisher = {GitHub},
    year = {2021}
}
