F1000Scraper

F1000Research is an open-access publishing platform. It provides an API for retrieving the XML or PDF of articles published on F1000Research. F1000Scraper is a Python wrapper for scraping these articles as XML and parsing the XML.

Usage

Collecting data using start and end date of the articles

Currently, the only functionality this wrapper provides is collecting data using the date option in the API. After downloading the files, run scrape.py from the api directory as follows:

python3 scrape.py <date_from> <date_to> <output_directory_path> <output_format> <keyword in the title (optional)>

where

date_from can be any date of the form "dd-mm-yyyy" or just "*" and defines the start date.
date_to can be any date of the form "dd-mm-yyyy" or just "*" and defines the end date.
output_directory_path defines the path where the data files should be saved.
output_format must be either xml or pdf.
keyword is an optional argument. When given, only articles within the date range whose title contains the keyword are downloaded.
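The argument rules above can be checked before any request is made. The following is a minimal sketch, not the code in scrape.py: the helper names `parse_date_arg` and `validate_args` are hypothetical, and treating "*" as an open-ended bound and rejecting a start date later than the end date are assumptions.

```python
from datetime import date, datetime
from typing import Optional, Tuple

ALLOWED_FORMATS = {"xml", "pdf"}


def parse_date_arg(value: str) -> Optional[date]:
    """Return a date for a 'dd-mm-yyyy' string, or None for the '*' wildcard."""
    if value == "*":
        return None  # assumption: '*' means no bound on this side
    # strptime raises ValueError if the value cannot be parsed as day-month-year
    return datetime.strptime(value, "%d-%m-%Y").date()


def validate_args(
    date_from: str, date_to: str, output_format: str
) -> Tuple[Optional[date], Optional[date], str]:
    """Validate the three required arguments (hypothetical helper)."""
    start = parse_date_arg(date_from)
    end = parse_date_arg(date_to)
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be xml or pdf, got {output_format!r}")
    # assumption: a start date later than the end date is a user error
    if start is not None and end is not None and start > end:
        raise ValueError("date_from must not be later than date_to")
    return start, end, output_format
```

Calling `validate_args("01-01-2019", "*", "pdf")` returns the parsed start date, `None` for the open end, and the format string. A malformed date or an unsupported format raises `ValueError`.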

Example 1

python3 scrape.py 01-01-2019 01-01-2020 data/ xml

The above command will download articles in XML format from 1st January 2019 to 1st January 2020 and save them to the data folder in the current directory.
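Once XML files have been saved, their titles can be read with the standard library. The sketch below assumes the downloaded XML follows a JATS-style layout in which the title sits in an `article-title` element. That layout, the inline `SAMPLE` document, the helper names, and the case-insensitive keyword match are all assumptions for illustration, not behaviour confirmed by this project.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, heavily trimmed article used only for illustration.
SAMPLE = """<article><front><article-meta><title-group>
<article-title>A <italic>sample</italic> title</article-title>
</title-group></article-meta></front></article>"""


def extract_title(xml_text: str) -> Optional[str]:
    """Return the article title as plain text, or None if no title element exists."""
    root = ET.fromstring(xml_text)
    node = root.find(".//article-title")
    if node is None:
        return None
    # itertext() also collects text inside nested markup such as <italic>;
    # split/join collapses runs of whitespace into single spaces
    return " ".join("".join(node.itertext()).split())


def title_contains(xml_text: str, keyword: str) -> bool:
    """Case-insensitive check (an assumption) for a keyword in the title."""
    title = extract_title(xml_text)
    return title is not None and keyword.lower() in title.lower()
```

For a file on disk, the same functions can be applied to the text returned by `open(path, encoding="utf-8").read()`.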

Example 2

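python3 scrape.py 01-01-2019 "*" data/ pdf

The above command will download articles in PDF format from 1st January 2019 to today's date and save them to the data folder in the current directory. The * is quoted so that the shell passes it to the script literally instead of expanding it to file names.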
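When the script is driven from another Python program, passing the arguments as a list avoids shell expansion of "*" altogether. This is a minimal sketch: `build_command` and `run_scrape` are hypothetical helpers, and the code assumes the working directory is the api directory that contains scrape.py.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(
    date_from: str,
    date_to: str,
    output_dir: str,
    output_format: str,
    keyword: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list in the order scrape.py expects."""
    cmd = [sys.executable, "scrape.py", date_from, date_to, output_dir, output_format]
    if keyword is not None:
        cmd.append(keyword)  # optional title keyword goes last
    return cmd


def run_scrape(*args: str, **kwargs: Optional[str]) -> None:
    """Run scrape.py without a shell, so '*' reaches the script unchanged."""
    # check=True raises CalledProcessError if the script exits with an error
    subprocess.run(build_command(*args, **kwargs), check=True)
```

For instance, `run_scrape("01-01-2019", "*", "data/", "pdf")` corresponds to Example 2.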

Disclaimer

This is a work in progress.

Contributors

Shahan Ali Memon
Bedoor AlShebli
