To update the data, follow the steps below.
Make sure you have a working environment with R and Python 3 installed. We recommend R >= 4.0.2 and Python >= 3.7.
You can check:
$ python --version
and
$ R --version
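The two checks above can also be scripted. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `meets_minimum`, `installed_version`) are hypothetical, and it assumes both tools print a dotted version number such as 4.0.2 in their `--version` output.

```python
import re
import shutil
import subprocess

# Recommended minimum versions, taken from the text above.
MINIMUMS = {"python": (3, 7), "R": (4, 0, 2)}


def parse_version(text):
    """Extract the first dotted version number (e.g. '4.0.2') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """Return True if the version found in version_text is at least `minimum`."""
    return parse_version(version_text) >= minimum


def installed_version(command):
    """Run `<command> --version` and parse its output; None if not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print the version to stderr
        universal_newlines=True,
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    for command, minimum in MINIMUMS.items():
        found = installed_version(command)
        if found is None:
            print("%s: not found on PATH" % command)
        else:
            status = "ok" if found >= minimum else "older than recommended"
            print("%s: %s (%s)" % (command, ".".join(map(str, found)), status))
```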
In your environment (shell), run:
$ pip install -r requirements.txt
In your R console, run:
install.packages(c("data.table", "googledrive", "googlesheets4", "httr", "imputeTS", "lubridate", "pdftools", "retry",
"rjson", "rvest", "stringr", "tidyr", "rio", "plyr", "bit64"))
Note: pdftools requires poppler. On macOS, run brew install poppler.
Create a file testing_dataset_config.json with all required parameters:
{
"google_credentials_email": "[OWID_MAIL]",
"covid_time_series_gsheet": "[COVID_TS_GSHEET]",
"attempted_countries_ghseet": "[COUNTRIES_GSHEET]",
"audit_gsheet": "[AUDIT_GSHEET]",
"owid_cloud_table_post": "[OWID_TABLE_POST]"
}
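Before running the scripts, it may help to confirm that the config file contains every parameter listed above. The following is a minimal sketch, not part of the repository: `missing_config_keys` and `load_config` are hypothetical helpers, and the key names are copied verbatim from the example above, including their spelling.

```python
import json

# Keys copied verbatim from the example config above.
REQUIRED_KEYS = (
    "google_credentials_email",
    "covid_time_series_gsheet",
    "attempted_countries_ghseet",
    "audit_gsheet",
    "owid_cloud_table_post",
)


def missing_config_keys(config):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path="testing_dataset_config.json"):
    """Load the JSON config and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_config_keys(config)
    if missing:
        raise KeyError("missing config keys: " + ", ".join(missing))
    return config
```

Calling `load_config()` from the directory that holds the file either returns the parsed dictionary or raises an error naming the missing keys.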
Run the Python and R scripts with the chosen option:
$ python3 run_python_scripts.py [option]
$ Rscript run_r_scripts.R [option]
Note: Accepted values for option are "quick" and "update". The VM runs this every day with the "quick" option. Manual execution with the "update" option is required roughly twice a week (e.g. Tuesday and Friday).
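If you want a reminder of which option applies on a given day, the schedule in the note can be written down as a small helper. This is only a sketch: `mode_for` is a hypothetical function, and Tuesday and Friday are the example days mentioned in the note, not a fixed rule.

```python
import datetime

# Monday is 0 in datetime.weekday(), so 1 = Tuesday and 4 = Friday.
# These days are the example schedule from the note, assumed here.
UPDATE_WEEKDAYS = {1, 4}


def mode_for(day):
    """Return "update" on the assumed manual-update days, otherwise "quick"."""
    return "update" if day.weekday() in UPDATE_WEEKDAYS else "quick"


if __name__ == "__main__":
    today = datetime.date.today()
    print("Suggested option for %s: %s" % (today.isoformat(), mode_for(today)))
```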
Run generate_dataset.R. Using RStudio is recommended.
Run
$ python ../megafile.py
Note: You may want to use test_update.sh.template as a reference.