Pull data from DaRUS, the DAta Repository of the University of Stuttgart, to a local folder (`./data`) with Python3.
Use it to organize the data of multiple datasets locally on your computer and to integrate your open data in git repositories.
- Install Python3 and pip (the Python package manager)
- Clone this repository to the place where you need it. If that place is itself a git repository, add this one as a submodule via `git submodule add https://github.com/iswunistuttgart/darus_data_download.git`
- Install the required pyDataverse package: change to the directory of this repository and run `pip install --user -r requirements.txt`, or install it directly with `pip install --user pyDataverse`
- Create the configuration file template (`scripts/config.txt`) by running `python scripts/get_data.py`
- If the dataset(s) you want to use are not (yet) public, get your API token at https://darus.uni-stuttgart.de/dataverseuser.xhtml?selectTab=apiTokenTab and store it in a file named `.darus_apikey`. Warning: never check in your API key via git! Within this repository the file is listed in `.gitignore`.
- Configure the data to download in `scripts/darus_config.json`. The DOI of each dataset has the format `doi:10.18419/darus-????` (find your own data at https://darus.uni-stuttgart.de/dataverseuser.xhtml?selectTab=dataRelatedToMe)
- If you are using this module as a submodule: move the `darus_config.json` file to the directory above this repository and check it in with your parent git project to keep the data configuration reproducible
- Download/update all data by running `python scripts/get_data.py`. The metadata is also downloaded as `info.json` in each folder
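
Once the download has run, the per-dataset metadata can be read back in Python. The following is a minimal sketch, assuming that each dataset ends up in its own subfolder directly below `./data` with an `info.json` inside. The function name `collect_metadata` is hypothetical and not part of this repository, and nothing is assumed about the fields stored in `info.json`.

```python
import json
from pathlib import Path


def collect_metadata(data_dir="./data"):
    """Return {folder name: parsed info.json} for every subfolder that has one.

    Assumption: one subfolder per dataset directly below data_dir.
    """
    metadata = {}
    root = Path(data_dir)
    if not root.is_dir():
        return metadata  # nothing has been downloaded yet
    for info_file in sorted(root.glob("*/info.json")):
        with info_file.open(encoding="utf-8") as handle:
            metadata[info_file.parent.name] = json.load(handle)
    return metadata
```

Calling `collect_metadata()` from the repository root would then give one entry per downloaded dataset folder; folders without an `info.json` are skipped.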
The configuration file can be placed in one of the following locations:
- in `./scripts/` (directory of `get_data.py`)
- in `./` (the parent directory, where `Readme.md` is located)
- in `../` (one directory above this project)
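
A lookup over the three locations above could be sketched as follows. This is an illustration only: it assumes the locations refer to `darus_config.json`, the search order (scripts directory first, then the repository root, then the directory above) is an assumption, and `find_config` is a hypothetical helper rather than a function of `get_data.py`.

```python
from pathlib import Path


def find_config(repo_dir, filename="darus_config.json"):
    """Return the first existing config path, or None if none is found.

    Assumed search order: <repo>/scripts/, <repo>/, <repo>/../
    """
    repo = Path(repo_dir).resolve()
    candidates = [
        repo / "scripts" / filename,  # directory of get_data.py
        repo / filename,              # where Readme.md is located
        repo.parent / filename,       # one directory above this project
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing configuration should trigger creating a template.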
Example configuration for downloading two datasets:
{
  "dataverse_url": "https://darus.uni-stuttgart.de/",
  "datasets": [
    "doi:10.18419/darus-1234",
    "doi:10.18419/darus-1235"
  ]
}
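
Before starting a download it can help to check such a configuration for mistakes. The sketch below is not part of this repository: `parse_config` is a hypothetical helper, and the pattern `darus-` followed by one or more digits is an assumption about what the `????` placeholder stands for. Strict JSON parsers such as Python's `json` module reject a trailing comma after the last list entry, so that mistake is caught as well.

```python
import json
import re

# Assumption: the "????" placeholder in the DOI stands for one or more digits.
DOI_PATTERN = re.compile(r"^doi:10\.18419/darus-\d+$")


def parse_config(text):
    """Parse the configuration text and raise ValueError if it looks wrong."""
    config = json.loads(text)  # raises json.JSONDecodeError (a ValueError)
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")
    url = config.get("dataverse_url")
    if not isinstance(url, str) or not url:
        raise ValueError("'dataverse_url' must be a non-empty string")
    datasets = config.get("datasets")
    if not isinstance(datasets, list) or not datasets:
        raise ValueError("'datasets' must be a non-empty list")
    bad = [d for d in datasets
           if not isinstance(d, str) or not DOI_PATTERN.match(d)]
    if bad:
        raise ValueError(f"unexpected DOI format: {bad}")
    return config
```

A typical call would be `parse_config(Path("scripts/darus_config.json").read_text())`, with `Path` imported from `pathlib`.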
- Handle ENV variables (especially for the API key) to use it in Docker, etc.
- Make it more robust against failure/misconfiguration
- Allow upload of files (maybe use pyDaRUS)
- You are welcome to contribute bugfixes directly as pull requests
- For new features or changed functionality please open an issue first, or feel free to discuss it directly.
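
As a starting point for the environment-variable item in the list above, one possible approach is to prefer an environment variable and fall back to the `.darus_apikey` file. This is only a sketch of an idea that is not implemented in this repository: the variable name `DARUS_API_KEY` and the helper `load_api_key` are hypothetical.

```python
import os
from pathlib import Path

ENV_VAR = "DARUS_API_KEY"  # hypothetical name, not defined by this repository


def load_api_key(key_file=".darus_apikey", env=None):
    """Return the API key from the environment or the key file, else None."""
    env = os.environ if env is None else env
    token = env.get(ENV_VAR, "").strip()
    if token:
        return token  # the environment variable takes precedence
    path = Path(key_file)
    if path.is_file():
        token = path.read_text(encoding="utf-8").strip()
        if token:
            return token
    return None  # public datasets need no key
```

In a container the key could then be passed with `docker run -e DARUS_API_KEY=...` instead of copying the key file into the image.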