
davidhin/singularity-ghtorrent



Download GHTorrent Data

This downloads GHTorrent data - specifically, commit messages and pull request comments. Instructions for running:

  1. Clone the repo.
  2. Build main.simg, or pull it from Singularity Hub.
  3. If running locally, run the following, where <NUMBER HERE> is an integer from 1 to 200:
     singularity run main.simg -p initialise
     singularity run main.simg -p singghtorrent/analysis/main.py -a <NUMBER HERE>
  4. If running on Phoenix, submit the job script instead:
     sbatch hpc/download_job_array.sh
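
For a local run, the per-number command above can be looped over the whole range. The following is a minimal sketch and not part of the repository: it assumes main.simg sits in the current directory, that the initialise step has already been run once, and that the numbers 1 to 200 can be processed one after another independently. The `run_all` function and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: run every job number locally, one after another.
# Assumptions: main.simg is in the current directory and each number is independent.
# run_all and DRY_RUN are illustrative names, not part of the repository.

run_all() (
    first="${1:-1}"
    last="${2:-200}"
    n="$first"
    while [ "$n" -le "$last" ]; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of executing it.
            echo "singularity run main.simg -p singghtorrent/analysis/main.py -a ${n}"
        else
            # Stop at the first failing number so it can be retried.
            singularity run main.simg -p singghtorrent/analysis/main.py -a "${n}" || exit 1
        fi
        n=$((n + 1))
    done
)

# Example: preview the first three commands without running them.
# DRY_RUN=1 run_all 1 3
```

Calling `run_all` with no arguments covers 1 to 200; passing a start and end number restricts the range, which may help when resuming after a failure.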

Format

  1. Raw data is downloaded to storage/external/ghtorrent and deleted once processing finishes.
  2. Interim data is saved in storage/interim/ghtorrent and deleted once processing finishes.
  3. Final files are stored in storage/processed/, saved by day.
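
One way to check the layout above after a run is to count the files left in each stage. This is a rough sketch run from the repository root; `count_files` is a hypothetical helper, and whether the raw and interim directories themselves remain after cleanup is an assumption (a missing directory is reported as 0 files).

```shell
#!/bin/sh
# Sketch: count the files under each storage stage.
# The paths come from the list above; count_files is an illustrative helper name.

count_files() {
    # Print the number of regular files under directory $1 (0 if it does not exist).
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

for stage in storage/external/ghtorrent storage/interim/ghtorrent storage/processed; do
    echo "${stage}: $(count_files "${stage}") files"
done
```

After a completed run, the first two counts would be expected to be zero, with the per-day output under storage/processed.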

About

Use Phoenix HPC to download and process ghtorrent data
