Releases
A release is an export of all or part of a repository that we want to be able to access for some downstream use. Examples include:
- Draft of a paper for posting or submission
- Slide deck for a presentation
- Cleaned data files to be used in other repositories or projects
- Intermediate data files to be used in the current repository that we want to maintain in a stable and replicable state
Releases should include the files we intend to use downstream as well as sufficient information to reproduce those files. That typically means recording the commit at which the released files were produced and/or the state of the full repository at that time.
When the released files are PDFs of papers, talks, etc., we create releases on GitHub. When the released files are data or other large files, we create releases on Dropbox.
A release on GitHub consists of a tag pointing to a specific commit, a zip archive of the repository (excluding any large files handled by LFS), and additional binary files that may be attached by hand. To create a GitHub release:
- Navigate to the GitHub releases section in a web browser
- Choose "Draft a new release"
- Choose a descriptive title (e.g., "Econometrica Second Submission 10_2020") and a descriptive short tag (e.g., "Ecma2ndSubmission")
- Attach the released files (e.g., PDF of a paper or talk) as binaries
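The same steps can also be scripted. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated, which this page does not otherwise require; the function name `draft_release` is hypothetical.

```shell
# draft_release TAG TITLE NOTES FILE...
# Creates a draft GitHub release for the current repository and attaches
# the given files as binaries. Assumes the GitHub CLI (gh) is available.
draft_release() {
    tag=$1
    title=$2
    notes=$3
    shift 3
    # Remaining arguments are the files to attach
    gh release create "$tag" "$@" --title "$title" --notes "$notes" --draft
}
```

For example, `draft_release Ecma2ndSubmission "Econometrica Second Submission 10_2020" "Second submission" paper.pdf`, where `paper.pdf` is a placeholder file name. The `--draft` flag leaves the release unpublished so it can be reviewed and published in the browser.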
A release on Dropbox consists of an export of a file, directory, or entire repository. We save these within the releases/ directory of the GS Lab Dropbox, in one subdirectory per project, with each release stored in a folder named for the date the release was made, which can be cross-referenced from GitHub. If multiple releases are made on the same day, the script below appends a letter to the folder name to distinguish them.
Our preferred tool for creating these releases is rclone. The following code shows how to do this in practice. Before running it, edit it so that RCLONE_DROPBOX matches the name of the Dropbox remote in your rclone setup, PROJECT_NAME is your project's GitHub name, and EXT_NAME is either full_repo if you are releasing the whole repository or an informative name such as cleaned_data. Note that the script copies the current working directory in full. If you only want to copy certain files, or types of files, consult this wiki on rclone filtering. The script also creates a release_readme.md file with key information about the current state of the repo on git, uploads it with the release, and then deletes the local copy.
# Record the repository state that this release was produced from
URL=$(git remote get-url origin)
BRANCH_NAME=$(git rev-parse --abbrev-ref HEAD)
HASH_CODE=$(git rev-parse HEAD)
rm -f release_readme.md
# printf interprets \n reliably; plain echo does not in bash
printf '## URL\n%s\n## Branch\n%s\n## Hash\n%s\n' "$URL" "$BRANCH_NAME" "$HASH_CODE" > release_readme.md

DATE=$(date +%Y_%m_%d)
PROJECT_NAME="<YOUR PROJECT NAME>"
EXT_NAME="<NAME OF SUBDIRECTORY>"
RCLONE_DROPBOX=dropbox
DEST="$RCLONE_DROPBOX:release/$PROJECT_NAME/$EXT_NAME/$DATE"

# rclone lsf fails if the directory does not exist, so success means the name is taken
if rclone lsf "$DEST"
then
    # A release already exists for today: use the first free letter suffix
    for letter in {b..z}; do
        if rclone lsf "${DEST}_$letter"
        then
            continue
        else
            echo "Copying to: ${DEST}_$letter"
            rclone mkdir "${DEST}_$letter"
            rclone copy --skip-links ./ "${DEST}_$letter"
            echo "Done copying"
            break
        fi
    done
else
    echo "Copying to: $DEST"
    rclone mkdir "$DEST"
    rclone copy --skip-links ./ "$DEST"
    echo "Done copying"
fi
# Remove the local copy of the readme once it has been uploaded
rm release_readme.md
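The naming rule in the script above (the date alone, then the date with a suffix from `_b` to `_z`) can be isolated so it is easy to check without touching Dropbox. This is a minimal sketch, not part of the lab's script; the function name `next_release_name` is hypothetical, and it takes the already-used folder names as arguments.

```shell
# next_release_name DATE [EXISTING_NAME...]
# Prints DATE if it is unused, otherwise DATE_b, DATE_c, ... up to DATE_z.
# Returns non-zero if every candidate name is already taken.
next_release_name() (
    date_stamp=$1
    shift
    existing=" $* "
    for suffix in "" _b _c _d _e _f _g _h _i _j _k _l _m _n _o _p _q _r _s _t _u _v _w _x _y _z; do
        candidate="$date_stamp$suffix"
        case "$existing" in
            *" $candidate "*) ;;              # name taken, try the next one
            *) echo "$candidate"; exit 0 ;;   # first free name
        esac
    done
    exit 1
)
```

One possible way to supply the existing names is `rclone lsf --dirs-only` on the project's release folder with the trailing slashes stripped. Unlike the loop in the script, this version reports failure when all letters through `z` are used instead of silently copying nothing.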
If you do not have rclone set up, check the section below for instructions.
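For partial releases, rclone's filter rules can restrict what gets uploaded. The sketch below assumes a release that should contain only CSV and PDF files plus release_readme.md; the patterns, the function names, and the filter file name used in the usage note are all hypothetical.

```shell
# write_release_filter FILE
# Writes an rclone filter file. Rules are read top to bottom and the
# first matching rule decides whether a file is included (+) or excluded (-).
write_release_filter() {
    cat > "$1" <<'EOF'
+ release_readme.md
+ *.csv
+ *.pdf
- *
EOF
}

# preview_release FILTER_FILE DEST
# Shows what would be copied from the current directory without
# uploading anything (--dry-run).
preview_release() {
    rclone copy --dry-run --skip-links --filter-from "$1" ./ "$2"
}
```

For example, `write_release_filter release_filter.txt` followed by `preview_release release_filter.txt "$RCLONE_DROPBOX:release/$PROJECT_NAME/$EXT_NAME/$DATE"`. Once the preview looks right, the same `--filter-from` option could be added to the `rclone copy` calls in the script.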
For many, but not all, releases we may want an associated replication package that can be posted or distributed online and that contains everything needed to create the relevant paper and slides. It should not contain intermediate outputs or unnecessary code. The package can be created locally, or in a branch to allow for collaboration.
1. Remove all files and directories besides code/, external.txt, input.txt, and make.py. The only exception is that paper_slides/output should still contain the final paper and slides.
2. Ensure that all comments within the LyX documents are removed.
3. Delete any unused code files and remove them from the corresponding make.py.
4. Replace any git submodules (e.g., gslab_make) with directories committed directly to the repository.
5. Ensure that there is a well-documented README of how to obtain any relevant data outside of what we have permission to share.
6. Move the repo as it currently stands to a new repo and delete the git history by removing .git/. Leave .gitignore and .gitattributes within the repo.
7. Initialize a new git history by using the command git init.
8. Zip the folder.
9. Run the entire repo using the command python run_all.py to ensure that it is indeed possible to create the appropriate output with this repo. If this does not run successfully, use the zipped version of the repo to make changes, since it will have the appropriate intermediate files already removed. Return to (7).
10. Attach the zipped folder as a binary to your release.
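Before deleting anything in the first step, it can help to list what would go. This is a minimal sketch, assuming the keep list applies to the top level of one directory at a time (the page does not say whether it applies per module). The function name `list_extra_entries` is hypothetical, hidden entries such as .git/ are not examined, and the paper_slides/output exception is not handled.

```shell
# list_extra_entries DIR
# Prints the non-hidden top-level entries of DIR that are not on the keep
# list (code/, external.txt, input.txt, make.py). Nothing is deleted.
list_extra_entries() (
    cd "$1" || exit 1
    for entry in *; do
        [ -e "$entry" ] || continue      # skip the literal "*" in an empty directory
        case "$entry" in
            code|external.txt|input.txt|make.py) ;;   # kept
            *) echo "$entry" ;;                       # candidate for removal
        esac
    done
)
```

Running it on each directory and reviewing the output by hand keeps the destructive step manual, which seems safer for a package that will be distributed.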
If the repository incorporates data stored on Dropbox, ensure that the outside data is in the proper state for any re-run. Dropbox has a "rewind" feature, so you can go to the relevant folders and restore them to their state on the date that the release was made. These changes affect all users of the Dropbox folder, so be sure to revert to the most up-to-date state after use.
In certain cases, the release should also have a link to a corresponding data snapshot on Dropbox. The snapshot is a folder gslab_data_snapshots/[name-of-repo]-[release-commit-#]. The folder should contain a copy of the repository at the version corresponding to the release (repo) and a copy of the raw folders from Dropbox used by the repo (data_dropbox). Ideally, the folders and files should be compressed at the highest level possible, and each .zip file should not contain more than 50 GB of uncompressed data. A lab member with Team admin access should add the folder to gslab_data_snapshots using the Admin console, and confirm that users have view-only access to all sub-folders.
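The naming pattern and the size limit can be checked with two small helpers. This is a minimal sketch: the function names are hypothetical, the limit is read as 50 gigabytes, and whether [release-commit-#] means a full or abbreviated hash is left to the caller.

```shell
# snapshot_name REPO COMMIT
# Prints the snapshot folder path following the
# gslab_data_snapshots/[name-of-repo]-[release-commit-#] pattern.
snapshot_name() {
    printf 'gslab_data_snapshots/%s-%s\n' "$1" "$2"
}

# within_zip_limit DIR
# Succeeds if DIR holds at most 50 GB of uncompressed data, i.e. it can
# go into a single .zip file under the guideline above.
within_zip_limit() (
    limit_kb=$((50 * 1024 * 1024))
    size_kb=$(du -sk "$1" | awk '{print $1}')
    [ "$size_kb" -le "$limit_kb" ]
)
```

A folder that passes the check could then be compressed with `zip -r -9`, where `-9` requests the highest compression level; larger folders would need to be split across several archives first.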
When a project needs the output from another repository, we recommend using stable output stored in the releases folder. When the two projects must be worked on side by side, rsync is useful and can be added to make files to ensure data is up to date before each release.
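A minimal sketch of such an rsync step follows. The function name `refresh_upstream` and the paths in the usage note are hypothetical, and how the call is wired into a make file is left open.

```shell
# refresh_upstream SRC_DIR DEST_DIR
# Copies the contents of SRC_DIR into DEST_DIR, preserving timestamps and
# permissions (-a). Files that exist only in DEST_DIR are left alone.
refresh_upstream() {
    mkdir -p "$2"
    # Trailing slashes make rsync copy the directory contents, not the directory itself
    rsync -a "${1%/}/" "${2%/}/"
}
```

For example, `refresh_upstream ../upstream_repo/output input/upstream_repo`. Because rsync only transfers files that have changed, it is cheap to run before every build.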
On your personal machine, run the command brew install rclone if you do not already have rclone installed. On Sherlock, make sure that module load rclone is in your ~/.bash_profile. In either case, configure Dropbox using these instructions. Recommended remote names are dropbox and gsbox.
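After configuring, it is worth confirming that the remote name matches what the release script expects. The sketch below reads the output of `rclone listremotes`, which prints one configured remote per line with a trailing colon; the function name `has_remote` is hypothetical.

```shell
# has_remote NAME
# Reads `rclone listremotes` output on stdin (one "name:" per line) and
# succeeds if NAME is among the configured remotes.
has_remote() {
    grep -qxF "$1:"
}
```

For example, `rclone listremotes | has_remote dropbox || echo "dropbox remote not configured"`.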
The documentation from Dropbox here is quite helpful.