
tag, branch and/or release code for reproducibility #108

Open
jowagner opened this issue May 24, 2022 · 3 comments
Labels: documentation (Improvements or additions to documentation) · next step (This issue should be addressed in Summer 2022) · reproducibility (Improve transparency of what was done)

Comments

jowagner (Collaborator) commented May 24, 2022

People who want to replicate our paper can check out the code at a commit found by inspecting the commit history, but they risk picking

  • a commit that is too early: they would miss late additions (code that was used in the experiments but was not added to the repository at the time it was used)
  • a commit that is too late: they would use code with features or bug fixes that were not used in the paper

We should tag, branch and/or release code to make it easy for visitors to pick the right code for reproducibility.

A branch would make it possible to keep updating the README (and to make late additions of code used in the experiments) even after the main branch diverges, e.g. when the main branch changes the steps and/or tools used to carry out the experiment. This branch could be named "bert-base-irish-cased-v1", matching the model name in the Hugging Face model repository.

We also need to document the commit numbers / versions of wiki-bert-pipeline and opusfilter that were used. (Relying on a fork in your own GitHub account only works as long as you never press the "Fetch upstream" button and never make any other changes to the fork.)
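As a minimal sketch (a hypothetical helper, not code from this repository), each dependency can be pinned to a release tag or a full commit SHA rather than a branch name, and turned into a GitHub source-archive download URL. The `Pin` class and its fields are assumptions for illustration; the URL shapes follow GitHub's `archive/refs/tags/<tag>.tar.gz` and `archive/<sha>.tar.gz` downloads.

```python
import re
from dataclasses import dataclass

# A full git commit ID: 40 lowercase hex characters (SHA-1 object name).
FULL_SHA = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class Pin:
    """Hypothetical record of one pinned dependency: a GitHub repo plus a fixed ref."""
    owner: str
    repo: str
    ref: str             # a release tag such as "v0.1.0", or a full commit SHA
    is_tag: bool = False

    def archive_url(self) -> str:
        """Return the GitHub source-archive URL for exactly this ref."""
        base = f"https://github.com/{self.owner}/{self.repo}/archive"
        if self.is_tag:
            return f"{base}/refs/tags/{self.ref}.tar.gz"
        if not FULL_SHA.fullmatch(self.ref):
            # Branch names can move and short SHAs can become ambiguous,
            # so anything that is not a full commit ID is rejected.
            raise ValueError(f"not a full commit SHA: {self.ref!r}")
        return f"{base}/{self.ref}.tar.gz"


print(Pin("jbrry", "Irish-BERT", "v0.1.0", is_tag=True).archive_url())
```

Rejecting branch names is the point of the design: a branch on a fork can silently change after someone fetches upstream, whereas a tag or full SHA records which snapshot was used.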

jowagner added the documentation, reproducibility and next step labels on May 24, 2022
jbrry (Owner) commented Aug 2, 2022

Thanks. It might be safest to do a release before any new code is added, and we can also make a branch in case we want to update anything that works with the old functionality. Is there a specific commit we should release/branch from? Looking at the recent commits, most commits made in 2022 seem to be fairly cosmetic and shouldn't change the results much. In that case, should we make our release/branch from the most recent commit?

Good advice about keeping track of the relevant commits of the external libraries, opusfilter and wiki-bert-pipeline. I will do that too.

jowagner (Collaborator, Author) commented Aug 3, 2022

I see no functional changes this year in 9b45a4d...master

Branching from HEAD should be OK and is the easiest option. If you prefer to branch from an earlier commit, you will probably want to cherry-pick all commits that update the README.
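The two options can be sketched as a hypothetical Python helper that only builds the git command lines (it does not run them). The branch name is the one proposed earlier in this issue; the remote name "origin" and any commit IDs passed in are assumptions for illustration.

```python
import shlex


def release_branch_commands(branch, base="HEAD", readme_commits=(), remote="origin"):
    """Hypothetical helper: list the git commands for creating a release branch."""
    # Create the release branch at `base` (HEAD = the current commit).
    cmds = [["git", "switch", "-c", branch, base]]
    # When branching from an older commit, replay the later README-only
    # commits onto it, oldest first, so the documentation stays current.
    cmds += [["git", "cherry-pick", sha] for sha in readme_commits]
    # Publish the branch; "origin" is an assumed remote name.
    cmds.append(["git", "push", "-u", remote, branch])
    return cmds


for cmd in release_branch_commands("bert-base-irish-cased-v1"):
    print(shlex.join(cmd))
```

Branching from HEAD yields just the switch and push commands; passing an earlier `base` plus the README commits adds one `git cherry-pick` per commit, which is the extra work the comment above refers to.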

jbrry (Owner) commented Aug 3, 2022

Thanks, I agree.

I made releases from the relevant branches of our forks of:

I updated the README (5be6457) with instructions to download these specific releases, so users will get a snapshot of these external libraries that won't be affected by upstream merges.

These releases/dependencies form the basis of the v0.1.0 release of Irish-BERT: https://github.com/jbrry/Irish-BERT/releases/tag/v0.1.0.
