[HWK]: Repronim instructor fellow training feedback #28

Open
6 tasks
gkiar opened this issue Apr 30, 2019 · 8 comments

@gkiar

gkiar commented Apr 30, 2019

  • Page 0:
    • "reproducibility requires knowledge of what, when, and how" --> I think repetition or re-running requires this. Replication or reproduction is a somewhat different concept: it can use bits of each of these pieces of information, but it is broader and relies on much more (at least by the definitions I attribute to each). It would be good to clarify terms often; the paper I link in ReproNim/module-intro#6 is helpful here.
    • Clarify intention: who is this for? True beginners, or people who already know a bit? Is the goal just to have them know the basic infrastructure they should use in this space, or to understand why using the shell and git is important?
    • "very unlikely that you have managed to completely avoid using those tools" --> fix language; it feels a bit alienating for those who are truly new to this
  • Page 1:
    • Far too high a teaching-to-practice ratio
    • Is it said anywhere why knowing how to use a shell is valuable for scientists? Where will it come in handy? Why can't they just use GUIs and Jupyter Notebooks for everything? (I know the answers to these, but I don't think we state them.)
    • In general, I'm very anti-duplicating content. Would it make sense to just open a PR adding the pieces we want to the Software Carpentry (SWC) lessons? This page would then just 1) link to them, and 2) have short "practicals" in which we describe a situation we'd encounter in a typical scientific workflow and have them solve it (with hidden solutions available).
    • In places there is a lot of detail that I really don't think is needed for the "basics". For example, LD_LIBRARY_PATH could go in a supplementary section: it's not something most people will need to pay much attention to, and certainly not before aliases or history become relevant to them.
    • Would break this into several more digestible slides.
  • Page 2:
    • Again, far too high a teaching-to-practice (TP) ratio, in my opinion
    • Similar comments to above: it may be worth folding some bits into the SWC lessons and leaving our pages to specific case studies that people will care about. When teaching, I would very regularly get a "but why are we using/learning this?" unless I situated a tool specifically within a context where they recognized its immediate value. For git, the hilarious xkcd comic on filenames, among others, can help make these problems feel more real.
  • Page 3:
    • Decrease TP ratio
    • No real mention of pip? It is the standard for Python, and Conda really is an extra layer that is a) not shipped with Python, and b) not necessary in many situations (like containers, which we'll get to later).
    • I also think virtualenv is valuable to mention in this context because it decouples package managers from where they install their libraries. Once you recognize libraries as files on a system, you can re-point your package manager and environment to install, recognize, source, uninstall, or test against different versions of similar software.
  • Page 4:
    • Decrease teaching time; this doesn't require 3 hours if we're mostly covering it at a conceptual level
    • Add text elaborating on/summarizing the links in the text.
    • Link to choose a license
  • Page 5:
    • Have it be more "student led" and provide scaffolding rather than instructions (with hidden answers) --> "how can you identify if the problem is unique to you?" "what do developers need to know so that they can help you?" "if they need to reproduce it, what details do they need to do that?" "how can you record these details most easily when doing your work?" "how can you communicate these details?"
@yarikoptic
Member

Hi @gkiar -- thanks a lot for the feedback, but I wonder what the best way to proceed is... maybe we should at least split them into separate issues so as not to spawn one lengthy discussion on all of them at the same time?
Also, if you feel that there is an easy way to improve any aspect (e.g. re "Link to choose a license"), please feel more than welcome to submit a PR to make it happen.

@yarikoptic
Member

With the above comment in mind, here we can try to go through the "Page 0" items:

  • "reproducibility requires knowledge of what, when, and how" --> I think repetition or re-running requires this, replication or reproduction is a bit of another concept that can use bits of each of these pieces of information, but is broader and relies on much more (at least by the definitions I attribute to each). Would be good to clarify terms often; the paper I link in ReproNim/module-intro#6 is helpful here.

FTR, the paper in question is "Reproducibility vs. Replicability: A Brief History of a Confused Terminology". It is an interesting review, and as you can see from it, there is still no consensus! IMHO, part of the problem is that in all of those cases we are trying to come up with a single word to cover some corner of a multi-dimensional space defined by at least the following dimensions:

  • time/repetition -- this is where "repeatability" comes into play, with the other dimensions 100% fixed (if you can guarantee that the computing environment is the same... which might be easier in the case of regular physical measurement experiments). It also seems to refer primarily to the ability to consistently collect yet another sample of data.
  • computing environment -- might differ slightly or widely; might or might not encapsulate the "leaf" analysis scripts (and if not, we would need to add one more dimension)
  • researcher/team -- getting somewhat closer to what is typically referred to as "reproducibility" or "results reproducibility" (given that the computing environments used are fixed)
  • study (specific experimental design) -- and here we come to what is typically called "replicability" or "inferential reproducibility"

Note that in ACM's description, "repeatability" is "Same team, same experimental setup", and there is no term at all for "Same team, different experimental setup" -- what is it then? Probably "Reproducibility". So, if you can guarantee that your setup is 100% the same, we could start using the term "repeatable", but I am not sure that it would help more than it would confuse.

Similar to "Goodman et al. (2016)", I feel that "reproducibility" is a good generic term to use at any level or combination of variants, and that it needs additional description to signify the level of reproducibility. You are right that "reproducibility" and "replicability" typically refer to study-level results, whereas in this section we are talking about very elementary, down-to-earth aspects which help to achieve reproducibility "from the ground up". The idea is that if someone cannot reproduce what they did a day or a week ago to produce their results (within the allowed "measurement error", if any), how could they reproduce the entire study? If we teach students to become more efficient at repetitive tasks, managing computing environments, etc., we would help them take control over those "lower" dimensions above.

So, "what, when, and how" is to me merely a colloquial way to say that we need to know the details of the environment, study, and analysis. "When" might often be irrelevant, unless you are (as I have been many times) on a "data archaeology" expedition trying to figure out "how" things were actually done; knowing "when" helps to pinpoint that moment in history/lab notebook/bash history/etc.

We could indeed refer to that paper here, but I think it might be better in the intro, against which you filed the recommendation already. If you feel that we could adjust wording in this section somewhat to make it more specific, please suggest how (PR).

Sorry, somehow it came out too long ;-)

@yarikoptic
Member

  • clarify intention: who is this for? True beginners, or people that know a bit? Is the goal just to have them know basic infrastructure they should use in this space, or understand why using the shell and git are important?

Well, we have tried to answer "for whom" and "why" in the opening of http://www.repronim.org/module-reproducible-basics/00-Overview/

As for shell/git specifically -- those are argued for in the corresponding sections. Again, if you see how they could be improved, please suggest it in a PR.

@yarikoptic
Member

  • "very unlikely that you have managed to completely avoid using those tools" --> fix language; feels a bit alienating for those who truly are new to this

To me it sounds like perfect Russian English with Chinese influence -- we are all friends and not trying to alienate anyone (at least recently). If you feel that better wording would make us all cuddle even more, please suggest it. Otherwise I would make it even longer Russian English with some Finnish influence (doing lots of saunas recently), and then I am not sure where it would lead us ;-)

@gkiar
Author

gkiar commented Apr 30, 2019

@yarikoptic thanks so much for all the responses! I'll give a longer read later but wanted to acknowledge your responses :)

I did this review in prep for the instructor training ahead of HBM, so was planning to discuss changes then and make changes after that. Happy to do some before, of course!

@yarikoptic
Member

If we fix it all up before, more time will be left for gelato and wine at OHBM ;-) I also have yet to see what parts/changes we need to pull in from our recent workshops, e.g. https://github.com/ReproNim/sfn2018-training/tree/gh-pages/_episodes

@gkiar
Author

gkiar commented Apr 30, 2019

And regarding your long response on clarifying reproducibility - I agree with all of your points :)

I think it's important to squarely address that it is a confused terminology, though, rather than just accept that it's cloudy, and point out that words are overloaded and we mean _____ when we're talking about these things.

@gkiar
Author

gkiar commented Apr 30, 2019

I will definitely open PRs for these various points - I apologize for not clarifying these were more notes for me than "tasks" for anybody else :) I didn't know of a better place to keep these notes than on the repo!
