Recompute analysis results in dockerized environment #24

Merged
merged 6 commits into from
Oct 12, 2023
@mih mih commented Oct 12, 2023

The creation of a containerized environment for the analysis became
necessary because, 3+ years after the "final" results were originally
computed, it is getting difficult to recreate a matching computational
environment.

Even with pinned versions of essential software dependencies,
incompatibilities with modern Python versions are slowly arising.

The container setup used for this recomputation is the result of a
detailed exploration of the effects of software versions and deployment
methods. A report is provided at
#20

Importantly, the employed setup is NOT capable of yielding exactly
identical results. While all statistical scores reported in the paper
do remain identical, there is a visually small change to one
histogram panel in Fig 4. The change is illustrated at
#20 (comment)

Given the overall state of reproducibility, and the anticipated
longevity of the containerized computation, we decided that this small
difference with respect to the journal publication is tolerable.

This changeset supports a DataLad-based re-execution (for verification):

datalad rerun <commitsha>

After this changeset, the complete manuscript can also be compiled
via DataLad:

datalad containers-run -n docker-make main.pdf

Closes #20

TODO:

  • Deposit the container image layers somewhere. They have been put on storage that should have some longevity. The respective URLs are public and registered with the annex keys for the two docker layers

mih added 6 commits October 11, 2023 19:32
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "sh -c 'rm -rf container/image; docker build -t remodnav:latest container && python -m datalad_container.adapters.docker save remodnav:latest container/image && echo '\"'\"'**/*json annex.largefiles=nothing\\nrepositories annex.largefiles=nothing\\n**/VERSION annex.largefiles=nothing'\"'\"' > container/image/.gitattributes'",
 "dsid": "c5a79271-7d24-42aa-a0cf-38d84fd15eaa",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [
  "container/Dockerfile"
 ],
 "outputs": [
  "container/image"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
This can be used with `containers-run` and the normal "make" targets,
which are then taken from the Docker-Makefile and executed inside
the container. For example:

```
datalad containers-run -n docker-make main.pdf
```
The creation of a containerized environment for the analysis became
necessary because, 3+ years after the "final" results were originally
computed, it is getting difficult to recreate a matching computational
environment.

Even with pinned versions of essential software dependencies,
incompatibilities with modern Python versions are slowly arising.

The container setup used for this recomputation is the result of a
detailed exploration of the effects of software versions and deployment
methods. A report is provided at
#20

Importantly, the employed setup is NOT capable of yielding exactly
identical results. While all statistical scores reported in the paper
do remain identical, there is a visually small change to one
histogram panel in Fig 4. The change is illustrated at
#20 (comment)

Given the overall state of reproducibility, and the anticipated
longevity of the containerized computation, we decided that this small
difference with respect to the journal publication is tolerable.

This changeset supports a DataLad-based re-execution (for verification):

```
datalad rerun <commitsha>
```

After this changeset, the complete manuscript can also be compiled
via DataLad:

```
datalad containers-run -n docker-make main.pdf
```

By default, this uses the local Python installation via `python` to
orchestrate Docker. If Python is available under a different name,
override it, for example:

```
datalad -c datalad.run.substitutions.python=python3 rerun <commitsha>
```
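
To illustrate the idea behind the `{python}` placeholder, here is a
minimal Python sketch of how a recorded command template could be
expanded with a configured substitution. It only mimics the concept
and is not DataLad's actual implementation; the function name
`expand_placeholders` and the shortened command template are
hypothetical.

```python
# Illustrative only: mimics how a "{python}" placeholder in a recorded
# command could be filled in. This is NOT DataLad's actual code.


def expand_placeholders(cmd_template: str, substitutions: dict) -> str:
    """Replace {name} placeholders in a command template."""
    return cmd_template.format(**substitutions)


# Shortened, hypothetical template modeled on the recorded command
template = "{python} -m datalad_container.adapters.docker run container/image make"

# Default: the interpreter is called "python"
default_cmd = expand_placeholders(template, {"python": "python"})

# Override: the interpreter is only available as "python3"
override_cmd = expand_placeholders(template, {"python": "python3"})

print(default_cmd)
print(override_cmd)
```

Keeping the interpreter name as a placeholder means the recorded
command itself stays unchanged across machines. Only the
configuration differs.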

Closes #20

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "{python} -m datalad_container.adapters.docker run container/image sh -c \"mkdir /tmp/dockertmp; HOME=/tmp/dockertmp make -f Docker-Makefile clean results_def.tex && rm -rf /tmp/dockertmp\"",
 "dsid": "c5a79271-7d24-42aa-a0cf-38d84fd15eaa",
 "exit": 0,
 "extra_inputs": [
  "container/image"
 ],
 "inputs": [
  "remodnav/remodnav/tests/data/anderson_etal",
  "data/studyforrest-data-eyemovementlabels/sub-*/*.tsv",
  "data/raw_eyegaze/sub-*/ses-movie/func/*_recording-eyegaze_physio.tsv.gz",
  "data/raw_eyegaze/sub-*/beh/*_recording-eyegaze_physio.tsv.gz"
 ],
 "outputs": [
  "img",
  "results_def.tex"
 ],
 "pwd": "."
}
^^^ Do not change lines above ^^^
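
The run records shown in these commit messages are JSON documents
enclosed by two marker lines. As a rough sketch of how such a record
could be inspected programmatically, the following Python snippet
extracts and parses it from a commit message. The helper
`extract_run_record` and the abbreviated example message are
hypothetical; `datalad rerun` does its own parsing internally.

```python
import json

# Marker lines as they appear in the commit messages
START = "=== Do not change lines below ==="
END = "^^^ Do not change lines above ^^^"


def extract_run_record(commit_message: str) -> dict:
    """Return the JSON run record found between the two marker lines.

    Raises ValueError if either marker line is missing.
    """
    lines = commit_message.splitlines()
    start = lines.index(START)
    end = lines.index(END)
    return json.loads("\n".join(lines[start + 1:end]))


# Abbreviated, hypothetical commit message for demonstration
example_message = """\
Example commit subject (hypothetical)

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "make -f Docker-Makefile clean results_def.tex",
 "exit": 0,
 "inputs": ["container/image"],
 "outputs": ["img", "results_def.tex"],
 "pwd": "."
}
^^^ Do not change lines above ^^^
"""

record = extract_run_record(example_message)
print(record["cmd"])
print(record["outputs"])
```

This is handy for a quick check of which inputs and outputs a
recorded command declares before re-executing it.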
@mih mih merged commit 9197133 into master Oct 12, 2023
@mih mih deleted the docker branch October 12, 2023 17:06
Development

Successfully merging this pull request may close these issues.

Dockerizing the analysis