Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asmacdo readthrough #64

Merged
merged 2 commits into from
Dec 18, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions publishing/article/results.tex
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ \subsection{Repository Structure}
The code unique to the reexecution framework consists of container image generation and container execution instructions, as well as a Make system for process coordination (\cref{fig:topology}).
This repository structure enhances the original reference article by directly linking the data at the repository level, as opposed to relying on its installation via a package manager.
Notably, however, the article source code itself is not duplicated or further edited here, but handled as a Git submodule, with all proposed improvements being recorded in the original upstream repository.
The layout constructed for this study thus provides robust provenance tracking and constitutes an instantiation of the YODA principle (a recursive acronym for “YODAs Organigram on Data Analysis” \cite{yoda}).
The layout constructed for this study thus provides robust provenance tracking and constitutes an instantiation of the YODA principles (a recursive acronym for “YODAs Organigram on Data Analysis” \cite{yoda}).

The Make system is structured into a top-level Makefile, which can be used for container image regeneration and upload, article reexecution in a containerized environment, and meta-article production.
There are two entry points for \emph{this}, and the original article, respectively — both of which are reexecutable (\cref{fig:workflow}).
Expand Down Expand Up @@ -57,7 +57,7 @@ \subsection{Resource Refinement}

As a notable step in our article reproduction effort, we have updated resources previously only available as tarballs (i.e. compressed \texttt{tar} archives), to DataLad.
This refinement affords both the possibility to cherry-pick only required data files from the data archive (as opposed to requiring a full archive download), as well as more fine-grained version tracking capabilities.
In particular, our work encompassed the re-write of the Mouse Brain Templates package \cite{mbt05} Make system.
In particular, our work encompassed a re-write of the Mouse Brain Templates package \cite{mbt05} Make system.
In its new release \cite{mbt10}, developed as part of this study, Mouse Brain Templates now publishes tarballs, as well as DataLad-accessible unarchived individual template files.


Expand Down Expand Up @@ -103,10 +103,10 @@ \subsubsection{Container image size should be kept small.}
Due to a lack of persistency, addressing issues in container images requires an often time-consuming rebuilding process.
One way to mitigate this is to make containers as small as possible.
In particular, when using containers, it is advisable to \textit{not} provide data via a package manager or via manual download inside the build script.
Instead, data provision should be handled outside of the container image and resources should be bind-mounted after download to a persistent location on the host machine.
Instead, data provisioning should be handled outside of the container image and resources should be bind-mounted after download to a persistent location on the host machine.

\subsubsection{Resources should be bundled into a superdataset.}
As external resources might change or disappear, it is beneficial to use data version control system, such as git-annex and DataLad.
As external resources might change, it is beneficial to use data version control system, such as git-annex and DataLad.
The git submodule mechanism permits bundling multiple repositories with clear provenance and versioning information, thus following the modularity principle promoted by YODA.
Moreover, git-annex supports multiple data sources and data integrity verification, thus increasing the reliability of a resource in view of providers potentially removing its availability.

Expand Down
Loading