diff --git a/publishing/article/results.tex b/publishing/article/results.tex
index d2add2c1..87beca17 100644
--- a/publishing/article/results.tex
+++ b/publishing/article/results.tex
@@ -14,7 +14,7 @@ \subsection{Repository Structure}
 The code unique to the reexecution framework consists of container image generation and container execution instructions, as well as a Make system for process coordination (\cref{fig:topology}).
 This repository structure enhances the original reference article by directly linking the data at the repository level, as opposed to relying on its installation via a package manager.
 Notably, however, the article source code itself is not duplicated or further edited here, but handled as a Git submodule, with all proposed improvements being recorded in the original upstream repository.
-The layout constructed for this study thus provides robust provenance tracking and constitutes an instantiation of the YODA principle (a recursive acronym for “YODAs Organigram on Data Analysis” \cite{yoda}).
+The layout constructed for this study thus provides robust provenance tracking and constitutes an instantiation of the YODA principles (a recursive acronym for “YODAs Organigram on Data Analysis” \cite{yoda}).
 The Make system is structured into a top-level Makefile, which can be used for container image regeneration and upload, article reexecution in a containerized environment, and meta-article production.
 There are two entry points for \emph{this}, and the original article, respectively — both of which are reexecutable (\cref{fig:workflow}).
@@ -57,7 +57,7 @@ \subsection{Resource Refinement}
 As a notable step in our article reproduction effort, we have updated resources previously only available as tarballs (i.e. compressed \texttt{tar} archives), to DataLad.
 This refinement affords both the possibility to cherry-pick only required data files from the data archive (as opposed to requiring a full archive download), as well as more fine-grained version tracking capabilities.
-In particular, our work encompassed the re-write of the Mouse Brain Templates package \cite{mbt05} Make system.
+In particular, our work encompassed a re-write of the Mouse Brain Templates package \cite{mbt05} Make system.
 In its new release \cite{mbt10}, developed as part of this study, Mouse Brain Templates now publishes tarballs, as well as DataLad-accessible unarchived individual template files.
@@ -103,10 +103,10 @@ \subsubsection{Container image size should be kept small.}
 Due to a lack of persistency, addressing issues in container images requires an often time-consuming rebuilding process.
 One way to mitigate this is to make containers as small as possible.
 In particular, when using containers, it is advisable to \textit{not} provide data via a package manager or via manual download inside the build script.
-Instead, data provision should be handled outside of the container image and resources should be bind-mounted after download to a persistent location on the host machine.
+Instead, data provisioning should be handled outside of the container image and resources should be bind-mounted after download to a persistent location on the host machine.
 \subsubsection{Resources should be bundled into a superdataset.}
-As external resources might change or disappear, it is beneficial to use data version control system, such as git-annex and DataLad.
+As external resources might change, it is beneficial to use data version control system, such as git-annex and DataLad.
 The git submodule mechanism permits bundling multiple repositories with clear provenance and versioning information, thus following the modularity principle promoted by YODA.
 Moreover, git-annex supports multiple data sources and data integrity verification, thus increasing the reliability of a resource in view of providers potentially removing its availability.