-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read-only filesystem errors from Snakemake in Singularity runtime #274
Labels
bug
Something isn't working
Comments
This was referenced Apr 25, 2023
tsibley
added a commit
to nextstrain/docker-base
that referenced
this issue
Apr 26, 2023
Provenance metadata was newly enabled by default in buildx 0.10.0¹ and entails switching from older, Docker image manifest lists (application/vnd.docker.distribution.manifest.list.v2+json) to the newer, but roughly equivalent-in-concept, OCI image indexes (application/vnd.oci.image.index.v1+json).² This switch happened automatically for us between these two builds: nextstrain/base:build-20230119T003940Z nextstrain/base:build-20230321T203820Z due to an upgrade of buildx from 0.9.1 → 0.10.0 on the GitHub Actions runners we use for CI. Unfortunately, Singularity doesn't support OCI image indexes (i.e. multi-platform images) until 3.6.0.³ While we'll likely need to require a newer Singularity version at some point anyway⁴, disabling the provenance metadata for now should restore compatibility with Singularity back to its 2.6 series. We don't need the provenance anyhow. ¹ <https://github.com/docker/buildx/releases/tag/v0.10.0> ² Provenance metadata is attached as additional entries in the image index alongside the normal entries for each platform image. ³ <nextstrain/cli#267> ⁴ <nextstrain/cli#274>
3 tasks
Maybe (or maybe not) related to this is that the |
2 tasks
I keep getting sidetracked on this. I'll pick it back up in a week or so, unless someone else gets to it. My extremely "for my eyes" notes on this: diff --git a/nextstrain/cli/resources/bashrc b/nextstrain/cli/resources/bashrc
index 3456696..83a83a4 100644
--- a/nextstrain/cli/resources/bashrc
+++ b/nextstrain/cli/resources/bashrc
@@ -6,8 +6,13 @@ fi
__NEXTSTRAIN_BASHRC=1
+OLDHOME="${HOME:-}"
+HOME=/nextstrain
+
# Override PS1, this file's reason for being.
OLDPS1="${PS1:-}"
+OLDPROMPT_COMMAND="${PROMPT_COMMAND:-}"
+unset PROMPT_COMMAND
if [[ -n "${NEXTSTRAIN_PS1:-}" ]]; then
PS1="$NEXTSTRAIN_PS1"
diff --git a/nextstrain/cli/runner/singularity.py b/nextstrain/cli/runner/singularity.py
index 9e7a671..3ba48cb 100644
--- a/nextstrain/cli/runner/singularity.py
+++ b/nextstrain/cli/runner/singularity.py
@@ -54,9 +54,15 @@ SINGULARITY_EXEC_ARGS = [
# in this area are not available on older versions.
#
# ¹ e.g. <https://docs.sylabs.io/guides/latest/user-guide/singularity_and_docker.html#docker-like-compat-flag>
- "--contain",
- "--no-home",
- "--cleanenv",
+ "--compat",
+
+ # XXX FIXME: --compat is 3.9.0
+ # XXX FIXME: --no-eval is 3.10.0 and included in --compat at that time
+ # 3.10.0 is a year old
+ # XXX FIXME: --writable-tmpfs is 3.0.0. we need this now
+ # XXX FIXME: do we still need --no-home? seems like no, not with --compat
+
+ # XXX FIXME: also need to adjust HOME to respect what the image sets… otherwise ends up weird
# Since we use --no-home above, avoid warnings about not being able to cd
# to $HOME (the default behaviour). run() will override this by specifying
@@ -121,6 +127,11 @@ def run(opts, argv, working_volume = None, extra_env = {}, cpus: int = None, mem
# Change the default working directory if requested
*(("--pwd", "/nextstrain/%s" % working_volume.name) if working_volume else ()),
+ # XXX FIXME: Apply limits.
+ # --cpus a la Docker <https://docs.sylabs.io/guides/latest/user-guide/cgroups.html#cpu-limits>
+ # --memory a la Docker <https://docs.sylabs.io/guides/latest/user-guide/cgroups.html#memory-limits>
+ # Requires Singularity 3.10 (--apply-cgroups can do the same thing with only 3.9, but version diff seems marginal)
+
str(image_path(image)),
*argv,
], extra_env) |
tsibley
added a commit
that referenced
this issue
May 25, 2023
I did this research as part of resolving a bug¹, so might as well document it where we'll need it. ¹ <#274>
tsibley
added a commit
that referenced
this issue
May 25, 2023
runner.singularity: Ensure HOME is taken from the image's default Singularity had been setting HOME to the user's HOME outside of the container which meant the image's default of /nextstrain wasn't getting used. This apparently wasn't an issue until we upgraded Snakemake¹ and it started expecting to be able to write files under HOME. HOME was practically guaranteed *not* to exist in the container because we disable mounting of the user's homedir with --no-home. We first observed this bug (alongside another, to be addressed next) in CI as: OSError: [Errno 30] Read-only file system: '/home/runner' The error comes from inside the container, but that homedir path comes from outside (on the CI runner). Since it isn't mounted due to --no-home, it's a non-existent path on the image filesystem. That could be ok since Snakemake will create the missing parts (without a boundary! orz), but it isn't ok in practice since the image filesystem is by default read-only (and the next bug to address). Note that even though our Docker image sets HOME=/nextstrain, the conversion to a Singularity image makes that a *default* value, e.g. HOME=${HOME:-/nextstrain}. It's unsurprising this semantic shift is a source of a subtle bugs! Resolves: <#274> ¹ <nextstrain/docker-base#136>
tsibley
added a commit
that referenced
this issue
May 25, 2023
I did this research as part of resolving a bug¹, so might as well document it where we'll need it. ¹ <#274>
tsibley
added a commit
that referenced
this issue
May 25, 2023
Our recent upgrade of Snakemake in our container image¹ resulted in an fatal error when used with the Singularity runtime, e.g.: OSError: [Errno 30] Read-only file system: '/home/runner' We first observed this in periodic CI for Nextstrain CLI, but were able to reproduce it outside CI. The error comes from Snakemake inside the container (newly) trying to `mkdir -p` paths under HOME and represents two separate-but-intertwined bugs: the first being the read-only container filesystem preventing creation under HOME, and the second that the value of HOME from outside the container (e.g. /home/runner in CI) is leaking into the container. The read-only filesystem previously wasn't an issue (and wasn't noticed) because writes were limited to paths bind-mounted read-write from outside the container (e.g. /nextstrain/build). Writing to the container filesystem has always been allowed in the Docker runtime, so we come to parity with it using --writable-tmpfs, which requires Singularity 3.0.0. The HOME leak arose thru a intersection of behaviours. Our container image sets HOME=/nextstrain², but this is a *default* value that can be overridden at container launch time. With our Docker runtime, it's never overridden. Singularity, however, automatically forwards the user's HOME value into the container, even though we use --no-home to disable mounting the user's home at the same path. The forwarded value thus override our image's default. Providing an explicitly empty HOME value via the --home option allows the image default to apply, as expected. Resolves: <#274> ¹ <nextstrain/docker-base#136> ² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
tsibley
added a commit
that referenced
this issue
May 25, 2023
Our recent upgrade of Snakemake in our container image¹ resulted in an fatal error when used with the Singularity runtime, e.g.: OSError: [Errno 30] Read-only file system: '/home/runner' We first observed this in periodic CI for Nextstrain CLI, but were able to reproduce it outside CI. The error comes from Snakemake inside the container (newly) trying to `mkdir -p` paths under HOME and represents two separate-but-intertwined bugs: the first being the read-only container filesystem preventing creation under HOME, and the second that the value of HOME from outside the container (e.g. /home/runner in CI) is leaking into the container. The read-only filesystem previously wasn't an issue (and wasn't noticed) because writes were limited to paths bind-mounted read-write from outside the container (e.g. /nextstrain/build). Writing to the container filesystem has always been allowed in the Docker runtime, so we come to parity with it using --writable-tmpfs, which requires Singularity 3.0.0. The HOME leak arose thru a intersection of behaviours. Our container image sets HOME=/nextstrain², but this is a *default* value that can be overridden at container launch time. With our Docker runtime, it's never overridden. Singularity, however, automatically forwards the user's HOME value into the container, even though we use --no-home to disable mounting the user's home at the same path. The forwarded value thus override our image's default. Providing an explicitly empty HOME value via the --home option allows the image default to apply, as expected. Resolves: <#274> ¹ <nextstrain/docker-base#136> ² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
tsibley
added a commit
that referenced
this issue
May 25, 2023
Our recent upgrade of Snakemake in our container image¹ resulted in an fatal error when used with the Singularity runtime, e.g.: OSError: [Errno 30] Read-only file system: '/home/runner' We first observed this in periodic CI for Nextstrain CLI, but were able to reproduce it outside CI. The error comes from Snakemake inside the container (newly) trying to `mkdir -p` paths under HOME and represents two separate-but-intertwined bugs: the first being the read-only container filesystem preventing creation under HOME, and the second that the value of HOME from outside the container (e.g. /home/runner in CI) is leaking into the container. The read-only filesystem previously wasn't an issue (and wasn't noticed) because writes were limited to paths bind-mounted read-write from outside the container (e.g. /nextstrain/build). Writing to the container filesystem has always been allowed in the Docker runtime, so we come to parity with it using --writable-tmpfs, which requires Singularity 3.0.0. The HOME leak arose thru a intersection of behaviours. Our container image sets HOME=/nextstrain², but this is a *default* value that can be overridden at container launch time. With our Docker runtime, it's never overridden. Singularity, however, automatically forwards the user's HOME value into the container, even though we use --no-home to disable mounting the user's home at the same path. The forwarded value thus override our image's default. Providing an explicitly empty HOME value via the --home option allows the image default to apply, as expected. Resolves: <#274> ¹ <nextstrain/docker-base#136> ² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
github-project-automation
bot
moved this from Prioritized
to Done
in Nextstrain planning (archived)
May 26, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The Snakemake upgrade in nextstrain/docker-base#136 caused failures for builds in the Singularity runtime due to new filesystem writes that Snakemake now tries to make under
$HOME
. Currently with our Singularity runner,$HOME
and other non-working dirs are read-only.The failure showed up as scheduled daily CI in this repo starting to fail between
nextstrain/base:build-20230407T002437Z
(last good) andbuild-20230411T103027Z
(first bad).I think we can address this by using the
--writable-tmpfs
flag in the Singularity runner to allow a writable (but temporary and ultimately discarded) filesystem outside of the working dir. This is similar to Docker's default behaviour. There may be other considerations with the change though.The text was updated successfully, but these errors were encountered: