Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read-only filesystem errors from Snakemake in Singularity runtime #274

Closed
tsibley opened this issue Apr 25, 2023 · 2 comments · Fixed by #283
Closed

Read-only filesystem errors from Snakemake in Singularity runtime #274

tsibley opened this issue Apr 25, 2023 · 2 comments · Fixed by #283
Assignees
Labels
bug Something isn't working

Comments

@tsibley
Copy link
Member

tsibley commented Apr 25, 2023

The Snakemake upgrade in nextstrain/docker-base#136 caused failures for builds in the Singularity runtime due to new filesystem writes that Snakemake now tries to make under $HOME. Currently with our Singularity runner, $HOME and other non-working dirs are read-only.

The failure showed up as scheduled daily CI in this repo starting to fail between nextstrain/base:build-20230407T002437Z (last good) and build-20230411T103027Z (first bad).

I think we can address this by using the --writable-tmpfs flag in the Singularity runner to allow a writable (but temporary and ultimately discarded) filesystem outside of the working dir. This is similar to Docker's default behaviour. There may be other considerations with the change though.

@tsibley tsibley added the bug Something isn't working label Apr 25, 2023
@tsibley tsibley self-assigned this Apr 25, 2023
tsibley added a commit to nextstrain/docker-base that referenced this issue Apr 26, 2023
Provenance metadata was newly enabled by default in buildx 0.10.0¹ and
entails switching from older, Docker image manifest lists
(application/vnd.docker.distribution.manifest.list.v2+json) to the
newer, but roughly equivalent-in-concept, OCI image indexes
(application/vnd.oci.image.index.v1+json).²

This switch happened automatically for us between these two builds:

    nextstrain/base:build-20230119T003940Z
    nextstrain/base:build-20230321T203820Z

due to an upgrade of buildx from 0.9.1 → 0.10.0 on the GitHub Actions
runners we use for CI.

Unfortunately, Singularity doesn't support OCI image indexes (i.e.
multi-platform images) until 3.6.0.³  While we'll likely need to require
a newer Singularity version at some point anyway⁴, disabling the
provenance metadata for now should restore compatibility with
Singularity back to its 2.6 series.  We don't need the provenance
anyhow.

¹ <https://github.com/docker/buildx/releases/tag/v0.10.0>

² Provenance metadata is attached as additional entries in the image
  index alongside the normal entries for each platform image.

³ <nextstrain/cli#267>
⁴ <nextstrain/cli#274>
@tsibley
Copy link
Member Author

tsibley commented Apr 26, 2023

Maybe (or maybe not) related to this is that the nextstrain shell prompt also no longer shows up for the Singularity runtime. Instead, we get the default Singularity> shell prompt. But this might be Singularity version dependent also.

@tsibley
Copy link
Member Author

tsibley commented May 12, 2023

I keep getting sidetracked on this. I'll pick it back up in a week or so, unless someone else gets to it. My extremely "for my eyes" notes on this:

diff --git a/nextstrain/cli/resources/bashrc b/nextstrain/cli/resources/bashrc
index 3456696..83a83a4 100644
--- a/nextstrain/cli/resources/bashrc
+++ b/nextstrain/cli/resources/bashrc
@@ -6,8 +6,13 @@ fi
 
 __NEXTSTRAIN_BASHRC=1
 
+OLDHOME="${HOME:-}"
+HOME=/nextstrain
+
 # Override PS1, this file's reason for being.
 OLDPS1="${PS1:-}"
+OLDPROMPT_COMMAND="${PROMPT_COMMAND:-}"
+unset PROMPT_COMMAND
 
 if [[ -n "${NEXTSTRAIN_PS1:-}" ]]; then
     PS1="$NEXTSTRAIN_PS1"
diff --git a/nextstrain/cli/runner/singularity.py b/nextstrain/cli/runner/singularity.py
index 9e7a671..3ba48cb 100644
--- a/nextstrain/cli/runner/singularity.py
+++ b/nextstrain/cli/runner/singularity.py
@@ -54,9 +54,15 @@ SINGULARITY_EXEC_ARGS = [
     # in this area are not available on older versions.
     #
     # ¹ e.g. <https://docs.sylabs.io/guides/latest/user-guide/singularity_and_docker.html#docker-like-compat-flag>
-    "--contain",
-    "--no-home",
-    "--cleanenv",
+    "--compat",
+
+    # XXX FIXME: --compat is 3.9.0
+    # XXX FIXME: --no-eval is 3.10.0 and included in --compat at that time
+    # 3.10.0 is a year old
+    # XXX FIXME: --writable-tmpfs is 3.0.0. we need this now
+    # XXX FIXME: do we still need --no-home? seems like no, not with --compat
+
+    # XXX FIXME: also need to adjust HOME to respect what the image sets… otherwise ends up weird
 
     # Since we use --no-home above, avoid warnings about not being able to cd
     # to $HOME (the default behaviour).  run() will override this by specifying
@@ -121,6 +127,11 @@ def run(opts, argv, working_volume = None, extra_env = {}, cpus: int = None, mem
         # Change the default working directory if requested
         *(("--pwd", "/nextstrain/%s" % working_volume.name) if working_volume else ()),
 
+        # XXX FIXME: Apply limits.
+        #   --cpus a la Docker <https://docs.sylabs.io/guides/latest/user-guide/cgroups.html#cpu-limits>
+        #   --memory a la Docker <https://docs.sylabs.io/guides/latest/user-guide/cgroups.html#memory-limits>
+        # Requires Singularity 3.10 (--apply-cgroups can do the same thing with only 3.9, but version diff seems marginal)
+
         str(image_path(image)),
         *argv,
     ], extra_env)

@joverlee521 joverlee521 mentioned this issue May 16, 2023
1 task
tsibley added a commit that referenced this issue May 25, 2023
I did this research as part of resolving a bug¹, so might as well
document it where we'll need it.

¹ <#274>
tsibley added a commit that referenced this issue May 25, 2023
runner.singularity: Ensure HOME is taken from the image's default

Singularity had been setting HOME to the user's HOME outside of the
container which meant the image's default of /nextstrain wasn't getting
used.  This apparently wasn't an issue until we upgraded Snakemake¹ and
it started expecting to be able to write files under HOME.  HOME was
practically guaranteed *not* to exist in the container because we
disable mounting of the user's homedir with --no-home.

We first observed this bug (alongside another, to be addressed next) in
CI as:

    OSError: [Errno 30] Read-only file system: '/home/runner'

The error comes from inside the container, but that homedir path comes
from outside (on the CI runner).  Since it isn't mounted due to
--no-home, it's a non-existent path on the image filesystem.  That could
be ok since Snakemake will create the missing parts (without a boundary!
orz), but it isn't ok in practice since the image filesystem is by
default read-only (and the next bug to address).

Note that even though our Docker image sets HOME=/nextstrain, the
conversion to a Singularity image makes that a *default* value, e.g.
HOME=${HOME:-/nextstrain}.  It's unsurprising this semantic shift is a
source of a subtle bugs!

Resolves: <#274>

¹ <nextstrain/docker-base#136>
tsibley added a commit that referenced this issue May 25, 2023
I did this research as part of resolving a bug¹, so might as well
document it where we'll need it.

¹ <#274>
tsibley added a commit that referenced this issue May 25, 2023
Our recent upgrade of Snakemake in our container image¹ resulted in an
fatal error when used with the Singularity runtime, e.g.:

    OSError: [Errno 30] Read-only file system: '/home/runner'

We first observed this in periodic CI for Nextstrain CLI, but were able
to reproduce it outside CI.  The error comes from Snakemake inside the
container (newly) trying to `mkdir -p` paths under HOME and represents
two separate-but-intertwined bugs: the first being the read-only
container filesystem preventing creation under HOME, and the second that
the value of HOME from outside the container (e.g. /home/runner in CI)
is leaking into the container.

The read-only filesystem previously wasn't an issue (and wasn't noticed)
because writes were limited to paths bind-mounted read-write from
outside the container (e.g. /nextstrain/build).  Writing to the
container filesystem has always been allowed in the Docker runtime, so
we come to parity with it using --writable-tmpfs, which requires
Singularity 3.0.0.

The HOME leak arose thru a intersection of behaviours.  Our container
image sets HOME=/nextstrain², but this is a *default* value that can be
overridden at container launch time.  With our Docker runtime, it's
never overridden.  Singularity, however, automatically forwards the
user's HOME value into the container, even though we use --no-home to
disable mounting the user's home at the same path.  The forwarded value
thus override our image's default.  Providing an explicitly empty HOME
value via the --home option allows the image default to apply, as
expected.

Resolves: <#274>

¹ <nextstrain/docker-base#136>
² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
tsibley added a commit that referenced this issue May 25, 2023
Our recent upgrade of Snakemake in our container image¹ resulted in an
fatal error when used with the Singularity runtime, e.g.:

    OSError: [Errno 30] Read-only file system: '/home/runner'

We first observed this in periodic CI for Nextstrain CLI, but were able
to reproduce it outside CI.  The error comes from Snakemake inside the
container (newly) trying to `mkdir -p` paths under HOME and represents
two separate-but-intertwined bugs: the first being the read-only
container filesystem preventing creation under HOME, and the second that
the value of HOME from outside the container (e.g. /home/runner in CI)
is leaking into the container.

The read-only filesystem previously wasn't an issue (and wasn't noticed)
because writes were limited to paths bind-mounted read-write from
outside the container (e.g. /nextstrain/build).  Writing to the
container filesystem has always been allowed in the Docker runtime, so
we come to parity with it using --writable-tmpfs, which requires
Singularity 3.0.0.

The HOME leak arose thru a intersection of behaviours.  Our container
image sets HOME=/nextstrain², but this is a *default* value that can be
overridden at container launch time.  With our Docker runtime, it's
never overridden.  Singularity, however, automatically forwards the
user's HOME value into the container, even though we use --no-home to
disable mounting the user's home at the same path.  The forwarded value
thus override our image's default.  Providing an explicitly empty HOME
value via the --home option allows the image default to apply, as
expected.

Resolves: <#274>

¹ <nextstrain/docker-base#136>
² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
tsibley added a commit that referenced this issue May 25, 2023
Our recent upgrade of Snakemake in our container image¹ resulted in an
fatal error when used with the Singularity runtime, e.g.:

    OSError: [Errno 30] Read-only file system: '/home/runner'

We first observed this in periodic CI for Nextstrain CLI, but were able
to reproduce it outside CI.  The error comes from Snakemake inside the
container (newly) trying to `mkdir -p` paths under HOME and represents
two separate-but-intertwined bugs: the first being the read-only
container filesystem preventing creation under HOME, and the second that
the value of HOME from outside the container (e.g. /home/runner in CI)
is leaking into the container.

The read-only filesystem previously wasn't an issue (and wasn't noticed)
because writes were limited to paths bind-mounted read-write from
outside the container (e.g. /nextstrain/build).  Writing to the
container filesystem has always been allowed in the Docker runtime, so
we come to parity with it using --writable-tmpfs, which requires
Singularity 3.0.0.

The HOME leak arose thru a intersection of behaviours.  Our container
image sets HOME=/nextstrain², but this is a *default* value that can be
overridden at container launch time.  With our Docker runtime, it's
never overridden.  Singularity, however, automatically forwards the
user's HOME value into the container, even though we use --no-home to
disable mounting the user's home at the same path.  The forwarded value
thus override our image's default.  Providing an explicitly empty HOME
value via the --home option allows the image default to apply, as
expected.

Resolves: <#274>

¹ <nextstrain/docker-base#136>
² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
@github-project-automation github-project-automation bot moved this from Prioritized to Done in Nextstrain planning (archived) May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

1 participant