Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Singularity compatibility fixes #283

Merged
merged 10 commits into from
May 26, 2023
Merged

Singularity compatibility fixes #283

merged 10 commits into from
May 26, 2023

Conversation

tsibley
Copy link
Member

@tsibley tsibley commented May 25, 2023

See commit messages for details.

Resolves #274.

Testing

  • Testing locally on Singularity 3.5.3 and 3.11.1
  • Checks pass

tsibley added 3 commits May 24, 2023 11:05
Helpful bit of meta-information.
Invocations of these two workhorses of the codebase are often useful to
see during debugging and development.

Debug logging is still very primitive in this codebase.

Related-to: <#201>
I did this research as part of resolving a bug¹, so might as well
document it where we'll need it.

¹ <#274>
@tsibley tsibley requested a review from a team May 25, 2023 20:46
@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

If CI passes for this PR I will probably merge it then, so that master's CI passes again.

@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

Ok, two kinds of CI failures.

Type failures, which I should have checked locally. Oops.

CI-only failures, due to the CI system:

FATAL: container creation failed: while applying cgroups config: rootless cgroups require a D-Bus session - check that XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS are set

tsibley added 3 commits May 25, 2023 14:48
…k-setup

Helpful to explicitly document this and test for it.  (Though note it's
about to bump up in subsequent commits.)

We'll also be able to optionally use newer Singularity features by
guarding them with version checks.
Our recent upgrade of Snakemake in our container image¹ resulted in an
fatal error when used with the Singularity runtime, e.g.:

    OSError: [Errno 30] Read-only file system: '/home/runner'

We first observed this in periodic CI for Nextstrain CLI, but were able
to reproduce it outside CI.  The error comes from Snakemake inside the
container (newly) trying to `mkdir -p` paths under HOME and represents
two separate-but-intertwined bugs: the first being the read-only
container filesystem preventing creation under HOME, and the second that
the value of HOME from outside the container (e.g. /home/runner in CI)
is leaking into the container.

The read-only filesystem previously wasn't an issue (and wasn't noticed)
because writes were limited to paths bind-mounted read-write from
outside the container (e.g. /nextstrain/build).  Writing to the
container filesystem has always been allowed in the Docker runtime, so
we come to parity with it using --writable-tmpfs, which requires
Singularity 3.0.0.

The HOME leak arose thru a intersection of behaviours.  Our container
image sets HOME=/nextstrain², but this is a *default* value that can be
overridden at container launch time.  With our Docker runtime, it's
never overridden.  Singularity, however, automatically forwards the
user's HOME value into the container, even though we use --no-home to
disable mounting the user's home at the same path.  The forwarded value
thus override our image's default.  Providing an explicitly empty HOME
value via the --home option allows the image default to apply, as
expected.

Resolves: <#274>

¹ <nextstrain/docker-base#136>
² <https://github.com/nextstrain/docker-base/blob/1b0d1998/Dockerfile#L390>
…y ≥3.5.3

Hopefully this "arms race" is quiet for a while now…
@tsibley tsibley force-pushed the trs/singularity/compat branch from 1dc3575 to 5054c34 Compare May 25, 2023 22:21
@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

Pyright failures are not specific to this branch, but fixed locally by 5054c34.

The mypy failures:

nextstrain/cli/runner/singularity.py:24: error: Name "docker" already defined (by an import)  [no-redef]
nextstrain/cli/runner/singularity.py:136: error: Cannot determine type of "singularity_version_at_least"  [has-type]
Found 2 errors in 1 file (checked 54 source files)                              

bisect to 019c8a8 and are currently inscrutable.

@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

Ugh, mypy. This diff resolves its problems:

diff --git a/nextstrain/cli/runner/singularity.py b/nextstrain/cli/runner/singularity.py
index 28c3664..81634f3 100644
--- a/nextstrain/cli/runner/singularity.py
+++ b/nextstrain/cli/runner/singularity.py
@@ -66,7 +66,7 @@ SINGULARITY_CONFIG_ENV = {
     "SINGULARITYENV_PROMPT_COMMAND": "unset PROMPT_COMMAND; #",
 }
 
-SINGULARITY_EXEC_ARGS = lambda: [
+def SINGULARITY_EXEC_ARGS(): return [
     # Increase isolation.
     #
     # In the future, we may find we want to use additional related flags to

That's clearly an issue with mypy itself, but one we'll work around I guess.

@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

Mypy issues fixed (locally at least) by 91c5f1c.

@tsibley tsibley force-pushed the trs/singularity/compat branch 2 times, most recently from e638cdd to 5bdef31 Compare May 25, 2023 23:30
@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

5bdef31 got CI past Singularity's need for DBus when using cgroups v2 (for new --cpus and --memory support), but now it's raising a new error:

FATAL: container creation failed: while applying cgroups config: while setting cgroup limits: openat2 /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/singularity-5438.scope/cpu.max: no such file or directory

I'm going to drop cfbdda7 for now from this PR to unblock it and then circle back at some point in a separate PR.

tsibley added 3 commits May 25, 2023 16:35
…ine…

…when possible, which is only on Singularity 3.10.0 or newer.

Many uses will pass thru this too-early evaluation unscathed, so we
don't require it as 3.10.0 is too new a minimum version.  Without it,
however, some uses, like our reporting of component versions with a
`bash -c` invocation, will get maimed resulting in bugs that seem very
strange at first.
Without these type stubs, new failures related to these libraries are
raised by Pyright starting with 1.1.309:

    …/nextstrain/cli/remote/s3.py
      …/nextstrain/cli/remote/s3.py:213:39 - error: Cannot access member "Bucket" for type "_"
        Member "Bucket" is unknown (reportGeneralTypeIssues)
    …/nextstrain/cli/runner/aws_batch/s3.py
      …/nextstrain/cli/runner/aws_batch/s3.py:208:26 - error: "client" is not a known member of "None" (reportOptionalMemberAccess)
      …/nextstrain/cli/runner/aws_batch/s3.py:212:24 - error: Cannot access member "Bucket" for type "_"
        Member "Bucket" is unknown (reportGeneralTypeIssues)
    3 errors, 0 warnings, 0 informations

With these type stubs, those failures are resolved, but different
failures are raised:

    …/nextstrain/cli/remote/s3.py
      …/nextstrain/cli/remote/s3.py:251:23 - error: Could not access item in TypedDict
        "Error" is not a required key in "_ClientErrorResponseTypeDef", so access may result in runtime exception (reportTypedDictNotRequiredAccess)
      …/nextstrain/cli/remote/s3.py:251:23 - error: Could not access item in TypedDict
        "Code" is not a required key in "_ClientErrorResponseError", so access may result in runtime exception (reportTypedDictNotRequiredAccess)
    2 errors, 0 warnings, 0 informations

All failures are fixed in this commit.

Resolves: <#284>
With the lambda syntax, mypy raises these errors for currently-to-me
inscrutable reasons:

    nextstrain/cli/runner/singularity.py:24: error: Name "docker" already defined (by an import)  [no-redef]
    nextstrain/cli/runner/singularity.py:136: error: Cannot determine type of "singularity_version_at_least"  [has-type]
    Found 2 errors in 1 file (checked 54 source files)

The errors make no sense: the "docker" "redefinition" is the sole
definition, the import, and it is not redefined anywhere in our code
(but maybe it is in mypy's mangling of the code?), and the type of
"singularity_version_at_least" seems clear to me (but maybe mypy
can't/isn't forward looking here?).

Why this workaround works is unknown to me, but I guessed it would
perturb the code path in mypy enough and, lo, it does and it's
unoffensive enough to use.
@tsibley tsibley force-pushed the trs/singularity/compat branch from 5bdef31 to 31d708f Compare May 25, 2023 23:36
@tsibley
Copy link
Member Author

tsibley commented May 25, 2023

Moved those commits to 31d708f...trs/singularity/resource-limits.

@tsibley
Copy link
Member Author

tsibley commented May 26, 2023

Checks pass now, so going to merge to fix CI on master. I'll leave the support for resource limits to another time.

@tsibley tsibley force-pushed the trs/singularity/compat branch from a3b727b to 3215a5f Compare May 26, 2023 19:57
@tsibley tsibley merged commit 29b0569 into master May 26, 2023
@tsibley tsibley deleted the trs/singularity/compat branch May 26, 2023 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Read-only filesystem errors from Snakemake in Singularity runtime
1 participant