Beat singularity-tools up to shape #177908

Open · 1 of 10 tasks
SomeoneSerge opened this issue Jun 16, 2022 · 35 comments
Labels: 0.kind: enhancement (Add something new), 2. status: backlog (This is a low priority)

Comments

@SomeoneSerge (Contributor) commented Jun 16, 2022

Issue description

I intend to start using nixpkgs' singularity-tools for HPC applications.
What follows is a list of hindrances and minor annoyances that I've immediately encountered.
The list is mostly for myself: I'm opening the issue to make it visible and maybe motivate people to voice ideas and comments.
Cf. this read on singularity with Nix for more inspiration.

  • VM-free image builds: Beat singularity-tools up to shape #177908 (comment)

  • Singularity needs patching to make images reproducible: singularity-tools.buildImage: non-deterministic #279250

    • mkfs generates random UUIDs
    • SIF metadata (information about the builder, timestamps)
  • Give users control over contents, in particular allow removing bash: currently, including bash manually results in singularity-tools.buildImage throwing obscure errors

  • Annoyance: we can compute diskSize from the built contents instead of choosing an arbitrary constant

  • Hindrance: failing to pack any CUDA-enabled dependencies. The error says: ... Cannot allocate memory. My /tmp is on disk, and I don't seem to be running out of RAM, so this message might be just another version of "not enough space left on (squashfs) device"

  • Hindrance: the buildImage interface doesn't expose:

    • Environment variables (or any other metadata, like apphelp)
    • Services ("container instances")
  • ...

  • Get this merged: singularity: fix defaultPath and reflect upstream changes #158486

CC (possibly interested) @ShamrockLee @jbedo

@ShamrockLee (Contributor)

I would probably have more spare time after June 21.

To speed up the merging of #158486, I would prefer limiting the scope of this PR to:

  1. Supporting multiple sources and a non-vendored source build, while
  2. not breaking the previous functionality of singularity, singularity.nix and singularity-tools.

Further improvements can be done in successive PRs.

@jbedo (Contributor) commented Jun 18, 2022

I agree with limiting the scope of the PR, I'll have time to help in a couple of weeks.

@ShamrockLee (Contributor)

[ ] Annoyance: we can compute diskSize from the built contents instead of choosing an arbitrary constant

Is there a way to compute diskSize from contents at eval time with no IFD?

@ShamrockLee (Contributor) commented Jun 20, 2022

Regarding singularity-tools, a significant problem is that mkLayer unnecessarily doubles the closure size.

singularity-tools.mkLayer generates a new derivation by copying all the files and directories of each package into "$out", and then we use writeReferencesToFile to get the list of derivations in the dependency tree of the generated layer package.

Why don't we get a list of references of all the packages directly?

Here's my implementation, which merges the writeReferencesToFile results of all the packages in the list while removing duplicates. There should be a better implementation than the O(n^2) duplicate removal, but it's still much cheaper than the content copying done by mkLayer anyway.

{
  writeMultipleReferencesToFile = paths: runCommand "runtime-deps-multiple" {
    referencesFiles = map writeReferencesToFile paths;
  } ''
    touch "$out"
    declare -a paths=();
    for refFile in $referencesFiles; do
      while read path; do
        isPathIncluded=0
        for pathIncluded in "''${paths[@]}"; do
          if [[ "$path" == "$pathIncluded" ]]; then
            isPathIncluded=1
            break
          fi
        done
        if (( ! isPathIncluded )); then
          echo "$path" >> "$out"
          paths+=( "$path" )
        fi
      done < "$refFile"
    done
  '';
}
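
(For reference, the same merge could be done more simply by deduplicating with sort -u instead of the nested loop; an untested sketch, assuming the same runCommand and writeReferencesToFile helpers as above:)

{
  writeMultipleReferencesToFile = paths: runCommand "runtime-deps-multiple" {
    referencesFiles = map writeReferencesToFile paths;
  } ''
    # Concatenate the per-package reference lists and drop duplicate store paths.
    # Unlike the loop above, this does not preserve the original ordering, which
    # should not matter for a plain list of runtime dependencies.
    sort -u $referencesFiles > "$out"
  '';
}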

@SomeoneSerge (Contributor, Author) commented Jun 20, 2022

Is there a way to compute diskSize from contents at eval time with no IFD?

I cannot say what the best way to compute it is, but the trivial baseline is a derivation that takes, in buildInputs, a buildEnv with the to-be image's contents, and runs du over it. The output times some constant is an upper bound on the diskSize.

EDIT: i.e. we wouldn't know diskSize at nix eval time, but we'd know it at build time, which appears to be sufficient
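
Roughly like this (an untested sketch; buildEnv, runCommand, writeReferencesToFile and the contents list are assumed to be in scope, and the multiplier and padding are arbitrary):

let
  contentsEnv = buildEnv { name = "image-contents"; paths = contents; };
in
runCommand "image-disk-size-estimate" { } ''
  # Sum the apparent size of the contents' closure, in MiB, at build time.
  closureMiB=$(du -scm --apparent-size $(cat ${writeReferencesToFile contentsEnv}) | tail -n1 | cut -f1)
  # Pad generously to leave room for filesystem overhead; the result is an upper bound.
  echo $(( closureMiB * 2 + 512 )) > $out
''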

@ShamrockLee (Contributor)

EDIT: i.e. we wouldn't know diskSize at nix eval time, but we'd know it at build time, which appears to be sufficient

Then we would no longer be able to use

vmTools.runInLinuxVM (
  runCommand "${projectName}-image" {
    preVM = vmTools.createEmptyImage {
      size = diskSize;
      fullName = "${projectName}-run-disk";
    };
  } ''
    mkfs -t ext3 -b 4096 /dev/${vmTools.hd}
    mkdir disk
    mount "/dev/${vmTools.hd}" disk
  ''
)

@SomeoneSerge (Contributor, Author)

I see now. It appears that createEmptyImage never uses size at eval time, so we could rewrite it to relax the constraint:

${qemu}/bin/qemu-img create -f qcow2 $diskImage "${toString size}M"
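
For instance (a hypothetical sketch of a relaxed preVM; contentsEnv, writeReferencesToFile and qemu are assumed to be in scope, and the size is computed at build time from the contents' closure, along the lines of the du estimate above, rather than passed in as an eval-time integer):

preVM = ''
  diskImage=disk-image.qcow2
  # Build-time estimate of the required size in MiB (arbitrary padding).
  diskSizeMiB=$(( $(du -scm --apparent-size $(cat ${writeReferencesToFile contentsEnv}) | tail -n1 | cut -f1) * 2 + 512 ))
  ${qemu}/bin/qemu-img create -f qcow2 "$diskImage" "''${diskSizeMiB}M"
'';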

@ShamrockLee (Contributor) commented Jun 20, 2022

I see now. It appears that createEmptyImage never uses size at eval time, so we could rewrite it to relax the constraint

${qemu}/bin/qemu-img create -f qcow2 $diskImage "${toString size}M"

Great! dockerTools would also benefit from that.

@dmadisetti (Contributor) commented Oct 14, 2022

For consistency with dockerTools.buildImage it would also be nice to change contents -> copyToRoot.

@SomeoneSerge any other HPC pain points?

@ShamrockLee (Contributor) commented Oct 15, 2022

I don't mind changing the interface of singularity-tools. (That would be a breaking change.) Sorry for not noticing the change to dockerTools.buildImage.

There's another change lining up that builds the image through a Singularity definition (Apptainer recipe) file, to make the image more declarative and the build process more explainable. It could be a drop-in replacement for the current Singularity-sandbox-based implementation.

I also went ahead and made a generator function that turns a settings-like Nix attrset into a definition string. The parser (which does the reverse) is still a work in progress.
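
(Very roughly, the generator looks something like the following; the section names and the attrset shape here are simplified and partly hypothetical:)

# Rough sketch of a settings -> Singularity definition generator;
# the real code handles more sections and proper quoting.
{ lib }:
settings:
lib.concatStringsSep "\n" (
  [
    "Bootstrap: ${settings.bootstrap or "scratch"}"
    ""
    "%environment"
  ]
  ++ lib.mapAttrsToList (name: value: "    export ${name}=${toString value}") (settings.environment or { })
  ++ [
    ""
    "%runscript"
    (settings.runscript or "")
  ]
)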

github-project-automation bot moved this to 🆕 New in CUDA Team, Mar 8, 2023
SomeoneSerge moved this from 🆕 New to 📋 Backlog in CUDA Team, Apr 1, 2023
@SomeoneSerge (Contributor, Author)

I'm sorry for the long absence; my priorities had shifted somewhat.

@dmadisetti At a high level I have exactly one pain point, and that is an unsolved (under-invested) use case:

  • I want to use singularity to bind-mount /nix/store on a cluster that doesn't support user namespaces or overlayfs, but has a setuid singularity binary
  • I want to ship a pre-built Nix in a singularity image
  • I want to be able to build that image using Nix, e.g. via singularity-tools.buildImage

I think I might give this a shot again. The issues I had were:

  • As I said, the cluster's singularity installation doesn't come with --overlay enabled, so I have to use --bind
  • Using --bind /tmp/blah:/nix/store hides the container's own /nix/store, so singularity run fails, unable to locate the symlinked sh and such.
  • Because singularity-tools.buildImage doesn't give the user full control over contents, I cannot easily replace the whole thing with static coreutils and a static Nix

This shouldn't be hard to alleviate.

@SomeoneSerge (Contributor, Author)

This suggests another point: maybe we want a buildImage that is extensible and overridable, including the possibility to override the default contents. One direction could be makeOverridable, and I think there's a similar effort being undertaken for dockerTools: #208944

Another possibility is the module system, with support for mkMerge/mkForce etc., similar to NixOS. This could also be a viable approach to re-implementing upstream's "definition" files in pure Nix, so as to achieve a declarative interface to buildImage.
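
(To illustrate the shape of it, with made-up option names; mkForce/mkMerge then come for free from evalModules:)

# Made-up option names, purely to illustrate a module-based buildImage interface.
{ lib }:
let
  eval = lib.evalModules {
    modules = [
      {
        options = {
          name = lib.mkOption { type = lib.types.str; };
          contents = lib.mkOption { type = lib.types.listOf lib.types.package; default = [ ]; };
          runscript = lib.mkOption { type = lib.types.lines; default = "#!/bin/sh\nexec /bin/sh"; };
        };
      }
      # A "user" module; mkForce/mkMerge work on these options just as in NixOS.
      {
        name = "example-image";
        runscript = lib.mkForce "#!/bin/sh\nexec hello";
      }
    ];
  };
in
eval.config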

@ShamrockLee, the settings approach sounds great; I think this should feel very native in Nixpkgs. Has your work gone into any PRs yet?

@ShamrockLee (Contributor)

@ShamrockLee, the settings approach sounds great; I think this should feel very native in Nixpkgs. Has your work gone into any PRs yet?

Not yet, but I already have the change integrated into my HEP analysis workflow.

It's also time to rethink the buildImage interface, IMO.

@SomeoneSerge (comment marked as off-topic)

@dmadisetti (Contributor) commented Apr 3, 2023

Hopefully not adding to the noise. My current workflow is making a docker tar with nix, unpacking it, and turning it into a singularity image. A bit of a hack, but it works?

packages.docker = pkgs.dockerTools.buildNixShellImage {
  name = "pre-sif-container";
  tag = "latest";
  drv = devShells.default;
};
packages.singularity = pkgs.stdenv.mkDerivation {
  name = "container.sif";
  src = ./.;
  # Assumes singularity is wanted on PATH at build time for `singularity build`
  nativeBuildInputs = [ pkgs.singularity ];
  installPhase = ''
    mkdir unpack
    tar xzvf ${packages.docker}/image.tgz -C unpack
    # Singularity can't handle .gz
    tar -C unpack/ -cvf layer.tar .
    # TODO: Allow for module of user defined nightly, opposed to using src
    singularity build $out Singularity.nightly
  '';
};

Singularity.nightly containing

Bootstrap:docker-archive
From:layer.tar
....

Big fan of using the Singularity file to define hooks etc.

@ShamrockLee (comment marked as off-topic)

@SomeoneSerge (comment marked as off-topic)

@SomeoneSerge (Contributor, Author) commented Apr 3, 2023

By the way, I was meaning to ask: why do we have to runInLinuxVM? I remember seeing @jbedo mention this allowed setting setuid flags, but I'm not sure where we need them. I presume QEMU takes its performance toll.

It's also time to rethink the buildImage interface, IMO. (@ShamrockLee)

Oh, I'll just throw some bait in. Have you noticed https://discourse.nixos.org/t/working-group-member-search-module-system-for-packages/26574/8 and https://github.com/DavHau/drv-parts in particular?

My current workflow is making a docker tar with nix. (@dmadisetti)

I guess your post further proves there's a use case :)

@ShamrockLee (Contributor) commented Apr 3, 2023

By the way, I was meaning to ask, why do we have to runInLinuxVM?

It was not until last year that the unprivileged image-building workflow started to be implemented in the Apptainer project. The program used to assert UID == 0 when building the image.

We are close to unprivileged image generation with Apptainer. The remaining obstacle is its use of /var/apptainer/mnt/session as the container mount point.

See apptainer/apptainer#215

Sylabs's Singularity fork seems to have made some progress on unprivileged image builds, but it still expects a bunch of top-level directories, /var/singularity/mnt/{container,final,overlay,session,source}, IIRC.

@SomeoneSerge (Contributor, Author)

It was not until last year that the unprivileged image-building workflow started to be implemented in the Apptainer project. The program used to assert UID == 0 when building the image.

I see. So, in principle, we could have run everything except ${projectName} build $out ./img outside QEMU?

@ShamrockLee (Contributor) commented Apr 3, 2023

I see. So, in principle, we could have run everything except ${projectName} build $out ./img outside QEMU?

That's true for the definition-based build, but it won't help much, since generating the definition file from the definition attrset is trivial in terms of resources.

As for the current Apptainer-sandbox-based buildImage, I'm not sure if we could run the unshare ... lines for runAsRoot outside QEMU. (Update: currently, runAsRootScript uses the mount --rbind-ed /nix/store, and it simply cannot run without some kind of emulation.)

@SomeoneSerge (Contributor, Author)

It won't help much

I was rather wondering if we could prepare the file tree outside QEMU and somehow pack the whole batch into an ext3/squashfs image without the mount. But then again, I didn't measure; maybe that too is insignificant.

@posch (Contributor) commented Apr 4, 2023

It won't help much

I was rather wondering if we could prepare the file tree outside QEMU and somehow pack the whole batch into an ext3/squashfs image without the mount. But then again, I didn't measure; maybe that too is insignificant.

I also prefer an approach that doesn't involve creating and running virtual machines. singularity/apptainer can run filesystems in squashfs, and I use this script to create containers:

{ pkgs
, contents
, runscript ? "#!/bin/sh\nexec ${pkgs.hello}/bin/hello"
, startscript ? "#!/bin/sh\nexec ${pkgs.hello}/bin/hello"
}:
pkgs.runCommand "make-container" {} ''
  set -o pipefail
  closureInfo=${pkgs.closureInfo { rootPaths=contents ++ [pkgs.bashInteractive]; }}
  mkdir -p $out/r/{bin,etc,dev,proc,sys,usr,.singularity.d/{actions,env,libs}}
  cd $out/r
  cp -na --parents $(cat $closureInfo/store-paths) .
  touch etc/{passwd,group}
  ln -s /bin usr/
  ln -s ${pkgs.bashInteractive}/bin/bash bin/sh
  for p in ${pkgs.lib.concatStringsSep " " contents}; do
    ln -sn $p/bin/* bin/ || true
  done
  echo "${runscript}" >.singularity.d/runscript
  echo "${startscript}" >.singularity.d/startscript
  chmod +x .singularity.d/{runscript,startscript}
  cd $out
  ${pkgs.squashfsTools}/bin/mksquashfs r container.sqfs -no-hardlinks -all-root
  ''

@ShamrockLee (Contributor)

FYI: With apptainer/apptainer#1284, Apptainer images can be built as a derivation without a VM.

The code already works (tested with singularity-tools.buildImageFromDef from #224636 specifying buildImageFlags = [ "--resolv ${pkgs.emptyFile}" "--hosts ${pkgs.emptyFile}" ];).

The upstream maintainer expects something more general (such as --no-mount), so the current change is not likely to get accepted. Nevertheless, it proves that a fully unprivileged Apptainer image build is possible.

@jbedo (Contributor) commented Apr 19, 2023

  • I want to use singularity to bind-mount /nix/store on a cluster that doesn't support user namespaces or overlayfs, but has a setuid singularity binary
  • I want to ship a pre-built Nix in a singularity image
  • I want to be able to build that image using Nix, e.g. via singularity-tools.buildImage

It's a bit hacky but I think this achieves your goals:

singularity-tools.buildImage {
  name = "minimal-nix";
  runAsRoot = "${rsync}/bin/rsync -a ${pkgsStatic.nix}/ ./";
}

@ShamrockLee (comment marked as off-topic)

@PhDyellow

I managed to get a CUDA-capable container built by adjusting memSize along with diskSize.

Running it with env vars isn't solved yet.

tomodachi94 added the 0.kind: enhancement and 2. status: backlog labels, May 13, 2024
@posch (Contributor) commented Aug 1, 2024

Apptainer has merged a PR that makes it possible to use apptainer to build containers in the Nix sandbox:
apptainer/apptainer#2394

With that change, it's possible to build containers with

$ nix-build

default.nix:

{ pkgs ? import <nixpkgs> {} }:

pkgs.callPackage ./make-container.nix {
  inherit pkgs;
  contents = with pkgs; [
    busybox
    nginx
  ];
}

make-container.nix:

{ apptainer ? pkgs.apptainer, contents, pkgs }:
pkgs.runCommand "make-container" {} ''
  closureInfo=${pkgs.closureInfo { rootPaths = contents ++ [ pkgs.bashInteractive ]; }}
  set -x
  mkdir -p $out/r/{bin,etc,dev,proc,sys,usr,var/log}
  cd $out/r
  cp -na --parents $(cat $closureInfo/store-paths) .
  touch etc/{passwd,group,resolv.conf}
  ln -s /bin usr/
  ln -s ${pkgs.bashInteractive}/bin/bash bin/sh
  for p in ${pkgs.lib.concatStringsSep " " contents}; do
    ln -sn $p/bin/* bin/ || true
  done
  touch $out/apptainer.conf $out/resolv.conf
  export HOME=$out
  find . -ls
  ${apptainer}/bin/apptainer --config $out/apptainer.conf --debug --verbose build -B $out/resolv.conf:/etc/resolv.conf --disable-cache --fakeroot $out/container.sif $out/r
''

This copies the closure of $contents to $out/r, links all bin/* to /bin/, creates dummy apptainer.conf and resolv.conf files, and finally runs apptainer build.

@ShamrockLee (Contributor)

Let's land #268199 by splitting it into smaller PRs. We could then add the unprivileged Apptainer image build flow as one of its reusable components.

Here's the first one: #332168

@ShamrockLee (Contributor)

#332437 is the second one, containing various fixes and a few deprecations.

@SomeoneSerge (Contributor, Author) commented Aug 10, 2024

By the way, maybe we should consider dropping support for choosing between apptainer and singularity for building images. For one thing, I suspect we'll have to introduce a separate attribute (like _apptainer-derandomized or _siftool-derandomized; #279250) for a tool patched to leave out all the UUIDs and the timestamps, and it's probably not worth it to maintain patches for both forks...

@pbsds (Member) commented Aug 10, 2024

If the images built by one can be run by the other, and are expected to stay that way going forward, then I don't see a problem with that.

@ShamrockLee (Contributor)

For one thing, I suspect we'll have to introduce a separate attribute (like _apptainer-derandomized or _siftool-derandomized; #279250) for a tool patched to leave out all the UUIDs and the timestamps, and it's probably not worth it to maintain patches for both forks...

How does patching Apptainer and SingularityCE (the apptainer and singularity part) make it difficult to choose between Apptainer and SingularityCE for building images (the singularity-tools part)? We could define apptainer and singularity separately if their build flow differs too much while maintaining only one singularity-tools for the command-line interface they share in common.

@SomeoneSerge (Contributor, Author)

 How does patching Apptainer and SingularityCE (the apptainer and singularity part) make it difficult to choose between Apptainer and SingularityCE for building images (the singularity-tools part)? 

It doesn't; it's just that I don't see why we would patch them both separately, if we only really need the patches for singularity-tools, not for the user-facing singularity.

If the images built by one can be run by the other, and are expected to stay that way going forward, then I don't see a problem with that.

We could even package siftool separately, and that could be enough...

@ShamrockLee (Contributor) commented Aug 11, 2024

The development would be a lot easier if the reproducible image build functionality could be implemented upstream.

We could even package siftool separately, and that could be enough...

I seem to have lost track of this. What is siftool?
