Building singularity container very slow #171551

Closed
cfhammill opened this issue May 4, 2022 · 5 comments
Labels
0.kind: bug Something is broken

Comments

@cfhammill
Contributor

Describe the bug

I'm trying to generate a large singularity container (~10G of contents) and it's taking more than an hour to generate the image. I have both my nix store and tmpdir on a fast SSD. Monitoring the progress is a challenge because after the "Writing superblocks and filesystem accounting information: done" message from qemu there is no output while the VM hard disk is set up.

The command that's taking forever is

mkfs -t ext3 -b 4096 /dev/${vmTools.hd}

It takes over an hour and produces no output.

Monitoring progress is very challenging because, from the view of the host system, the qcow2 file mounted there is deleted. I was able to crudely get progress by finding the pid and fd with

lsof 2>/dev/null | grep qcow

the qcow file will be there and listed as (deleted), the size can then be checked with

(cp /proc/<pid>/fd/<fd> test.qc; du -sh test.qc; rm test.qc)

doing that shows the file growing by tens or hundreds of megs a minute.
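The size check above can also be done without copying the whole image each time. The following is a minimal sketch, assuming Linux with `/proc` and GNU coreutils `stat`; the helper name `open_fd_size` is hypothetical, and the pid and fd are the ones found with lsof.

```shell
# Print the apparent size in bytes of a file that a process still holds open,
# even if it has already been unlinked on the host. Linux-only: relies on /proc.
# Usage: open_fd_size <pid> <fd>
open_fd_size() {
  pid="$1"
  fd="$2"
  # -L follows the /proc/<pid>/fd/<fd> link to the open file itself,
  # so the size reported is that of the (deleted) file, not of the link.
  stat -L -c %s "/proc/${pid}/fd/${fd}"
}
```

Reading another process's `/proc/<pid>/fd` entries generally requires running as the same user as that process or as root. Running the helper once a minute and comparing the numbers gives the same growth figure as the copy-based check.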

I tried tweaking the mkfs command by disabling trim with -o nodiscard, switching to ext4, and playing with qemu arguments, all with no success.

Steps To Reproduce

I'm building the container using nix build with a flake

{ pkgs, contents }:

with pkgs;
with pkgs.singularity-tools;

buildImage {
  name = "test-container";
  runScript = "#!${stdenv.shell}\nexec /bin/sh $@";
  runAsRoot = ''
     #!${stdenv.shell}
     ${dockerTools.shadowSetup}
  '';

  inherit contents;
  diskSize = 1024*20;
  memSize = 1024*8;
}

this builds the container using a VM with a 20G hard drive.

Expected behavior

I would expect mkfs, especially with -o nodiscard, to run in a few minutes on my SSD; instead it runs for hours.

Notify maintainers

@jbedo

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

nix-shell -p nix-info --run "nix-info -m"
this path will be fetched (0.00 MiB download, 0.00 MiB unpacked):
  /nix/store/q124axggd6wl6fh9508sxgkknkifrl0z-nix-info
copying path '/nix/store/q124axggd6wl6fh9508sxgkknkifrl0z-nix-info' from 'https://cache.nixos.org'...
 - system: `"x86_64-linux"`
 - host os: `Linux 4.15.0-169-generic, Ubuntu, 18.04.6 LTS (Bionic Beaver)`
 - multi-user?: `no`
 - sandbox: `no`
 - version: `nix-env (Nix) 2.6.1`
 - channels(chris): `"nixpkgs"`
 - nixpkgs: `/home/chris/.nix-defexpr/channels/nixpkgs`
@cfhammill cfhammill added the 0.kind: bug Something is broken label May 4, 2022
@jbedo
Contributor

jbedo commented May 5, 2022

Interesting, the following is running quickly for me:

with import (builtins.getFlake "github:nixos/nixpkgs") {};
with pkgs.singularity-tools;

buildImage {
  name = "test-container";
  runScript = "#!${stdenv.shell}\nexec /bin/sh $@";
  runAsRoot = ''
     #!${stdenv.shell}
     ${dockerTools.shadowSetup}
  '';
  diskSize = 1024*20;
  memSize = 1024*8;
  contents = [];
}
sh-5.1$ time nix build -f test.nix

real	0m11.287s
user	0m0.646s
sys	0m0.191s

I'll try an ubuntu machine and see what happens.

@cfhammill
Contributor Author

That test ran very quickly for me as well, so I must have been wrong about what's eating the time. It must be the copy loop:

mkdir -p bin ./${builtins.storeDir}

I'm a bit surprised. I guess the qcow2 file doesn't preallocate the space and grows as the data gets copied in (which is why I was seeing the size of the qcow2 file grow over time). I checked the size of the references in the <hash>-runtime-deps file produced by writeReferencesToFile and it's 25G, which I don't think accounts for the many-hour runtime.
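As a rough cross-check, the 25G of references can be combined with the growth rate seen earlier ("tens or hundreds of megs a minute"). The sketch below assumes a round 100 MiB/min purely for illustration, since that rate was only eyeballed; `estimate_minutes` is a hypothetical helper.

```shell
# Estimate how long a copy takes, in whole minutes, given the payload size
# in MiB and an observed write rate in MiB per minute.
estimate_minutes() {
  size_mib="$1"
  rate_mib_per_min="$2"
  # Integer ceiling division, so a partial minute rounds up.
  echo $(( (size_mib + rate_mib_per_min - 1) / rate_mib_per_min ))
}

# 25 GiB of store paths at an assumed 100 MiB/min.
estimate_minutes $((25 * 1024)) 100   # prints 256
```

At that assumed rate the copy alone would take a bit over four hours, so the volume of data is consistent with the runtime only if writes into the VM disk really are that slow. The open question is then why the write rate is so far below what an SSD should sustain.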

@cfhammill
Contributor Author

Update on some further experiments: I enabled extended L2 entries in the qcow2 file (I got the idea from https://www.youtube.com/watch?v=zJetcfDVFNw), but it was no faster.

I looked at the file /nix/store/yizdlakp49n60f62kri1lk70rjgy95ns-qemu-host-cpu-only-6.2.0/bin/qemu-kvm and noticed that the wrapper doesn't seem to actually enable kvm (possibly because I wasn't in the kvm group when I first tried building singularity containers). I added -enable-kvm to the QEMU_OPTS of runInLinuxVM, still no change.
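One way to rule out the group-membership question is to check whether the current user can open the KVM device at all. This is a sketch only: `/dev/kvm` is the standard Linux device node, `can_open_rw` is a hypothetical helper, and passing the check says nothing about whether the QEMU wrapper actually uses KVM.

```shell
# Succeed if the given path exists and is readable and writable by the
# current user; QEMU needs read/write access to /dev/kvm to use KVM.
can_open_rw() {
  [ -r "$1" ] && [ -w "$1" ]
}

if can_open_rw /dev/kvm; then
  echo "/dev/kvm is accessible to $(id -un)"
else
  echo "/dev/kvm is missing or not accessible to $(id -un)"
fi
```

If the device is not accessible, group changes typically only take effect in a new login session, so the check is worth repeating from the same shell that runs the build.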

chris@TURING:~/documents/tbi_clinical_prediction_project/code/container$ TMPDIR=/mnt/4TB-SSD/chris/.nix-tmp/ nix --cores 4 build --print-build-logs
path '/home/chris/documents/tbi_clinical_prediction_project/code/container' does not contain a 'flake.nix', searching up
warning: Git tree '/home/chris/documents/tbi_clinical_prediction_project' is dirty
singularity-image-test-container.img> Formatting '/nix/store/8a0r3fpkx0h9bbf5hsncvmcqjxahpjp0-singularity-image-test-container.img/disk-image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=on compression_type=zlib size=42949672960 lazy_refcounts=off refcount_bits=16
singularity-image-test-container.img> cSeaBIOS (version rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org)
singularity-image-test-container.img> iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+BFF91140+BFEF1140 CA00
singularity-image-test-container.img> Booting from ROM...
singularity-image-test-container.img> Probing EDD (edd=off to disable)... ocloading kernel modules...
singularity-image-test-container.img> [    0.259248] Module has invalid ELF structures
singularity-image-test-container.img> [    0.274746] Module has invalid ELF structures
singularity-image-test-container.img> [    0.277254] Module has invalid ELF structures
singularity-image-test-container.img> [    0.278837] Module has invalid ELF structures
singularity-image-test-container.img> [    0.326114] Module has invalid ELF structures
singularity-image-test-container.img> [    0.327768] Module has invalid ELF structures
singularity-image-test-container.img> [    0.332860] Module has invalid ELF structures
singularity-image-test-container.img> [    0.334861] Module has invalid ELF structures
singularity-image-test-container.img> [    0.336670] Module has invalid ELF structures
singularity-image-test-container.img> [    0.338596] Module has invalid ELF structures
singularity-image-test-container.img> [    0.346266] Module has invalid ELF structures
singularity-image-test-container.img> [    0.348049] Module has invalid ELF structures
singularity-image-test-container.img> [    0.349430] Module has invalid ELF structures
singularity-image-test-container.img> [    0.351276] Module has invalid ELF structures
singularity-image-test-container.img> [    0.352959] Module has invalid ELF structures
singularity-image-test-container.img> [    0.387931] Module has invalid ELF structures
singularity-image-test-container.img> [    0.393880] Module has invalid ELF structures
singularity-image-test-container.img> [    0.399981] Module has invalid ELF structures
singularity-image-test-container.img> [    0.405045] Module has invalid ELF structures
singularity-image-test-container.img> [    0.408280] Module has invalid ELF structures
singularity-image-test-container.img> mounting Nix store...
singularity-image-test-container.img> mounting host's temporary directory...
singularity-image-test-container.img> starting stage 2 (/nix/store/6ziw0dr9gwcs5svd7smkm2i0c9hg69jh-vm-run-stage2)
singularity-image-test-container.img> mke2fs 1.46.5 (30-Dec-2021)
singularity-image-test-container.img> Discarding device blocks: done
singularity-image-test-container.img> Creating filesystem with 10485760 4k blocks and 2621440 inodes
singularity-image-test-container.img> Filesystem UUID: 28a69c82-1fa4-4d32-a8b9-80955184a2e7
singularity-image-test-container.img> Superblock backups stored on blocks:
singularity-image-test-container.img>   32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
singularity-image-test-container.img>   4096000, 7962624
singularity-image-test-container.img> Allocating group tables: done
singularity-image-test-container.img> Writing inode tables: done
singularity-image-test-container.img> Creating journal (65536 blocks): done
singularity-image-test-container.img> Writing superblocks and filesystem accounting information: done

I'm now wondering whether the "Module has invalid ELF structures" messages could be a problem.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Nov 12, 2022
@ShamrockLee
Contributor

FYI, I'm refactoring the singularity-tools. #224636
Suggestions and feedback will be appreciated.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Apr 5, 2023
@cfhammill
Contributor Author

this issue no longer applies, thanks to https://determinate.systems/posts/qemu-fix
