platforms for reproducibility

overview of available options


what tools are available

  • for ad hoc workflows and tools: docker and singularity
  • for lightweight applications: jupyter notebooks
  • for pipelines: shell scripts in github
  • for software tools: bioconda/biocontainer

devops

  • software development + software operations
  • automate and monitor

Devops Explained


Containers vs. VMs (source: https://www.zdnet.com/article/what-is-docker-and-why-is-it-so-darn-popular/)


virtualisation

pros and cons

  • ++ very similar to a full OS
  • ++ high OS diversity
  • -- needs more disk space and resources
  • -- slower than containers
  • -- harder to automate

containers

pros and cons

  • ++ faster
  • ++ no need for full OS
  • ++ recipes are easy to distribute; high portability
  • ++ easy to automate
  • -- still OS-dependent
  • -- not a full OS in some cases

Docker


  • platform for developing, shipping, and running applications
  • infrastructure as application/code
  • Open Container Initiative
  • Docker community edition

Docker components


Docker image

  • read-only templates
  • containers are run from them
  • images are not run
  • images have several layers

Docker image - building

  • can be built from existing images
    • ubuntu, alpine
  • each modification of the base image adds a new layer (tip: chain commands with && to keep the layer count down)
  • base images can be created with tools such as Debootstrap
  • images have several layers
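
A minimal sketch of the layering tip above. The base image tag and the package names are assumptions chosen for illustration; the point is only that one chained RUN produces one layer, so files deleted at the end of the chain are never stored in the image.

```dockerfile
# Assumption: ubuntu:22.04 as the base image; any pinned base image works.
FROM ubuntu:22.04

# One RUN instruction = one new layer on top of the base image.
# Chaining with && keeps the index update, the install, and the cleanup
# in a single layer, so the apt lists removed at the end are not kept
# in an earlier layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget unzip \
    && rm -rf /var/lib/apt/lists/*
```

Splitting the same commands over three RUN instructions would give three layers, and the cleanup in the last one would not shrink the first two.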

Docker image - instructions

  • Recipe: Dockerfile
  • Instructions
  • FROM
  • ADD, COPY
  • RUN
  • ENV PATH, ARG
  • USER, WORKDIR, LABEL
  • VOLUME, EXPOSE
  • CMD, (ENTRYPOINT)
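
The instructions listed above could fit together roughly as follows. This is a sketch under stated assumptions: `mytool.py`, the `tooluser` account, the port number, and the base image tag are all hypothetical and only serve to show where each instruction goes.

```dockerfile
# Hypothetical image for a small Python script; all names are illustrative.
FROM python:3.11-slim

# ARG exists only at build time; ENV persists into the running container.
ARG TOOL_VERSION=1.0.0
LABEL software="mytool" software.version="${TOOL_VERSION}"

# COPY takes files from the build context. ADD can additionally fetch
# URLs and unpack local tar archives, so COPY is the simpler default.
# Assumption: mytool.py starts with a "#!/usr/bin/env python" line.
COPY mytool.py /opt/mytool/mytool.py
RUN chmod 755 /opt/mytool/mytool.py

# Put the tool on the PATH so it can be called by name.
ENV PATH=/opt/mytool:$PATH

# Create an unprivileged user and switch to it for everything that follows.
RUN useradd --create-home tooluser
USER tooluser

# VOLUME marks /data as a mount point; WORKDIR makes it the default directory.
VOLUME /data
WORKDIR /data

# EXPOSE only documents a port, it does not publish it.
# It matters only if the tool listens on the network.
EXPOSE 8080

# CMD is the default command and can be overridden on the docker run line.
CMD ["mytool.py", "--help"]
```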

Reference


One tool, one container

  • install from packages where possible, e.g. pip/PyPI, CPAN, or CRAN
  • pin versions for both tools and containers
  • use ENV PATH instead of ENTRYPOINT
  • reduce size as much as possible
  • keep data outside the container
  • check the license
  • make your container discoverable e.g. biocontainers, quay.io, docker hub
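
A hedged sketch of several of these recommendations in one recipe. The package name `mytool`, its version, and the license value are placeholders, not a real package, so the recipe would need a real PyPI name to build.

```dockerfile
# Pinned base image tag (assumption: python:3.11-slim).
FROM python:3.11-slim

# Placeholder tool name and version; replace with a real PyPI package.
ARG TOOL_VERSION=1.2.3

# Record the version and license in the image metadata.
# The license value here is a placeholder.
LABEL software="mytool" \
      software.version="${TOOL_VERSION}" \
      about.license="SPDX:MIT"

# Install one tool from a package at a pinned version.
# --no-cache-dir keeps pip's download cache out of the image.
RUN pip install --no-cache-dir "mytool==${TOOL_VERSION}"

# No ENTRYPOINT: pip places the executable on the PATH, so the tool is
# called by name and the image stays usable for other commands.
# Data stays outside the image and is mounted at run time, for example:
#   docker run --rm -v "$PWD":/data mytool:1.2.3 mytool --help
WORKDIR /data
```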

Example

FROM biocontainers/biocontainers:v1.0.0_cv4

LABEL base_image="biocontainers:v1.0.0_cv4"

LABEL version="3"

LABEL software="Comet"

LABEL software.version="2016012"

LABEL about.summary="an open source tandem mass spectrometry sequence database search tool"

LABEL about.home="http://comet-ms.sourceforge.net"

LABEL about.documentation="http://comet-ms.sourceforge.net/parameters/parameters_2016010"

LABEL about.license_file="http://comet-ms.sourceforge.net"

LABEL about.license="SPDX:Apache-2.0"

LABEL extra.identifiers.biotools="comet"

LABEL about.tags="Proteomics"

LABEL maintainer="Felipe da Veiga Leprevost <felipe@leprevost.com.br>"

USER biodocker

RUN ZIP=comet_binaries_2016012.zip \
    && wget https://github.com/BioDocker/software-archive/releases/download/Comet/$ZIP -O /tmp/$ZIP \
    && unzip /tmp/$ZIP -d /home/biodocker/bin/Comet/ \
    && chmod -R 755 /home/biodocker/bin/Comet/* \
    && rm /tmp/$ZIP

RUN mv /home/biodocker/bin/Comet/comet_binaries_2016012/comet.2016012.linux.exe /home/biodocker/bin/Comet/comet

ENV PATH=/home/biodocker/bin/Comet:$PATH

WORKDIR /data/

Further reading

  • impact of docker containers on performance
  • container-based virtualization for HPC environments
  • article recommendations containers
  • article Grüning on virtualization

Thanks