Dibs makes it simple to turn code into Docker images.
To some extent, it can be seen as an alternative to
using a Dockerfile,
with the difference that dibs
provides finer control over the
different phases and makes it easier to land on a trimmed image.
First things first: why dibs
? The main need is to pack a Docker
container image starting from some code, much like what can be done with
a Dockerfile… let’s
look at a few problems and how dibs
addresses them.
Just putting the code inside a container is not sufficient, because…
-
… it probably needs a runtime environment (Perl, Java, Python, …) inside the image too
-
… it might need additional pre-requisites in form of other software or libraries, often provided by the underlying Linux distribution in the form of packages
-
… it might need to have some parts compiled or undergo a build process
-
… it will probably need the container to be set so that the invocation of the program is eased.
The build process usually requires tools (like a C compiler, development
versions of libraries, etc.) that are rarely needed during the runtime phase.
To cope with this, release 17.05 of Docker introduced a new feature where a
Dockerfile can include
multiple
stages[1]
(each marked with an initial FROM
section), where it’s possible to perform
e.g. compilation within a specific track and later use those artifacts inside
another container started from a leaner image.
dibs
addresses this issue with the idea that the path from the code to the
Docker image is not necessarily linear and might be walked through different
lanes, or phases:
-
each phase runs a sequence of operations that stack upon a container image much like a Dockerfile does;
-
different phases share artifacts around through shared directories; only the
-
containers of interest are saved as images, others are thrown away.
As an example, a common patter is to divide the composition of an image into two phases, one where artifacts are compiled (through compilation and build tools) and one where these artifacts are installed along with runtime components. For this reason, two base images can be setup.[2] and used in two different phases, one for building and one for bundling the final artifact.
_____________ +--------------+ / \ | | - start from "build"-ready image, with | Shared dirs |----| Build Phase | compiler & other build tools inside | ----------- | | | - compile in container | | +--------------+ - save in cache | src - | | cache - | +--------------+ | env - | | | - start from "runtime"-ready image | ... - |----| Bundle Phase | - copy artifacts from cache \_____________/ | | to destination +--------------+ - set up entry point
In this way, build operations can be performed from a heavier container image (the one with the build tools), while the bundle operation relies on a leaner starting container image (the one with just the runtime), providing a leaner result.
The difference with respect to plain Dockerfiles is in how the whole process
is configured: in dibs
it is much easier to reuse both operations performed
inside a container (the next section is about this), as well as fragments of
the configuration itself to avoid repetitions (which, much like code, can lead
to bugs).
Another common problem with Dockerfiles has to do with how container images are customized, e.g. to execute build/bundle operations.
The RUN directive in the Dockerfile is very powerful, but it allows executing either direct commands or some script/program that are available inside the container during its construction. While this is strictly all that is needed, it’s quite difficult to reuse operations and not repeat them.
dibs
allows defining packs of operations[3] in many different ways, which include grouping them
in a git repository that can be shared across multiple projects. An example of
such a repository is
dibspack-basic.
The basic metaphor used in Dibs is inspired to drawing (because the aim is to generate… an image). To generate an image, multiple actions are available:
-
setting the starting point for an image is to prepare it, much like the FROM directive in the Dockerfile;
-
a single operation step performed inside a container is a stroke. It can be thought as a single 'RUN` RUN directive in the Dockerfile;
-
finalizing an image is to frame it (there is no equivalent operation in the Dockerfile);
-
collecting multiple actions together is to create a sketch (it can contain other sketches itself).
When run, dibs
goes through two main phases, namely the setup and the
execution of actions:
-
in the setup phase collects all available information for the specific run. It merges together different sources (command line parameters, environment variables, and the configuration file that will be described below), making also sure that the source code is fetched and made ready for operating the different steps;
-
in the execution phases,
dibs
executes the required actions. This may end up in an image or not, depending on the needs.
The operations in a stroke are performed through dibspacks (or simply packs). They are the specification of where something can be found, most probably a program or a group of programs that can be executed inside the container.[4]
All actions (and something more) are defined inside a configuration file,[5], whose structure will be described below.
Dibs elects a directory as a base camp for its operations, called the project
directory. Depending on the specific configuration, it defaults to a dibs
sub-directory of the current directory, or to the current directory itself.
The configuration file and other directories shared across all stroke
executions are related to the project directory, and are:
-
src
: where the source (code, prerequisites, etc.) is made available, read-write; -
cache
: a convenience area where build artifacts can be stored, e.g. to pass them across different strokes or entire phases. Available in read-write mode inside the container -
env
andenvile
: read-only directories where data is passed from the outside. The former can be used to set up data from the user, the latter is used bydibs
itself to set them in the form of keys (filename) associated to data (the contents of the files); -
pack
,auto/open
: where dibpacks are stored (the former local to the specific project, the latter generated automatically bydibs
from remote/dynamic dibspacks).
It’s better to start looking at a couple of examples to better understand how
dibs
works.
The basic mode of operations of dibs
is development mode. As the name
implies, it is best used when developing the software and generating the
container image during development itself (e.g. as a developer).
The example assumes the following layout of files and directories:
.git/ [...] app.pl cpanfile dibs.yml prereqs/ alpine.build alpine.bundle
where:
-
.git
indicates that the whole project is tracked withŋit
; -
app.pl
is a Perl program; -
cpanfile
details the module dependencies of the Perl program; -
dibs.yml
is `dibs’s configuration file; -
prereqs
is a directory for storing pre-requirements files -
alpine.build
andalpine.bundle
are two programs that, when executed inside a container, make sure to install the OS packages needed byapp.pl
or any of the modules that will be installed bycpanfile
. Each program installs the requirements for a specific phase, in this casebuild
andbundle
represent the build phase (where artifacts are generated) and the bundle phase (where the artifacts are put in place along with the runtime environment).
The dibs.yml
configuration file in this example is the following (note: this
is quite simple at this stage, additional features will be shown later):
name: exadev # # (1)
packs: # # (2)
basic:
type: git
origin: https://github.com/polettix/dibspack-basic.git
actions:
default: [build, bundle] # # (3)
prereqs: # # (4)
pack: basic
path: prereqs
build: # # (5)
envile: # # (6)
DIBS_PREREQS: build
actions:
- from: 'alpine:3.6' # # (7)
- prereqs # # (8)
- name: compile # # (9)
pack: basic
path: perl/build
- name: save compiled artifacts in cache
pack:
run: | # # (10)
#!/bin/sh
src_dir="$(cat DIBS_SRC_DIR)"
cache_dir="$(cat DIBS_CACHE_DIR)"
dst_dir="$cache_dir/app"
set -e
rm -rf "$target"
mkdir -p "$target"
cp -a "$src_dir/app.pl" "$target"
cp -a "$src_dir/local" "$target"
bundle:
envile:
DIBS_PREREQS: bundle
actions:
- from: 'alpine:3.6'
- prereqs
- name: put artifacts in place
pack:
run: |
#!/bin/sh
cache_dir="$(cat DIBS_CACHE_DIR)"
src_dir="$cache_dir/app"
dst_dir="/app"
rm -rf "$dst_dir"
cp -a "$src_dir" "$dst_dir"
commit: # # (11)
entrypoint: []
cmd: ['/bin/sh', '-l']
- name: save bundled image # # (12)
image_name: exadev
tags: ['latest', '0.3']
-
the name is used for temporary images
-
it’s possible to define named packs and refer to them later
-
an action named
default
is what is executed… by default -
this is the specification of a stroke, based on the
basic
pack. -
this is the specification of a sketch (because it contains a list of actions)
-
enviles are similar to environment variables, but less invasive
-
this is equivalent to FROM in a Dockerfile
-
this "calls" the `prereqs' stroke defined elsewhere (above in this case)
-
this is a stroke where a name is assigned explicitly, so that it will be shown when executed
-
this is an immediate pack that is saved as a script and then executed inside the container
-
adding a
commit
sets additional traits of the image layer, e.g.entrypoint
,cmd
,user
, … -
this is a frame, i.e. the actual saving of an image
Running dibs
in this case is as simple as going in the root directory of the
code and run:
$ dibs
This will execute the default
sketch, which is comprised of two actions
build
and bundle
. They will be executed both, in the specific order. They
are both sketches themselves (they both contain a list of actions).
Sketch build
starts from a basic image (an Alpine Linux, release 3.6) and
executes three RUN
-like actions on top of it, in the specific order:
-
installation of pre-requisites (calling the
prereqs
stroke defined above). The script that install pre-requisites uses the variableDIBS_PREREQS
to select the right prerequisites script, which will beprereqs/alpine.build
in this case. -
"compilation" of the Perl code. This reduces to the installation of modules as specified in file
cpanfile
-
save of
app.pl
(main program) andlocal
(where installed modules are placed) inside the cache directory (in particular, in theapp
sub-directory)
Each step is executed "on top" of the previous one, just like several RUN
directives in a Dockerfile are executed.
Sketch build
does not include a frame action, so the final container is
removed and not saved.
Sketch bundle
is similar to build
, but also different:
-
starts from the same base image
alpine:3.6
-
install pre-requisites. In this case
DIBS_PREREQS
is set tobundle
, so the prerequisites program that will be run isprereqs/alpine.bundle
. This is an example of reuse, because the same script (prereqs
in thebasic
pack) is used to obtain different results in different conditions; -
artifacts are copied from the cache to the final target destination (in
/app
). This is the last "layer" that is added to the image, so there is also the specification of acommit
section to set theentrypoint
and thecmd
to be executed by default. -
the last action of the sketch is a frame that saves the final container as an image with two tags:
exadev:latest
andexadev:0.3
.
The previous example showed an example where build and bundle are separated, but as a matter of fact it does not provide a real advantage in terms of execution time, because the installation of prerequisites on top of a basic image is always performed.
From this point of view, dibs
performs worse than plain
Dockerfiles[6] because it does not
come with implicit caching/pinpointing of intermediate containers. This is
meant as a feature though, because the implicit pinpointing and reuse of
previously built layers can bite when things change around and docker
is not
aware of it[7]; this is a likely scenario in dibs
because there is much more
space for using remote stuff.
It’s possible to expand the example to limit the amount of repeated work, like shown in the following example.
name: exadev
packs:
basic:
type: git
origin: https://github.com/polettix/dibspack-basic.git
actions:
default: [build, bundle]
prereqs:
pack: basic
path: prereqs
builder: # # (1)
envile:
DIBS_PREREQS: build
actions:
- from: 'alpine:3.6'
- prereqs
- name: save builder base image
image_name: builder
tags: '1.0'
build:
actions:
- from: 'builder:1.0' # # (2)
- name: compile
pack: basic
path: perl/build
- name: save compiled artifacts in cache
pack:
run: |
#!/bin/sh
src_dir="$(cat DIBS_SRC_DIR)"
cache_dir="$(cat DIBS_CACHE_DIR)"
dst_dir="$cache_dir/app"
set -e
rm -rf "$target"
mkdir -p "$target"
cp -a "$src_dir/app.pl" "$target"
cp -a "$src_dir/local" "$target"
bundler:
envile:
DIBS_PREREQS: bundle
actions:
- from: 'alpine:3.6'
- prereqs
- name: save bundler base image
image_name: bundler
tags: '1.0'
bundle:
actions:
- from: 'bundler:1.0'
- name: put artifacts in place
pack:
run: |
#!/bin/sh
cache_dir="$(cat DIBS_CACHE_DIR)"
src_dir="$cache_dir/app"
dst_dir="/app"
rm -rf "$dst_dir"
cp -a "$src_dir" "$dst_dir"
commit:
entrypoint: []
cmd: ['/bin/sh', '-l']
- name: save bundled image
image_name: exadev
tags: ['latest', '0.3']
-
Former
build
is divided into parts, this is the first and yields an image that is saved permanently asbuilder:1.0
-
The image is then used as a base for the
build
stroke.
In this example, former build
sketch has been broken down into two sketches,
the first one (builder
) installing the pre-requisites and saving a base
image that is suitable for building (builder:1.0
) and is thus used as the
starting point for sketch build
. A similar split has been performed onto
bundle
, extracting the pre-requisites part into bundler
.
To generate the new base images for building and bundling the following command is run:
$ dibs builder bundler # generates builder:1.0 and bundler:1.0
After this step has been run, these images are used as bases for the new
build
and bundle
steps, so when the following command is run:
$ dibs build bundle
the prerequisites installation is not performed any more, saving time.
This trick allows pinpointing specific steps of interest for explicit reuse. Making it explicit also opens the door to easily distribute responsibilities to other teams for the different stages.[8]
The split in the previous example was possible because of the assumption that pre-requisites change very seldom in a project (with the possible exception of the initial days). Anyway, it’s possible that the pre-requisites have to change from time to time, in which case it’s necessary to regenerate the base images to include them, which might be easily overlooked.
At the expense of an additional layer, though, it’s possible to repeat the
prereqs
stroke inside the build
and the bundle
strokes; these will
mostly resolve into nothing (i.e. no change) unless an addition is put in the
prerequisites, in which case the addition will be honored. The following
dibs.yml
implements this approach.
name: exadev
packs:
basic:
type: git
origin: https://github.com/polettix/dibspack-basic.git
actions:
default: [build, bundle]
prereqs:
pack: basic
path: prereqs
builder:
envile:
DIBS_PREREQS: build
actions:
- from: 'alpine:3.6'
- prereqs
- name: save builder base image
image_name: builder
tags: '1.0'
build:
envile: # # (1)
DIBS_PREREQS: build
actions:
- from: 'builder:1.0'
- prereqs # # (2)
- name: compile
pack: basic
path: perl/build
- name: save compiled artifacts in cache
pack:
run: |
#!/bin/sh
src_dir="$(cat DIBS_SRC_DIR)"
cache_dir="$(cat DIBS_CACHE_DIR)"
dst_dir="$cache_dir/app"
set -e
rm -rf "$target"
mkdir -p "$target"
cp -a "$src_dir/app.pl" "$target"
cp -a "$src_dir/local" "$target"
bundler:
envile:
DIBS_PREREQS: bundle
actions:
- from: 'alpine:3.6'
- prereqs
- name: save bundler base image
image_name: bundler
tags: '1.0'
bundle:
envile: # # (1)
DIBS_PREREQS: bundle
actions:
- from: 'bundler:1.0'
- prereqs # # (2)
- name: put artifacts in place
pack:
run: |
#!/bin/sh
cache_dir="$(cat DIBS_CACHE_DIR)"
src_dir="$cache_dir/app"
dst_dir="/app"
rm -rf "$dst_dir"
cp -a "$src_dir" "$dst_dir"
commit:
entrypoint: []
cmd: ['/bin/sh', '-l']
- name: save bundled image
image_name: exadev
tags: ['latest', '0.3']
-
The
prereqs
program relies upon theDIBS_PREREQS
variable, so it has to be set wheneverprereqs
will be used. -
The
prereqs
stroke is re-introduced as the first step in bothbuild
andbundle
. Most of the times this will be a no-op.
Running the prereqs
step can anyway draw time from the build/bundle process
though, so in all cases in which it can be skipped it can be useful to avoid
it. The following example does some refactoring to add buildq
(i.e.
the quick version of build
), leaving out bundleq
(which can undergo a
similar transformation).
name: exadev
packs:
basic:
type: git
origin: https://github.com/polettix/dibspack-basic.git
actions:
default: [build, bundle]
prereqs:
pack: basic
path: prereqs
builder:
envile:
DIBS_PREREQS: build
actions:
- from: 'alpine:3.6'
- prereqs
- name: save builder base image
image_name: builder
tags: '1.0'
build_basics: # # (1)
- name: compile
pack: basic
path: perl/build
- name: save compiled artifacts in cache
pack:
run: |
#!/bin/sh
src_dir="$(cat DIBS_SRC_DIR)"
cache_dir="$(cat DIBS_CACHE_DIR)"
dst_dir="$cache_dir/app"
set -e
rm -rf "$target"
mkdir -p "$target"
cp -a "$src_dir/app.pl" "$target"
cp -a "$src_dir/local" "$target"
build: # # (2)
envile:
DIBS_PREREQS: build
actions:
- from: 'builder:1.0'
- prereqs
- build_basics
buildq: # # (2)
- from: 'builder:1.0'
- build_basics
# ...
-
build_basics
is a new sketch that includes strokes to compile modules and save artifacts in the cache -
the new artifact is used in both the
build
andbuildq
sketches, avoiding repetitions
With this setup:
-
"normal" work on code can rely upon
buildq
and skip theprereqs
stroke (which consumes some time) -
"safe" work can still rely upon
build
to ensure thatprereqs
are honored. This might come handy when a new prerequisite is added and thebuildq
sketch yields an error because of missing dependencies, without the need to regenerate the full base image (e.g. to test out if the addition to the prerequisites is sufficient or needs to be changed) -
in the medium-long term, though, it’s still better to re-generate the base image.
As in code, repetitions can be dangerous in a dibs.yml
file because changes
would have to be applied in multiple places. In the examples above, there are
a few repetitions in the names of images used as base.
YAML allows the definition of anchors and aliases to avoid repetitions inside the file, like in the following example.
name: exadev
variables: # # (1)
- &base_image 'alpine:3.6'
- &base_builder 'builder:1.0'
- &base_bundler 'bundler:1.0'
packs:
basic:
type: git
origin: https://github.com/polettix/dibspack-basic.git
actions:
default: [build, bundle]
prereqs:
pack: basic
path: prereqs
builder:
envile:
DIBS_PREREQS: build
actions:
- from: *base_image # # (2)
- prereqs
- name: save builder base image
tags: *base_builder # # (2)
build_basics:
- name: compile
pack: basic
path: perl/build
- name: save compiled artifacts in cache
pack:
run: |
#!/bin/sh
src_dir="$(cat DIBS_SRC_DIR)"
cache_dir="$(cat DIBS_CACHE_DIR)"
dst_dir="$cache_dir/app"
set -e
rm -rf "$target"
mkdir -p "$target"
cp -a "$src_dir/app.pl" "$target"
cp -a "$src_dir/local" "$target"
build:
envile:
DIBS_PREREQS: build
actions:
- from: *base_builder # # (2)
- prereqs
- build_basics
buildq:
- from: *base_builder
- build_basics
bundler:
envile:
DIBS_PREREQS: bundle
actions:
- from: *base_image # # (2)
- prereqs
- name: save bundler base image
tags: *base_bundler # # (2)
bundle_basics:
- name: put artifacts in place
pack:
run: |
#!/bin/sh
cache_dir="$(cat DIBS_CACHE_DIR)"
src_dir="$cache_dir/app"
dst_dir="/app"
rm -rf "$dst_dir"
cp -a "$src_dir" "$dst_dir"
commit:
entrypoint: []
cmd: ['/bin/sh', '-l']
- name: save bundled image
image_name: exadev
tags: ['latest', '0.3']
bundle:
envile:
DIBS_PREREQS: bundle
actions:
- from: *base_bundler # # (2)
- prereqs
- bundle_basics
bundleq:
- from: *base_bundler # # (2)
- bundle_basics
-
Variables can be defined as anchors in a single place
-
Anchors are then references via aliases in multiple places
It’s possible to place the YAML "variables" more or less everywhere, although
it is suggested to place them under the variables
key.
It is also possible to inherit some characteristics from other actions by
using the extends
key in the definition of an action. In the following
example, the DIBS_PREREQS
envile is defined once (in buildish
for
building, in bundlish
for bundling) and then used where needed.
name: exadev
variables:
- &base_image 'alpine:3.6'
- &base_builder 'builder:1.0'
- &base_bundler 'bundler:1.0'
packs:
basic:
type: git
origin: https://github.com/polettix/dibspack-basic.git
actions:
default: [build, bundle]
prereqs:
pack: basic
path: prereqs
buildish: # # (1)
envile:
DIBS_PREREQS: build
builder:
extends: buildish # # (2)
actions:
- from: *base_image
- prereqs
- name: save builder base image
tags: *base_builder
build_basics:
- name: compile
pack: basic
path: perl/build
- name: save compiled artifacts in cache
pack:
run: |
#!/bin/sh
src_dir="$(cat DIBS_SRC_DIR)"
cache_dir="$(cat DIBS_CACHE_DIR)"
dst_dir="$cache_dir/app"
set -e
rm -rf "$target"
mkdir -p "$target"
cp -a "$src_dir/app.pl" "$target"
cp -a "$src_dir/local" "$target"
build:
extends: buildish # # (2)
actions:
- from: *base_builder
- prereqs
- build_basics
buildq:
- from: *base_builder
- build_basics
bundlish: # # (1)
envile:
DIBS_PREREQS: bundle
bundler:
extends: bundlish # # (2)
actions:
- from: *base_image
- prereqs
- name: save bundler base image
tags: *base_bundler
bundle_basics:
- name: put artifacts in place
pack:
run: |
#!/bin/sh
cache_dir="$(cat DIBS_CACHE_DIR)"
src_dir="$cache_dir/app"
dst_dir="/app"
rm -rf "$dst_dir"
cp -a "$src_dir" "$dst_dir"
commit:
entrypoint: []
cmd: ['/bin/sh', '-l']
- name: save bundled image
image_name: exadev
tags: ['latest', '0.3']
bundle:
extends: bundlish # # (2)
actions:
- from: *base_bundler
- prereqs
- bundle_basics
bundleq:
- from: *base_bundler
- bundle_basics
-
These two definitions are abstract and do not specify a type of action (although only sketches and strokes leverage the
envile
key) -
Using
extends
allows "importing" all definitions from the referred element.
Import of traits from ancestors is somehow crude, because a redefinition in the derived element totally overwrites the ancestor’s data.
The variables
highest-level key is supposed to be associated to an
array-type value. Each item in this array that is a hash with a single key
function
and an array value is subject to expansion. The following is an
example of the join
function (which is also the only one available).
variables: - function: &whatever ['join', ':', 'something', 'latest'] actions: foobar: - from: *whatever # ...
When read by dibs
, the value associated to anchor whatever
is expanded
in-place to something:latest
; the application of the operation in-place also
means that all aliases will get this expanded value (like the from
statement
in the example).
As anticipated, strokes define programs that will be executed inside a
container. It is possible to pass arguments to these programs, in order to
increase their reusability, via the args
key inside a stroke.
Example:
actions: whatever: args: ['first', '2nd', 'third'] pack: run: | #!/bin/sh while [ "$#" -gt 0 ] ; do printf "%s\n" "argument: <$1>" shift done
Arguments in a stroke are subject to expansion in specific conditions, as in the following example:
actions: whatever: args: - 'this is a string' - path_cache: whatever - path_src: lib pack: run: | #!/bin/sh while [ "$#" -gt 0 ] ; do printf "%s\n" "argument: <$1>" shift done
In the example above, the second and third argument are objects with a single
key-value pair. Values associated to keys path_cache
, path_src
, etc. are
expanded as sub-directories of the corresponding zones (cache, src, etc. in
the specific case).
In the initial example, the dibs.yml
file is part of the code itself, but
this need not be. It’s possible to separate concerns of development and
build/bundling using the so-called alien mode.
This mode of operations is somehow similar to a bare git
repository, where
there is no sub-directory but the project directory is directly the current
directory. The layout is as follows:
auto cache dibs.yml env pack src
The src
directory is where the source code is supposed to be placed. It’s
possible to develop code directly there, of course, although it’s probably
better to rely upon the origin
directive (or command-line option) and fetch
it remotely.
$ dibs --alien --origin "$ORIGIN"
This README file is only meant as an introduction to the possibilities. The manual contains all details and is the next suggested reading.
dibs
itself.
--no-cache
, for example.