Should not declare VOLUME for /data/db #306

aparamon · 2018-09-28T07:00:34Z

Currently, the Dockerfile declares a VOLUME for /data/db and /data/configdb:

Line 87 in 32e5645

VOLUME /data/db /data/configdb

This is sub-optimal because some workflows in derived images become excessively complicated.
For example, seeding a database from a dump now requires

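FROM mongo

# /data/db is a declared VOLUME, so keep the seeded data in a separate directory.
RUN mkdir /data/local-db

COPY mongo-dump /var/mongo-dump
# Start a temporary mongod, restore the dump, then shut it down cleanly.
RUN mongod --fork --dbpath /data/local-db --logpath /var/log/mongodb.log && \
    mongorestore --db mydb /var/mongo-dump/mydb && \
    mongod --shutdown --dbpath /data/local-db
# The restore ran as root, so hand the files over to the mongodb user.
RUN chown -R mongodb:mongodb /data/local-db

CMD ["mongod", "--dbpath", "/data/local-db"]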

instead of just

RUN mongod --fork --logpath /var/log/mongodb.log && \
    mongorestore --db mydb /var/mongo-dump/mydb && \
    mongod --shutdown

because data written to /data/db during docker build is discarded once the VOLUME has been declared, so it is gone by the time docker run starts a container.

I propose removing the VOLUME directive and leaving volume configuration up to the end user.

yosifkit · 2018-10-03T17:57:49Z

Storing the already initialized database files in the image's layers is not a great idea. Because it would be in a copy-on-write filesystem, the moment that you start a new container and MongoDB changes any of its files, it now uses twice as much space.

See also https://docs.docker.com/storage/storagedriver/overlayfs-driver/#modifying-files-or-directories:

Writing to a file for the first time: The first time a container writes to an existing file, that file does not exist in the container (upperdir). The overlay/overlay2 driver performs a copy_up operation to copy the file from the image (lowerdir) to the container (upperdir). The container then writes the changes to the new copy of the file in the container layer.

However, OverlayFS works at the file level rather than the block level. This means that all OverlayFS copy_up operations copy the entire file, even if the file is very large and only a small part of it is being modified. This can have a noticeable impact on container write performance.

And https://docs.docker.com/storage/storagedriver/overlayfs-driver/#performance-best-practices:

Use volumes for write-heavy workloads: Volumes provide the best and most predictable performance for write-heavy workloads. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write.

Have you thought of using /docker-entrypoint-initdb.d/ to have it restore the database on start instead? (hub docs)
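
A minimal sketch of what such a start-time restore could look like, assuming the dump has been copied into the image at /var/mongo-dump/mydb (the path from the Dockerfile earlier in this thread) and that authentication is not enabled. The file name 10-restore-seed.sh, the SEED_DUMP_DIR variable and the restore_seed helper are made up for illustration; the hub docs describe how scripts in /docker-entrypoint-initdb.d/ are picked up when a container is started for the first time.

```shell
#!/bin/bash
# Hypothetical file: /docker-entrypoint-initdb.d/10-restore-seed.sh
# Assumption: the dump lives inside the image and auth is not enabled.

restore_seed() {
    local dump_dir="$1"
    if [ -d "$dump_dir" ]; then
        # Restore into the "mydb" database using mongorestore's default
        # host and port, i.e. the mongod running in this container.
        mongorestore --db mydb "$dump_dir"
    else
        echo "no dump found at $dump_dir, skipping restore"
    fi
}

# SEED_DUMP_DIR is a hypothetical override; the default matches the thread.
restore_seed "${SEED_DUMP_DIR:-/var/mongo-dump/mydb}"
```

The restore cost is still paid on every first start of a fresh container, which is the trade-off discussed further down the thread.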

aparamon · 2018-10-04T10:39:55Z

Hi @yosifkit, thank you for your detailed explanation!
My original motivation for restoring the DB at build time was to optimize the container start-up time. However, it now seems that the actual speed-up would only occur in a read-only scenario, since for write operations Docker would still have to copy the huge file.
What do you believe to be the optimal way of setting up a pre-seeded Mongo container for read-only usage?

chobolt · 2020-03-02T13:35:28Z

I spent quite some time trying to understand why my bind volume was not being used by the mongo Docker image, and eventually discovered the hardcoded VOLUME in the Dockerfile.

I'm fine with rebuilding the image without this declaration (which I did), but I find it weird to force the use of volumes when it should be up to the final user to decide how to store data.

mildmojo · 2021-01-21T20:16:39Z

In my case, I want to construct a seeded database image that will be the basis for runtime containers that use a persistent, named volume for /data/db in my development environment. I can't base my seeded image Dockerfile on the official mongo image because of the VOLUME declaration. (It took me a couple of days to figure out why my Dockerfile writes to /data/db were being discarded at build time--yikes!)

If I fork the official mongo Dockerfile and remove the VOLUME instruction, the resulting image works great. I can base my Dockerfile on this -no-volume base image, do RUN seed-my-database.sh at build time, then later invoke docker or use a compose file that mounts a volume at /data/db, and my seed data is copied to the volume when the container's created. Perfect.

Storing the already initialized database files in the image's layers is not a great idea. Because it would be in a copy-on-write filesystem, the moment that you start a new container and MongoDB changes any of its files, it now uses twice as much space.

The storage space tradeoff is worth it for me, since it saves me so much time when I need to reset to a known state. My seeded DB is a few hundred megabytes or a gigabyte, which I can easily spare on my dev machine or CI instances.

Have you thought of using /docker-entrypoint-initdb.d/ to have it restore the database on start instead?

In my case, the DB seed process from a known DB dump takes 8-10 minutes to complete on my workstation. I may need to reset the DB to a known state dozens of times while debugging DB migrations or a new feature. That reset takes seconds if I have a seeded image, but would be untenable if I had to restore the DB every time.

I can work around it, but I'd love to see an official -no-volume variant image. I'd expect the extra maintenance effort to be very small, given that the rest of it would be identical.

tianon · 2021-01-21T21:56:17Z

If you add "--dbpath" or set ".storage.dbPath" in a specified "--config" file, that value will be respected.

mildmojo · 2021-01-30T20:53:48Z

If you add "--dbpath" or set ".storage.dbPath" in a specified "--config" file, that value will be respected.

I ended up using this, but it means I have to make sure compose files or docker run commands that use this image duplicate my --dbpath /mongo-data/db argument if they're providing their own command values, which isn't great or obvious. I'd really prefer to store my data in the default location.

tianon · 2021-02-06T00:20:25Z

If you're building an image with the data pre-seeded, you can combat that by setting CMD:

CMD ["--dbpath", "/mongo-data/db"]

(and then it'll be the default for users who don't specify a command)

mildmojo · 2021-02-07T05:01:26Z

If you're building an image with the data pre-seeded, you can combat that by setting CMD:

Yeah, this is the approach aparamon outlined, and it's the strategy I'm currently using.

I agree with aparamon that a seeded image is a legitimate use case, and the official image doesn't work well for seeding at build time because it contains VOLUME /data/db. The workaround for seeded images adds Dockerfile chaff, creates an extra unused anonymous volume at runtime, and it yields images that carry an asterisk: your seed data disappears when your users' containers naïvely provide their own commands (e.g. a compose file with command: --auth or a docker run --rm mongo-seeded --auth).

As a tooling developer, it would help to have an official mongo image without VOLUME so that I could build seeded mongo images that are easy-to-use drop-in replacements for the official (empty) image.
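
To make the "asterisk" concrete, here is a simplified shell model of how the arguments end up being chosen for a seeded image built with CMD ["--dbpath", "/mongo-data/db"]. It is only an illustration, not the real entrypoint: effective_command and IMAGE_CMD are hypothetical names, and the only behaviour assumed is that a user-supplied command replaces the image's CMD entirely.

```shell
# Simplified model, not the real entrypoint script.
# IMAGE_CMD mirrors: CMD ["--dbpath", "/mongo-data/db"]
IMAGE_CMD="--dbpath /mongo-data/db"

effective_command() {
    if [ "$#" -gt 0 ]; then
        # A command given at run time replaces CMD wholesale,
        # so the --dbpath default from the image is lost.
        echo "mongod $*"
    else
        # No command given: the image's CMD is used.
        echo "mongod $IMAGE_CMD"
    fi
}

effective_command          # like: docker run --rm mongo-seeded
effective_command --auth   # like: docker run --rm mongo-seeded --auth
```

The first call prints `mongod --dbpath /mongo-data/db`; the second prints `mongod --auth`, at which point mongod falls back to its default data directory and the seeded data under /mongo-data/db is not used.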

polarathene · 2024-03-12T06:12:06Z

I'm not sure why, but it is a common issue to see for the DB images (equivalent issue for postgres, mysql, redis).

These issues remain open for many years, with little valid justification for why VOLUME is kept. Do they remain open for visibility? Are they undecided? Or are they waiting for consensus so that all images make the switch together?

EDIT: I understand for these DB images:

The VOLUME paths are initially empty in the image and are then populated at runtime for each container instance.
That you can provide a different location via a runtime option (--dbpath, as has already been communicated in the comments above), which still:
- Accumulates disk usage, but within the container instead of a separate volume.
- Redundantly creates empty anonymous volumes.

Because it would be in a copy-on-write filesystem, the moment that you start a new container and MongoDB changes any of its files, it now uses twice as much space.

The concern for "twice as much space" seems moot as that's already what is happening implicitly? For MongoDB, this is 300MB+ per container instance created.

Prepopulated `VOLUME` (not applicable to mongo usage)

An implicit anonymous volume copies data from the image to the host per container instance created. This is wasteful and accumulates over runs (if not removing afterwards via --rm).

The example I link to is fairly simple:

Golang image builds CoreDNS.
VOLUME instruction used in Dockerfile for Go's package storage and build cache. This represents over 2GB of data.
Outcome:
- Each container instance that is run, implicitly copies that VOLUME declared data during container startup. This adds notable delay to container startup time as well.
- Stopping the containers does not release these anonymous volumes, and they are not easy to inspect (via Docker CLI, Docker Desktop is more informative about context).
- Four of these containers amount to over 9GB of data in anonymous volumes, where there would otherwise be none.
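
For anyone wanting to check what has accumulated on their own machine, a hedged sketch using the Docker CLI is shown here. list_dangling_volumes is a hypothetical helper name. Note that the dangling filter only reports volumes that no container references any more, so volumes belonging to stopped (but not yet removed) containers will not appear, while unused named volumes will.

```shell
# List volumes that are no longer referenced by any container.
# Anonymous volumes show up with long generated names rather than a
# user-chosen one.
list_dangling_volumes() {
    docker volume ls --quiet --filter dangling=true
}

if command -v docker >/dev/null 2>&1; then
    list_dangling_volumes || echo "could not query the Docker daemon"
else
    echo "docker CLI not found; nothing to list"
fi
```

Running containers with --rm, or removing them with docker rm -v, avoids leaving the anonymous volumes behind in the first place.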

Use volumes for write-heavy workloads: Volumes provide the best and most predictable performance for write-heavy workloads. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write.

It should be opt-in. A container still persists internal state until it's destroyed/removed.

If a user wants to persist the data or have better performance, they should provide a volume explicitly?

Anonymous and named volumes will copy the content (if any) from the image just like VOLUME does, and since that is done explicitly by the user there are no surprises.
Bind mounts have different semantics (no copy by default) but are also worthwhile, and compatible with this image.

So while volumes are a best practice, I disagree about the VOLUME instruction (like this comment states, Dockerfile should be concerned with the image build, not how to manage runtime state).

Other known concerns with `VOLUME`

Extending a base image (that has a VOLUME instruction) does not support an opt-out.
- There's been a long-standing issue requesting support to opt out of or unset VOLUME, but it doesn't look likely to be resolved, and it also appears to be more of a band-aid solution (one that will introduce other problems).
- A better suggestion was to deprecate the VOLUME instruction, since there aren't any really compelling reasons for it anymore as an instruction (beyond being informational, like EXPOSE).
- There's also a request for CLI / daemon opt-in support to ignore VOLUME from images when running a container.
BuildKit has a slight difference in behaviour for VOLUME at build time (1, 2), which may have been applicable to the original Dockerfile issue of this thread.
- Although that's more of a bugfix I still see users building projects with the legacy builder, so VOLUME usage with the fix may still result in some bug reports adding to maintainer burden that could otherwise be avoided.
When providing an explicit volume and the implicit VOLUME mounts a subpath:
- The anonymous volume has priority; it will source the image content and remain.
- Even if your explicit volume is a bind mount, the anonymous volume will replace the equivalent subpath with the image content. To prevent this you need to explicitly mount that subpath too.
- Why would this behaviour be desired for this image vs avoided by removing VOLUME? If a user is unaware, they may be led to think their explicit volume should have all data persisted, considering any anonymous volumes as disposable resulting in data loss.
A summary of an earlier investigation of mine on VOLUME (with plenty of references to justify VOLUME as a bad practice).

Docker Compose quirk

I did come across this reasoning by @yosifkit (2017) that Docker Compose will try to preserve the anonymous volume across image upgrades (verified as still applicable).

At a rough glance it seems tied to the service name and that the new image has the expected VOLUME <path> declared.

Any change to the image data the VOLUME would seed from within that image would be ignored.
--force-recreate can remove other internal container state, but won't discard/reset the implicit anonymous volume.

The feature is well intentioned, but I can see how an intentional breaking change to the image could easily clash with it, for example:

A new VOLUME introduced as a subpath within an existing VOLUME.
A subpath to a location that users had already been using explicit mounts on.
A developer iterating on a local image of their own with VOLUME, unaware of this compose specific "feature", potentially with some confused troubleshooting.

The main concern expressed by @yosifkit was that images which remove their VOLUME instruction(s) would result in data loss for compose.yaml users that rely on this implicit feature to persist their data. It's questionable that anyone relies on this when they really care about the data, is it advised/encouraged somewhere? Documented, or a hidden feature?

@yosifkit later notes in 2021 that this Docker Compose feature was implemented prior to proper external volume feature support, back when we only had anonymous volumes with the VOLUME instruction? (which changed in 2015)

LaurentGoderre · 2024-03-12T16:34:42Z

For the original use case, an init container would serve this much better. You could have an image with your data copied in and use mongorestore with the hostname of the other container. In that case you don't need to modify the default behavior and still maintain startup speed.
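
A rough sketch of that idea with plain docker commands, under several assumptions: mongo-seed-data is a hypothetical image (the mongo image plus the dump copied to /var/mongo-dump), seednet and db are made-up network and container names, and seed_with_init_container is just a wrapper for illustration. A real setup would probably also need to wait until mongod accepts connections before restoring.

```shell
seed_with_init_container() {
    # Shared network so the seed container can reach mongod by name.
    docker network create seednet
    # Stock image; its default /data/db volume behaviour is left untouched.
    docker run -d --name db --network seednet mongo:7.0
    # One-shot "init" container: restores the dump over the network, then exits.
    docker run --rm --network seednet mongo-seed-data \
        mongorestore --host db --db mydb /var/mongo-dump/mydb
}

# Only run when explicitly requested, since this needs a Docker daemon.
if [ "${RUN_SEED_DEMO:-0}" = "1" ]; then
    seed_with_init_container
fi
```

With Docker Compose, the same shape would be two services, with the seed service depending on the database service.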

mildmojo · 2024-03-18T22:47:46Z

For the original use case, an init container would serve this much better.

This could work for some users. It seems best used with strong orchestration (e.g. docker compose) and a very small DB seed. Some drawbacks:

An init container could add a penalty of several minutes each time the DB container is recreated if your seed is more than a few hundred megs with a handful of collections and indexes, vs. seconds for an image with data baked in. This can feel really demoralizing during a dev/test loop.
A long initial import on container creation would mean the DB is up & responsive but in an invalid state until the restore is complete, which can cause other services to fail and require manual intervention after e.g. docker compose up.
Like using --dbpath in CMD, using an additional container means you have another asterisk, another step, to get a working database when you're not using orchestration or you're writing/refactoring orchestration config.

Baking DB data into your own mongo image makes it fast and simple to launch new database containers that are in a valid state from first boot, and the official mongo images don't work well as a base for that. It is also time-consuming and difficult to figure out why your build-time data goes missing when you base your Dockerfile on the official mongo images.

However, the comment polarathene linked seems to observe that buildkit treats VOLUME as EXPOSE-style advice only, and data written to volumes during build steps persists in the final image. So maybe this is... fixed by buildkit? 😶

tianon · 2024-03-18T23:11:45Z

However, the comment polarathene linked seems to observe that buildkit treats VOLUME as EXPOSE-style advice only, and data written to volumes during build steps persists in the final image. So maybe this is... fixed by buildkit? 😶

Yep!

FROM mongo:7.0
RUN touch /data/db/foo

$ docker buildx build .
...
#6 writing image sha256:1ca35025fc343c6e199f7abedad161821524f8374d78d474a923aab571d8473f done
#6 DONE 0.0s

$ docker run --rm sha256:1ca35025fc343c6e199f7abedad161821524f8374d78d474a923aab571d8473f ls -l /data/db/
total 0
-rw-r--r-- 1 root root 0 Mar 18 23:11 foo

yosifkit · 2025-03-03T22:36:28Z

Closing in favor of anonymous volume opt-out in Docker (moby/moby#43190). VOLUME is the way for image authors to designate which directories should be persisted when users want persistence.

wglambert added the "Request" label (Request for image modification or feature) on Sep 28, 2018

wglambert mentioned this issue May 31, 2019

Volume-less flavours docker-library/cassandra#180

Closed

wglambert mentioned this issue May 19, 2020

VOLUME declaration can result in difficult to diagnose misbehavior docker-library/rabbitmq#410

Closed

wglambert mentioned this issue Feb 23, 2021

Building a Wordpress production image with a custom theme. docker-library/wordpress#567

Closed

polarathene mentioned this issue Feb 18, 2022

Remove VOLUME instructions caddyserver/caddy-docker#118

Closed

tianon mentioned this issue Mar 15, 2023

Declared volumes make deploying image as "pre-built" db extremely difficult #613

Closed

yosifkit closed this as completed Mar 3, 2025
