Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClickHouse Keeper docker image and VOLUME directive #54767

Closed
Tristan971 opened this issue Sep 18, 2023 · 3 comments · Fixed by #61683
Closed

ClickHouse Keeper docker image and VOLUME directive #54767

Tristan971 opened this issue Sep 18, 2023 · 3 comments · Fixed by #61683
Assignees

Comments

@Tristan971
Copy link
Contributor

Tristan971 commented Sep 18, 2023

In the same spirit as #48452, I come to inquire about the following line:

VOLUME /var/lib/clickhouse /var/log/clickhouse-keeper /etc/clickhouse-keeper

While I understand that it is a convenience choice to have image-defined VOLUMEs for the less technical users, this comes at rather high cost for everyone else.

Specifically, VOLUME directives have a few glaring issues.

1. They cannot be undone without building a brand new image from scratch

Yes this is a fault of the Dockerfile specification, but we all know that is never going to change just like plenty of other very "odd" behaviours... It's not for lack of people asking about it for more than a decade now either:

2. They bypass some security measures

At least in Kubernetes, while investigating how I was able to experience the issue described in #54760 at all, I ran into very surprising behaviour.

Indeed, I use the following k8s manifest snippet to grant as few permissions as possible to all containers I run in general:

securityContext:
  runAsNonRoot: true
  runAsUser: {{ clickhouse_keeper_securitycontext_uid }}
  runAsGroup: {{ clickhouse_keeper_securitycontext_gid }}
  readOnlyRootFilesystem: true
  capabilities:
    drop: [ all ]

And notably readOnlyRootFilesystem: true makes / mounted as read-only by default. Resulting in:

clickhouse-keeper-0:/var/log$ echo "foo" > bar
bash: bar: Read-only file system

So I expected keeper to not even be able to write its log files (it trying is the issue linked, but otherwise unrelated to this issue). And yet it was able to.

Alright, let's take a look, however unbelievable that seems:

clickhouse-keeper-0:/var/log$ echo "foo" > clickhouse-keeper/bar
clickhouse-keeper-0:/var/log$ cat clickhouse-keeper/bar
foo

Well alright, that is quite strange... Only additionally-mounted volumes (like I added for /var/lib/clickhouse for example) should be writable...

What's going on here? Let's see what is backing up that path I suppose?

clickhouse-keeper-0:/var/log$ mount -v | grep /var/log
/dev/sda2 on /var/log/clickhouse-keeper type ext4 (rw,relatime)

Oh. Here, /dev/sda2 is the host OS' filesystem. So k8s manages a subdirectory in there somewhere which it mounts in the container as read-writable.

I haven't read through the entire container specification, but nonetheless Kubernetes decides (or is forced for compatibility/compliance, I don't know) to mount the host filesystem on the container, which here causes 2 significant issues:

  1. I don't want writable paths in my containers if they're not necessary for operation of the software. This is a simple security measure that massively reduces attack surface all around
  2. I especially do not want them to be provided via the OS filesystem, whose space could be exhausted as a result if anything goes wrong with log file rotation for example

This can be worked around once you know about it, by manually mounting a read-only volume there, but the default is extremely susprising as nothing indicates that this is going on.

3. They are a deceptively easily trap to fall for

Just like directives like EXPOSE 80 443 that used to be popular (hopefully less these days?), which result in susprises when you bind port 8080 in the container and suddenly the magic doesn't work anymore because port 8080 isn't part of the magically exposed ports.
Or you run with -p 8080:8080 and are surprised to find port 80 still incomprehensibly bound.

Here with volumes, you think you handled storage concerns over time, but you actually didn't at all.
And you'll keep being clueless until you have to investigate why your disks are filling up (or worse, filled up entirely).


I am not entirely certain of the backwards compatibility considerations here. And to be honest I fully expect that someone somewhere purposefully relies on it, and that the container spec or some runtimes do something dumb if it goes away that would result in losing user data.

But I'd really like to see considered not adding VOLUME, EXPOSE (and other if any) magic directives into base images in the future...
I would really like to keep using official images, not least because it's time-consuming to manage our own rebuilds, and make things like bug reporting 10x more annoying because maintainers will (rightfully, to be clear) ask for reproductions to use official images for example.

Also I'd note that the EXPOSE line is also problematic for largely the same reasons, but one fight at a time and the VOLUME one is much worse in practice.

@Tristan971 Tristan971 added the question Question? label Sep 18, 2023
@Felixoid
Copy link
Member

Felixoid commented Mar 18, 2024

Interesting topic. I'd say, the main reason of "why it's done like that" is "because every official image does it"

I randomly opened mysql, postgres, rabbitmq and mongo.

Every one has VOLUME and EXPOSE directive in it.

Dropping the practice of making images working by default doesn't sound right to me.

@Tristan971
Copy link
Contributor Author

Tristan971 commented Mar 19, 2024

Every one has VOLUME and EXPOSE directive in it.

Yes and no.

While I dislike VOLUME in general, the main issue here is that /var/log/clickhouse-keeper and /etc/clickhouse-keeper* are volumes. Even though they aren't persistent data. You will notice that the mysql image only makes /var/lib/mysql a volume, not /var/log/mysql.

*: you could argue for the latter, if keeper persists query-driven settings changes on disk by editing settings files on its own, I don't know if it's the case

Dropping the practice of making images working by default doesn't sound right to me.

That's one way to see it.

For the main VOLUME (ie /var/lib/clickhouse-keeper) and EXPOSE I might even be inclined to agree, perhaps, for the likes of extremely widespread things like mysql/postgresql/etc.

But for keeper in particular I'd argue:

  • No one anywhere is running it as literally docker run clickhouse/clickhouse-keeper (ie without a -v or -p at all) outside of CI environments...
  • If it is convenience for people new to docker, providing a docker-compose with explicit volume/port declarations is still clearer, as they will have exactly 0 idea where the data goes and lose it all one day when they recreate the container and the new container makes new unnamed volumes
  • For experienced users, it causes all kinds of annoyances like port conflicts and, as reported in my issue, added liability of keeper's logs one day filling the host (and potentially causing an outage of said host) if we make a configuration mistake

Frankly, these being baked into images was not a "clever way to help newcomers" idea by the original developers at Docker. Like very many other historical features (there's a reason we're on the third major compose spec), it was a huge mistake.

@Felixoid
Copy link
Member

While I dislike VOLUME in general, the main issue here is that /var/log/clickhouse-keeper and /etc/clickhouse-keeper* are volumes.

You can create a PR, and it's OK to discuss it there. I see the point, and it's a valid reason to eliminate them.

In the end, we have only one directory as VOLUME in the server image https://github.com/ClickHouse/ClickHouse/blob/e5a1c0e/docker/server/Dockerfile.ubuntu#L133

Tristan971 added a commit to mangadex-pub/ClickHouse that referenced this issue Mar 24, 2024
Specifically the 2 non-data-related volumes

Fixes issue ClickHouse#54767
Felixoid added a commit that referenced this issue Mar 27, 2024
…eeper-docker-volumes

#54767 Remove extraneous volumes in Keeper image
robot-ch-test-poll2 added a commit that referenced this issue Mar 27, 2024
Backport #61683 to 24.1: #54767 Remove extraneous volumes in Keeper image
robot-ch-test-poll3 added a commit that referenced this issue Mar 27, 2024
Backport #61683 to 24.2: #54767 Remove extraneous volumes in Keeper image
Felixoid added a commit that referenced this issue Apr 10, 2024
Backport #61683 to 24.3: #54767 Remove extraneous volumes in Keeper image
Felixoid added a commit that referenced this issue Apr 10, 2024
Backport #61683 to 23.8: #54767 Remove extraneous volumes in Keeper image
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants