-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ClickHouse Keeper docker image and VOLUME directive #54767
Comments
Interesting topic. I'd say, the main reason of "why it's done like that" is "because every official image does it" I randomly opened mysql, postgres, rabbitmq and mongo. Every one has VOLUME and EXPOSE directive in it. Dropping the practice of making images working by default doesn't sound right to me. |
Yes and no. While I dislike *: you could argue for the latter, if keeper persists query-driven settings changes on disk by editing settings files on its own, I don't know if it's the case
That's one way to see it. For the main VOLUME (ie But for keeper in particular I'd argue:
Frankly, these being baked into images was not a "clever way to help newcomers" idea by the original developers at Docker. Like very many other historical features (there's a reason we're on the third major compose spec), it was a huge mistake. |
You can create a PR, and it's OK to discuss it there. I see the point, and it's a valid reason to eliminate them. In the end, we have only one directory as VOLUME in the server image https://github.com/ClickHouse/ClickHouse/blob/e5a1c0e/docker/server/Dockerfile.ubuntu#L133 |
Specifically the 2 non-data-related volumes Fixes issue ClickHouse#54767
…eeper-docker-volumes #54767 Remove extraneous volumes in Keeper image
In the same spirit as #48452, I come to inquire about the following line:
ClickHouse/docker/keeper/Dockerfile
Line 75 in 6564743
While I understand that it is a convenience choice to have image-defined VOLUMEs for the less technical users, this comes at rather high cost for everyone else.
Specifically, VOLUME directives have a few glaring issues.
1. They cannot be undone without building a brand new image from scratch
Yes this is a fault of the Dockerfile specification, but we all know that is never going to change just like plenty of other very "odd" behaviours... It's not for lack of people asking about it for more than a decade now either:
2. They bypass some security measures
At least in Kubernetes, while investigating how I was able to experience the issue described in #54760 at all, I ran into very surprising behaviour.
Indeed, I use the following k8s manifest snippet to grant as few permissions as possible to all containers I run in general:
And notably
readOnlyRootFilesystem: true
makes / mounted as read-only by default. Resulting in:So I expected keeper to not even be able to write its log files (it trying is the issue linked, but otherwise unrelated to this issue). And yet it was able to.
Alright, let's take a look, however unbelievable that seems:
Well alright, that is quite strange... Only additionally-mounted volumes (like I added for /var/lib/clickhouse for example) should be writable...
What's going on here? Let's see what is backing up that path I suppose?
Oh. Here, /dev/sda2 is the host OS' filesystem. So k8s manages a subdirectory in there somewhere which it mounts in the container as read-writable.
I haven't read through the entire container specification, but nonetheless Kubernetes decides (or is forced for compatibility/compliance, I don't know) to mount the host filesystem on the container, which here causes 2 significant issues:
This can be worked around once you know about it, by manually mounting a read-only volume there, but the default is extremely susprising as nothing indicates that this is going on.
3. They are a deceptively easily trap to fall for
Just like directives like EXPOSE 80 443 that used to be popular (hopefully less these days?), which result in susprises when you bind port 8080 in the container and suddenly the magic doesn't work anymore because port 8080 isn't part of the magically exposed ports.
Or you run with
-p 8080:8080
and are surprised to find port 80 still incomprehensibly bound.Here with volumes, you think you handled storage concerns over time, but you actually didn't at all.
And you'll keep being clueless until you have to investigate why your disks are filling up (or worse, filled up entirely).
I am not entirely certain of the backwards compatibility considerations here. And to be honest I fully expect that someone somewhere purposefully relies on it, and that the container spec or some runtimes do something dumb if it goes away that would result in losing user data.
But I'd really like to see considered not adding VOLUME, EXPOSE (and other if any) magic directives into base images in the future...
I would really like to keep using official images, not least because it's time-consuming to manage our own rebuilds, and make things like bug reporting 10x more annoying because maintainers will (rightfully, to be clear) ask for reproductions to use official images for example.
Also I'd note that the EXPOSE line is also problematic for largely the same reasons, but one fight at a time and the VOLUME one is much worse in practice.
The text was updated successfully, but these errors were encountered: