Don't supervise the docker binary with systemd #64
Comments
That's right.. Unfortunately, we are not really supervising the actual containers, but just the Docker client. Still, having systemd "supervise" is useful because:
Docker is an implementation detail (admittedly, one we can't easily get away from - see the next section below). About option 1, I've thought about going for another container stack, but it seems like it won't be easy (or possible). I've thought about rkt before, but it lacked some features. Things may be better now - I haven't checked. Looking at podman, it doesn't seem like it's available for most distros, like you said. Even if it is, we need to check if it supports everything that we use (or if we can squeeze into whatever it supports). While running multiple containers and having them talk to one another should be possible with podman too, we also rely on some implementation details in Docker (embedded DNS resolver being at We'll also probably want to replace all So.. this seems like a hard problem to solve.. Despite this systemd + Docker supervision issue (which is ugly, but doesn't seem to cause trouble), Docker seems to help us more than it's getting in the way. I'm curious, what exactly is the problem with using this playbook to install on Gentoo, Alpine Linux, etc.? |
The thing is though, docker will do all the things anyway. When the process outputs something, it will be passed to the docker daemon, which routes the logs using its internal logging code. While the logs aren't persisted by docker and just fetched by the rest client, we still have a very large number of layers before any logs are written:
If we instead tell the container to use the syslog driver for logs, we can skip steps 4-6. The logs will be processed by docker anyway, the question is whether we use what docker offers. When a container dies, the docker daemon will take a look at its restart policy and act accordingly. We don't set one, so this (hopefully) won't do anything, but docker will still look whether it's supposed to do it. Considering that we run everything in docker, using either docker_service or docker_container for running containers makes more sense IMO than wrapping it in another layer of systemd which doesn't give us any benefits.
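To make the logging and restart-policy point concrete, here is a minimal sketch (not taken from the playbook) of a `docker run` command line that hands both jobs to the Docker daemon. The `--log-driver` and `--restart` flags are real Docker CLI options; the container name and image below are placeholders chosen only for illustration.

```python
from typing import List, Optional


def docker_run_args(
    name: str,
    image: str,
    log_driver: str = "syslog",
    restart: Optional[str] = None,
) -> List[str]:
    """Build a `docker run` argv that lets the daemon route logs itself."""
    args = ["docker", "run", "--detach", "--name", name, "--log-driver", log_driver]
    if restart is not None:
        # e.g. "unless-stopped" or "on-failure"; omitted means Docker's default ("no")
        args += ["--restart", restart]
    args.append(image)
    return args


if __name__ == "__main__":
    # Placeholder name and image, purely illustrative.
    print(" ".join(docker_run_args("matrix-synapse", "example/synapse:latest",
                                   restart="unless-stopped")))
```

With the syslog driver, whatever the host uses for syslog (journald, rsyslogd, plain files) receives the container output, and no client process has to stay attached just to relay it.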
It's that these distros don't use systemd. Gentoo and Alpine Linux use OpenRC, VoidLinux uses runit. Letting the docker engine manage the containers would fix this.
podman tries to be 100% compatible with the docker cli, to the point where just doing Podman isn't stable though, it does have the occasional bug (like docker but a bit more often). rkt isn't being continued, it was developed by CoreOS, which was bought by RedHat. RedHat has been working on podman for a while now, which supports nearly everything that rkt supports but is more modern and relies less on systemd, while it still works with systemd for those who want to use it. |
I see your point. It does make sense to use Docker and have support for those other (non-systemd) distros as well. Especially given that the name of the playbook is However, systemd is available and is the go-to way for managing services on all popular and major distros which one would want to use on a server (RedHat/CentOS, Debian, Ubuntu, ..).
By using It's a working abstraction that lets people manage their servers using standard tooling that they're already used to. If we let Docker manage all services, it means:
Supporting other distros would be nice though. Most are systemd powered, so those would be easy. Hmm.. I guess both of these solutions don't sound ideal yet.. |
My reason for using voidlinux is that it doesn't use systemd. More and more applications start to depend on systemd without the need to do so, just because it's available most of the time. If we ever want to migrate away from systemd, we will be stuck in hell, because everything breaks.
Yes, it works, but adding unnecessary layers that make the whole setup more fragile just so that you can use
Adding a guide where people can check commands to use for
You can tell docker to log using syslog, which can then go into journald or rsyslogd or simple files. You don't need to keep the logs in docker.
Docker supports service dependencies, else the services just fail and get restarted until their dependencies are up. |
Please also don’t forget that podman allows rootless containers and is therefore much more secure! While Docker container exploits might give attackers root privileges, that won’t be the case when podman is used. And as we know, Matrix sadly is not known to be particularly secure or hardened (nor do I know of an audit). Additionally, multimedia applications are known to be amongst the most vulnerable types of software. So I’d consider the usage of podman a vast security enhancement. And you might also consider that pods allow even further possibilities like Kubernetes, rkt, etc. |
None of our containers run with a As far as I understand (which may be wrong), the benefit of "rootless containers" is that preparing and triggering container start can be done with a non-root user. I guess Docker containers running with a non-root user inside ( I think getting I like that podman doesn't require a daemon to run as is much more natural and would be happy if we could migrate to that. The problem with podman is:
|
https://github.com/containers/libpod/blob/master/install.md shows the supported distros. Only Debian is missing from those. CentOS has it available in their extras repository and Ubuntu has a PPA available here. You could theoretically write a bit of ansible to build and install podman on Debian, with the guide available here. |
About networking: |
@spantaleev Thanks for the detailed explanation! Seems like not as bad as I thought. Regarding the distros: Why does anyone want to run podman in Alpine? Isn’t that just an OS image for containers? And I also would never target Gentoo for server software. 😅️ CentOS 8 will have podman installed by default, as far as I know. Regarding the network: All containers in a pod are implicitly connected to each other, so that you can use Here is a Nextcloud setup script I wrote yesterday and it runs fine. The Postgres port is not visible to the outside (at least I hope so very much), only via the published pod port 8080 you’re able to connect to the Apache service inside the Nextcloud container, which runs on port 80.
|
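As a rough illustration of the pod layout described in the comment above (one published pod port, with the database reachable only from inside the pod), here is a minimal sketch. It is not the commenter's script. `podman pod create --name/--publish` and `podman run --pod` are real Podman options; the pod name, container names and images are assumptions made for the example.

```python
from typing import List, Tuple


def pod_commands(pod: str, published: str,
                 containers: List[Tuple[str, str]]) -> List[List[str]]:
    """Return the podman commands for one pod and the containers inside it."""
    # Ports are published on the pod, not on the individual containers.
    cmds = [["podman", "pod", "create", "--name", pod, "--publish", published]]
    for name, image in containers:
        # Containers joined to the same pod share its network namespace.
        cmds.append(["podman", "run", "--detach", "--pod", pod, "--name", name, image])
    return cmds


if __name__ == "__main__":
    # Hypothetical names; host port 8080 maps to port 80 inside the pod.
    for cmd in pod_commands("nextcloud", "8080:80", [
        ("nextcloud-db", "docker.io/library/postgres"),
        ("nextcloud-app", "docker.io/library/nextcloud"),
    ]):
        print(" ".join(cmd))
```

Because only `8080:80` is published, the database port is never exposed on the host in this layout.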
No, it's a Linux Distro. The most popular place where it's being deployed is probably as a container image, but you can very well use it outside of containers too.
That would mean that all containers need to be in the same pod though, which is not great. |
You should be able to connect several pods to each other as well as to other containers. |
Yes, sure, but then why merge those containers into a pod anyway? |
You don't need pods to connect containers to each other |
Why have data structures when you can have all variables separate on their own? The answer is encapsulation, logical grouping (especially if you have a lot of containers or more instances of the same service) and manageability. That’s also the reason why Docker Compose exists. Why should someone make things complicated? |
Yes, pods do make sense for some usecases, but where do they make sense for this specific case? |
Hi all! It was a good read! Any chance on continuing the discussion?
Just as @fbruetting said, for encapsulation. Maybe if you run a matrix server instance, and nothing else, bare containers are fine. But this is only an isolated project in theory, and mostly not isolated if deployed. See, the docs of this project are so good that they anticipate you have other stuff hosted on the same server... In an ideal world, I would run both of them in separate pods, and be able to, e.g., upgrade matrix without creating mailserver downtime, and vice versa. Regarding systemd. |
You can do systemd containers with systemd-nspawn, nsbox already uses that for example. 😛 If you can go for containers, please forget installing on the host. There are no brakes on the hype train for a reason! 😄 |
As a proof of concept, I wrangled the ansible playbook into generating a docker-compose.yml instead, and it seems to work just fine. (The process of wrangling, however, is not fine at all -- the following process is an extreme hack based on me not knowing anything about ansible or docker-compose when I started.) It does just the basic setup, with none of the optional additions (I haven't tried, maybe they would work?). Perhaps someone more keyed in than me can adapt this more cleanly. The key points:
You'll need the python3 script attached. Set up your matrix user and groups on your system on your own. Make sure you include those in the host_vars file, along with the path that will contain all the generated config files. Then the step by step procedure:
Everything above can in theory be done on any computer, then all you actually need are the files in
That should be it! You should be able to control your install using docker-compose. Add a user using |
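For readers wondering what such a conversion involves, here is a minimal, hypothetical sketch of turning a `docker run` command line (as found in a systemd unit's `ExecStart=`) into a Compose-style service dictionary. It is not the script attached to this comment, it only handles a handful of options in their space-separated form, and the example command, paths and image are made up.

```python
import shlex

# Options that take one value and may repeat, mapped to Compose list keys.
LIST_OPTS = {
    "-v": "volumes", "--volume": "volumes",
    "-p": "ports", "--publish": "ports",
    "-e": "environment", "--env": "environment",
}
# Options that take one value and map to a single Compose key.
SCALAR_OPTS = {"--name": "container_name", "--user": "user"}
# Flags with no Compose equivalent in this sketch; they are dropped.
SKIPPED_FLAGS = {"--rm", "-d", "--detach"}


def run_to_service(command: str) -> dict:
    """Convert a simple `docker run ...` string into a Compose service dict."""
    tokens = shlex.split(command)
    if tokens[:2] != ["docker", "run"]:
        raise ValueError("expected a `docker run` command")
    service = {}
    i = 2
    while i < len(tokens):
        tok = tokens[i]
        if tok in LIST_OPTS:
            service.setdefault(LIST_OPTS[tok], []).append(tokens[i + 1])
            i += 2
        elif tok in SCALAR_OPTS:
            service[SCALAR_OPTS[tok]] = tokens[i + 1]
            i += 2
        elif tok in SKIPPED_FLAGS:
            i += 1
        elif tok.startswith("-"):
            # Includes `--opt=value` forms, which this sketch does not parse.
            raise ValueError(f"unsupported option: {tok}")
        else:
            # First non-option token is the image; the rest is the command.
            service["image"] = tok
            if tokens[i + 1:]:
                service["command"] = tokens[i + 1:]
            break
    return service


if __name__ == "__main__":
    print(run_to_service(
        "docker run --rm --name matrix-synapse -p 8008:8008 "
        "-v /matrix/synapse:/data example/synapse:latest run -c /data/homeserver.yaml"
    ))
```

A real conversion would also need to carry over networks, unit ordering (`After=`/`Requires=`) and any `ExecStartPre=` steps, which this sketch ignores.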
I have one main request that would make this process a lot less hackish, that I can't really figure out on my own easily. Can we have separate tags for the tasks that configure the host vs tasks that configure the containers? Then we can only run the tag that handles the containers, along with the .service -> .yml switch, and then the containers can be managed however people want. Of course, this means users and cron (and anything else?) would not be automatically configured. |
Incidentally, with #418 this gets a lot cleaner, with two helper python scripts:
and now you have your |
Wow, you've spent a lot of time on this! Happy to hear you're finding an alternative way to make use of this playbook! Hopefully others will find it useful as well. If it proves useful to others alike, in the future we could probably even add your scripts to the playbook and introduce some new tags to help run them. |
I just entered the issue here again. I have the issue that I need to run multiple instances on one machine. Using a pure container-based approach (without systemd) is a requirement as the systemd scripts are overwritten otherwise. I consider the best option (in my current position and given my current knowledge) to be using My personal suggestion is to start a new branch here in the repo that gets rid of the systemd stuff completely. One could suggest migrating to the new structure as soon as it is stable enough. Then a legacy code base could be kept in a separate branch and the new branch can be renamed to Normally, I would start a fork migrating the current state to the docker-compose version. However, I would like to know from you @spantaleev if this is something you would be comfortable with. Otherwise it will consume quite some work and be one of many stalled and soulless open source forks/projects. To sum up things: I would migrate the current systemd-based approach to a docker-composed one. The logging would be done using journald to be compatible as much as possible. Some documentation on how to start/stop a service using docker-compose can be added if the need is there (however this should not be necessary as we are using Ansible anyways, right?). |
systemd supports instantiation, e.g. systemd (with systemctl, journalctl) is standard tooling on most linux distributions. System administrators should be able to keep using the standard tools as much as possible. You still need standard tooling, e.g. cron. Or how would you replace those with docker-compose? Standard tooling is less complex than standard tooling + docker-compose. -1 for docker-compose |
I also think that adjusting the existing systemd setup in a way that's part of the playbook would probably be better (and won't need any maintenance, once done). When going with systemd, besides instantiation, we can also go for service prefixing/suffixing. Similarly, we'd need to figure out some other things (cronjobs conflicting, etc.) As for reverse proxying, a multi-instance installation would probably need to disable |
Using the instantiation feature of systemd is a nice idea. I had not thought of that yet. What else (except for cron) do we need? If doing it the real container-based way, cron will run in a container, too. So no dependencies on the host needed. The main point in this whole discussion is the fact that systemd assumes to control a process while with docker involved, it only controls the client process. Thus systemd has no clue about the fact that the "real business" is happening somewhere completely different. This causes a whole bunch of drawbacks (see above). Therefore I suggested to drop the systemd requirement in the long-term and use some other management tool instead. If we want to stick with docker (as the name of the repo suggests), it should be something that is capable of using docker features. Here, systemd does not fit very well. I know docker-compose does. I have not used podman yet. My quick research led to the impression that its focus is to run a container as a non-privileged user (on the host) faking root privileges in the containers. This is nice from the perspective of a developer who does not need root access to test something out. In fact, as far as I understand things, docker-compose can be converted comparatively easily to pod descriptions. So, a first step towards #520 might in fact be the generation of valid docker-compose files. I see that docker is not the only toolset regarding containers but currently it is at least one of the most used ones. If in a later stage dockerd is replaced by some other toolset, I am not against this in general. However, this is better to be discussed in #520 or similar. I would say that users of an ansible script are no simple script kiddies that just hack in If you really fear that a completely learning-resistant admin needs systemd by all means, you could add small service files that call docker-compose accordingly in a |
Docker seems to have a rootless mode nowadays. https://docs.docker.com/engine/security/rootless/ |
This is a deal breaker for me, I don't want a dozen systemd services on my system. I'm using alpine for my container host as well, most containers use alpine anyway, so no need to add the additional heavy stuff normal distros come with. I could use another distro, but I don't really want to manage this this way. |
You can work on adding openrc service scripts to the playbook. They could work similarly to how our systemd |
Yeah well, I would prefer to just create containers with Podman. Might look into that but could require a fork to be viable |
Podman with the Someone in our Matrix room said they'd be experimenting with that soon, so the playbook may be getting Podman support. Still, I don't see us switching away from using a service manager to start the containers. |
Yeah then I might end up forking it and just replacing the systemd stuff with podman_container for my personal deployment purposes |
I'd been sort of meaning to leave a more detailed writeup here at some point, but since this issue has been getting a bit of traffic again, I may as well brain dump a high-level overview of how I successfully used the playbook in this repo to get a systemd-less Matrix setup. Last January, I used @mooomooo's Python script to spin up a Matrix server managed entirely through a docker-compose file, and it's been running great ever since. I can try to go back and refresh my memory to provide more details if someone would like, but here's what I remember now: The Python script needed some small tweaks to get it working since it hasn't been kept up to date with this repo, but I seem to recall they were pretty obvious fixes; I'm not a super strong Python dev and I didn't have any trouble turning error messages into fixes. I assume in the year that's passed since I did any of this, more changes will probably be needed. I ran the process in a local throwaway VM completely isolated from the target server. At no point did Ansible touch that server, nor does that server have any systemd dependencies. It was theorized earlier that this method was probably possible, but I don't think anyone ever confirmed that they'd gotten it to work previously. The process I used looked basically like this:
The only extra step I had to take that wasn't mentioned in the instructions for the Python conversion script was that very last step. The Ansible playbook expects to be able to initialize databases itself, since it assumes it can talk to the database when it runs. To be fair, it can, it's just talking to a database that's about to be blown away. In a proper Docker setup, these steps would normally be run by the container that needs them on its first boot, but since I only needed them this one time, I just exec'd a shell in the postgres container and stuck them in by hand. You can probably avoid the DB init issue by running the Ansible playbook directly on the server you're deploying to, as was its original intention. I wanted to host Matrix on a server that was already running some other important stuff, and I didn't take the time to read and understand this playbook enough to be fully confident that it wasn't going to do anything disruptive (messing with my existing Docker installation, creating new system users, restarting services, etc) and I didn't want to take any chances. Over the past year, I've added services and upgraded others entirely inside my To each their own I guess, but I probably wouldn't be running Matrix today if I had to manage it through systemd services (or any other init system, really) and use Ansible to manage component versions. Being able to manage the entire stack (runtime, process monitoring, logs, networking, updates, etc) with a single tool that I'm already comfortable with is way easier as far as I'm concerned. For example, if I'm not getting all my logs as plain flat files in |
Thanks for the writeup! We try not to contaminate the system too much. We have variables that can be toggled to prevent the playbook from installing Docker ( We only create a We currently contaminate This is configurable using I suppose you can automate your setup by:
|
May I suggest a variant of this approach that might suffice for all users (hopefully): We could create a The benefit would be that
I would be willing to give this a try if it might feasible and acceptable for most users. However I am not willing to rework the playbook and get a "No, we do not want docker-compose at all" answer. |
How so? We still need to use The Docker-Compose situation is somewhat messy. It's one more thing to install. Then there's v1 and v2, as well as compose-switch which exposes Compose v2 (the Having Docker act as a service manager is somewhat ugly.. it doesn't seem like it should do that. Just like systemd should not be supervising From what I've heard, podman-compose does not work well (yet). |
I have sometimes experienced problems with incompletely started instances, as some containers failed to start (or were up too early, IDK). I had to issue
It is true that it means one more installation requirement. It is installed by Ansible, so it is no big problem for the end user. Or are you concerned about breaking changes in the future?
I think we have different notions of what which program is doing. Why would you call docker a service manager in this setup? When docker starts, it restores the states of all containers running before shutting down (appropriate container configuration assumed). So, if there are a few containers running just before shutdown, the docker daemon will spawn such containers upon restarting. Why do you call it a service manager now? A container manager, yes, that is its inherent task.
I did not look into podman. I am a happy user of docker and docker-compose for various services I am running. I just heard it popping up now and then here and wanted to pick up the topic and offer an option here as well. |
What are the limitations of sharing a docker-compose.yml file or template? why is a python script needed or ansible? |
@Dima-Kal ansible isn't installed on the server, it's installed on a machine that uses ssh to access the server. But this whole project is an ansible playbook, so getting rid of that makes no sense. You should probably start a new project if you want to go without ansible. |
I think your differentiation is a bit arbitrary. AFAIC the better version would be without docker at all. But the distribution fragmentation makes it much more difficult to write a multi-distro playbook (I had to add like 4 repos to openSUSE Leap 15.3 in a non-docker matrix deployment), so thanks to @spantaleev and all the contributors! |
Hey, |
It's great to hear about your software and I'm sorry that us not being based on docker-compose natively makes things harder for you! Some attempts at giving answers are below:
This playbook currently manages about 100 components - most of which are optional. All of these services are possibly interconnected and wired together in dynamic ways - something that a static Replacing the whole playbook with a huge "static" docker-compose file wouldn't work.
This may be an alternative. Have Ansible do all it needs to do, but then generate one huge This adds an extra dependency on docker-compose. Thankfully, it's not awful Python software anymore, so such a dependency is not so terrible. Generating the For this reason, Yes, the playbook could wire various services into a huge "docker compose configuration" variable in a similar way to how services are injected into variables like: Right now, disabling a service allows each role to clean up after itself as it wishes (stopping its systemd services, etc). If roles just inject services into So, it may be possible and it may be similar to what we do now, but.. it's a different way of doing things.. And we consider it an uglier and "less native" way (at least on systemd-based distros).
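To illustrate the "each role injects its services into one big variable" idea discussed above, here is a minimal sketch in Python rather than Ansible. The role structure, the `enabled` flag and the service names are all hypothetical and are not how the playbook is actually organised.

```python
import json


def build_compose(roles: list) -> dict:
    """Merge the services contributed by every enabled role into one document."""
    services = {}
    for role in roles:
        if not role["enabled"]:
            # A disabled role contributes nothing; stopping or removing its
            # previously running containers would have to be handled elsewhere.
            continue
        for name, definition in role["services"].items():
            if name in services:
                raise ValueError(f"duplicate service name: {name}")
            services[name] = definition
    return {"services": services}


if __name__ == "__main__":
    # Hypothetical roles and images, for illustration only.
    roles = [
        {"enabled": True,
         "services": {"synapse": {"image": "example/synapse:latest",
                                  "depends_on": ["postgres"]}}},
        {"enabled": True,
         "services": {"postgres": {"image": "example/postgres:latest"}}},
        {"enabled": False,
         "services": {"bridge": {"image": "example/bridge:latest"}}},
    ]
    print(json.dumps(build_compose(roles), indent=2))
```

The merge itself is simple; the harder part, as noted above, is that a role which is switched off leaves no trace in the generated file, so the role loses its chance to clean up after itself.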
Each role in the playbook could start and stop containers and have Docker manage containers and auto-restarts, etc. There are a few problems with this:
I also don't see why we need to have Docker (or docker-compose) supervise services and dependencies between them, when the host already has a much better system for doing that - systemd. Side-story: Yes, systemd cannot really supervise the container process it starts, because Docker itself cannot manage service dependencies from what I know. It's just Docker Compose with its Relying on just Docker and systemd allows us to nicely support all distros which are based on these 2 technologies, which covers 99.99% of people. Some niche distros do not meet these criteria, unfortunately, but.. we can't support everything. Also, using just Docker + systemd, possibly allows us to also support Podman as an alternative (see #520). It's still a pipe dream, but it may happen some day. However, if we add Docker Compose into the mix, we'd need to hope that Podman would also play nicely with our "compose file" via podman-compose (or whatever alternative they are trying to build). In the end, there are always tradeoffs and people/setups that get excluded. Some wish to run on a distro without systemd.. Others wish to run with Podman.. Others wish to use docker-compose. Others wish to run on Kubernetes.. Others wish to run on HashiCorp Nomad. Others hate Ansible and would rather this tool were written in something else. One project cannot possibly accommodate everyone. The fact that this Ansible playbook is currently the most popular deployment choice is proof that for most people, the tradeoffs were made correctly:
If the majority of people's requirements were different, this playbook would have been dead and something else would have taken off in its place. That said, I'm not against this playbook trying to accommodate some of these other communities and requirements. It's just.. a difficult problem to support everything and most of us have no incentive to work on it. |
The docker binary is just a rest client that is talking to the docker daemon, which means that you aren't supervising the services but just the docker binary. I'm not sure what the reason for this is, but it means you can't use this playbook on alpinelinux, voidlinux, gentoo and possibly more.
I have multiple suggestions on how to solve this:
I strongly prefer option 1 for my usecases, but since podman isn't available to most users, that probably won't be possible.