The problem of non-well-behaved containers (eg Mosquitto) #331
Paraphraser added a commit to Paraphraser/IOTstack that referenced this issue (May 21, 2021):

Implements discussion contained in [Issue 331](SensorsIot#331):

1. Removes `build.py`. No special actions needed at build time.
2. Removes `directoryfix.sh`. No longer appropriate.
3. Removes `terminal.sh`. No longer mentioned in revised documentation (is unnecessary).
4. Adds Dockerfile to template folder.
5. Adds docker-entrypoint.sh to template folder (customised version of original).
6. Adds iotstack_defaults folder structure to template folder.
7. Moves filter.acl and mosquitto.conf into iotstack_defaults/config/
8. Alters service.yml:
   * Builds from Dockerfile.
   * Adds environment key and TZ default.
   * Moves /mosquitto/config mapping from services to volumes and omits ":ro" flag

Wholesale rewrite of Mosquitto documentation to cover:

* Significant directories and files.
* How Mosquitto gets built for IOTstack.
* Discussion of migration from non-Dockerfile version to Dockerfile-based version.
* Logging (some changes).
* Security (major rewrite, including how to test security)
* How to upgrade Mosquitto (now that it is built from Dockerfile)
* How to pin Mosquitto to a particular version.
* Port 9001 (some changes).

Paraphraser added a commit to Paraphraser/IOTstack that referenced this issue (May 21, 2021):

Implements discussion contained in [Issue 331](SensorsIot#331):

1. Removes `directoryfix.sh`. No longer appropriate.
2. Adds Dockerfile to template folder.
3. Adds docker-entrypoint.sh to template folder (customised version of original).
4. Adds iotstack_defaults folder structure to template folder.
5. Moves filter.acl and mosquitto.conf into iotstack_defaults/config/
6. Alters service.yml:
   * Builds from Dockerfile.
   * Adds environment key and TZ default.
   * Moves /mosquitto/config mapping from services to volumes and omits ":ro" flag

Does not alter documentation on old-menu branch. Does not remove terminal.sh (mentioned in original old-menu documentation). Both approaches to password creation "work". I decided to leave terminal.sh to minimise the risk of confusion if someone was following the old-menu documentation and was unable to find terminal.sh.
The problem of non-well-behaved containers (eg Mosquitto)
This is a long post and I apologise for that in advance. The problem it discusses is the underlying cause of many Discord questions. The proposed solution will, I hope, become a model for all non-well-behaved containers and serve IOTstack users well into the future. I'd appreciate feedback.
The post:
What's a non-well-behaved container?
This is best understood by considering well-behaved containers like:
You can reset any of those containers to "factory fresh" by taking them down, erasing their persistent storage area, and bringing them up again. For example, suppose you want to reset Node-RED:
During the "up", `docker-compose` re-creates `./volumes/nodered/data`, and then the container automatically re-creates all the necessary default structures. This is the end result:
Most importantly, Node-RED is ready for business. It displays its web GUI and lets you start creating flows.
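The reset sequence described above can be sketched as a small shell function. This is only an illustration: the function name `reset_container`, the `IOTSTACK`, `COMPOSE` and `SUDO` override variables, and the assumption that IOTstack lives in `~/IOTstack` are not part of IOTstack itself.

```shell
# Hypothetical helper: reset a well-behaved container to "factory fresh".
# COMPOSE and SUDO can be overridden (e.g. COMPOSE=echo SUDO=) for a dry run.
# The body runs in a subshell so the caller's working directory is untouched.
reset_container() (
    name="$1"
    [ -n "$name" ] || exit 1                     # refuse to run without a container name
    cd "${IOTSTACK:-$HOME/IOTstack}" || exit 1
    ${COMPOSE:-docker-compose} stop "$name"      # take the container down
    ${COMPOSE:-docker-compose} rm -f "$name"     # remove the stopped container
    ${SUDO-sudo} rm -rf "./volumes/$name"        # erase its persistent storage
    ${COMPOSE:-docker-compose} up -d "$name"     # bring it up; defaults are re-created
)
```

For Node-RED the call would be `reset_container nodered`. The same call against a non-well-behaved container is exactly what tends to end in a restart loop.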
The other containers in the well-behaved list have similar properties. Thus, a well-behaved container is one which:
A non-well-behaved container, therefore, is one which:
The `directoryfix.sh` band-aid
IOTstack tries to deal with non-well-behaved containers through the `directoryfix.sh` mechanism (sometimes renamed `build.sh` in new menu). In fact, the presence of a `directoryfix.sh` in a container's template can be a strong indication that a non-well-behaved container lurks within.
The basic idea is that the menu system runs `directoryfix.sh` *at build time*. Note the emphasis.
The problem, then, is what happens if a non-well-behaved container's environment changes *after* `directoryfix.sh` has done its work? The answer is, "you get a mess".
Messes lead to unhappy users, issues and Discord questions. IOTstackers deserve better and the purpose of this issue is to explore how to turn non-well-behaved containers into well-behaved containers.
Users can run a container's `directoryfix.sh` manually but they have to:
Running `directoryfix.sh` properly means:
The approach many first-time users seem to take is:
That results in a cure-worse-than-the-disease kind of mess because `directoryfix.sh` scripts rely on relative paths and take their actions using `sudo`.
Any `directoryfix.sh` can always be improved but it would still suffer from the basic problem that it is only ever run by the menu or when the user "just knows" to run it by hand.
A case study: Mosquitto
Mosquitto is quite a good example of a non-well-behaved container.
Defining problems is the first step in solving problems so here is a list of specific problems, as I see them:
Multiple identity disorder
IOTstack's `service.yml` for Mosquitto defines `user: "1883"`.
This causes `docker-compose` to spin up the container as userID 1883. Within the container, userID 1883 is the user "mosquitto".
The `docker-entrypoint.sh` script is how the container starts running:
It is not until after the `exec` statement that Mosquitto (the process) unconditionally downgrades its own privileges to userID 1883.
I think that the design intention of the `docker-entrypoint.sh` script is clear. The script expects the container to be launched as root so that it can ensure that the persistent storage directory and everything below it is owned by mosquitto:mosquitto (ie 1883:1883 outside the container) before the mosquitto process is launched and downgrades its privileges.
The presence of the `user: "1883"` directive in IOTstack's `service.yml` template prevents the self-repair code from ever being executed.
That's a problem, albeit one that is fairly easy to fix.
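To make the effect of the `user:` directive concrete, the gate described above can be mimicked outside a container. This is a sketch, not the script shipped in the image: the function name `repair_ownership`, the `TARGET` and `CHOWN` overrides, and passing the userID as an argument are all assumptions made so the logic can be tried without root.

```shell
# Hypothetical stand-in for the entrypoint's self-repair step.
# $1 is the userID the container was launched as (defaults to the caller's).
repair_ownership() {
    uid="${1:-$(id -u)}"
    target="${TARGET:-/mosquitto}"
    if [ "$uid" = "0" ] && [ -d "$target" ]; then
        # Only root is able to hand the hierarchy to mosquitto (1883:1883).
        ${CHOWN:-chown} -R 1883:1883 "$target" && echo "repaired"
    else
        # Launched as a non-root user (e.g. user: "1883"): nothing gets fixed.
        echo "skipped"
    fi
}
```

Calling `repair_ownership 1883` takes the `else` branch, which is the situation the `user: "1883"` directive creates on every launch.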
Too many cooks spoil the persistent-storage area
Pull Requests 274 and 275 changed Mosquitto's `volumes` definitions to:
The definitions can be split into two groups:
* the volumes paths (the three subdirectories of `./volumes/mosquitto`):
  If `directoryfix.sh` is run before the container is brought up then the volumes paths will be owned by userID 1883 (correct). In other words, `directoryfix.sh` is simply doing what `docker-entrypoint.sh` would do, were it not for the presence of the `user: "1883"` directive in `docker-compose.yml`.
  If `directoryfix.sh` is not run before the container is brought up and any of the volumes paths is not present, then the missing directories will be auto-created by `docker-compose` and will be owned by root (incorrect). `docker-entrypoint.sh` would be able to fix this, were it not for the presence of the `user: "1883"` directive. The result is Mosquitto running inside the container having insufficient privileges and that usually leads to a restart loop.
* the services path `./services/mosquitto`:
  `directoryfix.sh` doesn't touch this so that script is not relevant to this part of the discussion.
  If the services path is present when the container is brought up then it will be owned by userID 1000 ("pi:pi") with mode 644, and will therefore be read-only to Mosquitto running inside the container. This is not a problem, per se, because Mosquitto does not try to write to the services path or its contents.
  If the services path is absent when the container is brought up, `docker-compose` automatically creates the directory at the missing path, with root ownership. Again, this is not a problem, per se, because Mosquitto does not try to write to the services path or its contents. It can, however, cause user confusion because of the subsequent need to use `sudo` when working with that directory.
  Irrespective of whether the services path is owned by "pi" or "root", Mosquitto expects it to contain:
  * `mosquitto.conf` (required)
  * `filter.acl` (required if filtering enabled)
  Mosquitto will go into a restart loop if:
  * `mosquitto.conf` is missing; or
  * `mosquitto.conf` is present and enables filtering but `filter.acl` is missing.
The pwfile chicken-and-egg
The default `mosquitto.conf` contains these two lines:
Of the four possible combinations of those two lines, the above is the only combination that actually works with an out-of-the-box installation by the IOTstack menu. If the user:
* enables the `password_file` line but doesn't provide a `pwfile` (even just a `touch pwfile`) then Mosquitto goes into a restart loop. This occurs irrespective of whether `allow_anonymous` is true or false.
* omits `allow_anonymous` (so it defaults to `false`) with `password_file` disabled, then Mosquitto will start but it won't accept any connections.
The user just has to "know" to create a password file before changing `mosquitto.conf`. This is somewhat counterintuitive and, judging by issues and Discord questions, is a frequent source of frustration and confusion.
The obvious solution is for "something" to ensure that there is at least an empty `pwfile` at the path:
At the moment, that "something" is limited to the menu or `directoryfix.sh`. But there is a better way. Keep reading.
Other `directoryfix.sh` quirks
At the moment, Mosquitto's `directoryfix.sh`:
Checks for the presence of:
If it is not present, it is created with root ownership.
The next step depends on the presence and contents of:
If that file:
* is present and contains `user: "1883"` (even if the directive is commented-out) then `directoryfix.sh` ensures that the following directories exist:
  and finishes off by unconditionally changing the ownership of the `./volumes/mosquitto` hierarchy to 1883:1883 (correct).
* is present and contains `user: "0"` (even if the directive is commented-out) then `directoryfix.sh` will unconditionally change the ownership of the `./volumes/mosquitto` hierarchy to be root:root (incorrect).
  Note: `directoryfix.sh` does not ensure that `data`, `log` and `pwfile` exist.
* otherwise, `directoryfix.sh` performs no additional actions.
What happens after `directoryfix.sh` finishes depends on what is in `docker-compose.yml`. Either:
* `docker-entrypoint.sh` will fix the ownership to 1883:1883; or
* a `user:` directive nominating a non-zero userID will cause the container to launch as that user, in which case `docker-entrypoint.sh` will do nothing and Mosquitto is then at risk of going into a restart loop because of permission conflicts.
On a standard IOTstack install, `docker-compose.yml` contains `user: 1883` so the second option is the more likely.
Also, while it will generally be the case on a fresh install that the Mosquitto definition in `service.yml` will be the same as the definition in `docker-compose.yml`, there are no guarantees that it will stay that way. Nothing keeps the two files in sync with each other.
Summary
Drawing these threads together:
The multiple identity disorder problem means:
* even if `user: "1883"` was removed, the self-repair startup code limits itself to fixing ownership. It takes no responsibility for ensuring the presence of sensible defaults for `mosquitto.conf`, `filter.acl` or `pwfile`.
The too many cooks problem means there is a wide variety of potential outcomes, many of which can lead to a restart loop, all depending on what existed before the container started, who owned what, when `directoryfix.sh` was last run, and whether Mosquitto's `service.yml` matches what is in `docker-compose.yml`.
The pwfile chicken-and-egg can easily lead to either a restart loop or Mosquitto rejecting valid traffic.
`directoryfix.sh`:
Basically, there is a wide variety of situations where the Mosquitto container will go into a restart loop if the ground is not prepared properly before it is brought up. The whole thing is reminiscent of one of those semi-humorous flow charts that always wind up a few consonants shy of "bucked up!"
A Dockerfile solution
Turning non-well-behaved containers into well-behaved containers can be done with Dockerfiles.
The files described in the rest of this section are all located in:
The rationale for placing everything in `.templates` rather than `services` is that the former is the factory for all IOTstack implementations, while the latter is a customisation point for each user's implementation. This is a "factory" solution.
service.yml
Changes:
* Instead of pulling the `eclipse-mosquitto` image from DockerHub, the service expects to find a Dockerfile located in `.templates`.
* The `user:` directive is removed. This implies that `docker-compose` will launch the container as root, meaning that `docker-entrypoint.sh` will have the privileges it needs to perform auto-repair functions.
Dockerfile
Actions:
* Starts with the "eclipse-mosquitto:latest" image on DockerHub.
* Adds two packages to the container:
  * `rsync` is needed because the `cp` command provided with Alpine Linux does not implement the `-n` aka "no clobber" option which can be used to replace missing files while not overwriting existing files.
  * `tzdata` causes the container to respect the "TZ" environment variable and display log timestamps in local time.
* Copies the contents of an `iotstack_defaults` directory structure into the image. The assumed structure is documented below. The file permissions are set in the template and persist into the image.
* Replaces the existing `docker-entrypoint.sh` with a revised version.
* The Dockerfile for the Mosquitto base image declares `/mosquitto/data` and `/mosquitto/log`. The IOTstack Dockerfile likewise declares the paths added for IOTstack, which causes `docker-compose` to treat those paths identically to those declared in the Mosquitto base image.
iotstack_defaults
`iotstack_defaults` is a directory with the structure shown below. It is copied "as is" into the image by the Dockerfile.
This structure is inherently extensible. Any directory and/or file combination needed to establish sensible working defaults for Mosquitto can be added to the IOTstack repository and it will "just work".
iotstack_defaults/config/filter.acl
No change in content from the current template.
iotstack_defaults/config/mosquitto.conf
Content as per PRs 274 and 275:
iotstack_defaults/pwfile/pwfile
An empty file. Its presence (even if empty) means that the user can utilise all four combinations of `password_file` and `allow_anonymous`, before defining any passwords, without any risk of creating a restart loop.
docker-entrypoint.sh
The original was documented above. This is my proposed version:
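As a rough illustration of the self-repair behaviour, and not the proposed script itself, the no-clobber copy can be tried with a portable `find`/`cp` loop standing in for `rsync --ignore-existing`. The function name `copy_missing` and its two arguments are assumptions made for the sake of the example.

```shell
# Hypothetical stand-in for "rsync --ignore-existing SRC/ DST":
# restore anything missing from DST without touching what is already there.
copy_missing() {
    src="$1"
    dst="$2"
    # Recreate any directory that has gone missing.
    ( cd "$src" && find . -type d ) | while IFS= read -r d; do
        mkdir -p "$dst/$d"
    done
    # Copy a default file only when nothing exists at the destination path.
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        [ -e "$dst/$f" ] || cp -p "$src/$f" "$dst/$f"
    done
}
```

Inside the container the equivalent step would copy the defaults directory over `/mosquitto`, followed by the ownership fix, before the `exec` hands over to Mosquitto.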
The changes are:
* Combines the "launched as root?" and "does `/mosquitto` exist?" tests into a single statement. In practice, `docker-compose` guarantees the presence of `/mosquitto` so whether the container was launched as root is the only material consideration. The omission of the `user: "1883"` directive from `service.yml` (and, by inference, `docker-compose.yml`) means that the "if" test will always succeed in the IOTstack environment.
* Adds an `rsync` command. This performs all auto-repair each time the container is launched. Any directory or file that has gone missing will be recreated automatically, with the correct permissions. The `--ignore-existing` option guarantees that nothing will be overwritten.
Correct 1883:1883 ownership is enforced throughout the `/mosquitto` hierarchy by the `chown` statement.
As mentioned before, the `exec` launches Mosquitto (the process) over the top of `docker-entrypoint.sh` and one of its first actions is to change its privileges to userID 1883. So, while the container starts as root, it runs as userID 1883, exactly as before.
directoryfix.sh
Changed to be a single-use migration tool. If the user performs a `git pull` and then runs the menu, the menu will run `directoryfix.sh`. At that point, the expected situation is:
will not exist, but these files will exist:
Given that situation, the correct migration behaviour is to create the `config` directory and move the existing `mosquitto.conf` and `filter.acl` files into that directory, thus preserving any existing configuration.
`directoryfix.sh` will not overwrite `mosquitto.conf` and/or `filter.acl` if those already exist in `config`. In that situation, both source and target files are preserved to make it easy for the user to compare and cherry-pick.
Nothing happens if the user does a `git pull` but then does not do anything to cause the new service definition to be put into production. The old service definition (with all its problems) and the directories and files to which it refers, will continue to work.
Tests
Simulate first run of the new service definition and Dockerfile structure
The starting position is:
Mosquitto not running:
Persistent storage area erased:
Add markers to `mosquitto.conf` and `filter.acl` in `./services/mosquitto`:
Simulate running of `directoryfix.sh` by the menu:
Show effect of running `directoryfix.sh`:
Mosquitto is now in "post-menu" state. Note the mixture of "pi" and "root" ownerships.
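The migration behaviour that `directoryfix.sh` is described as having can be approximated as follows. It is a sketch under assumptions: the function name `migrate_mosquitto_config` is hypothetical, the target is assumed to be `./volumes/mosquitto/config`, and the `sudo` that the real script relies on is left out so the logic can be tried in a scratch directory.

```shell
# Hypothetical one-shot migration: move legacy config files into the new
# config directory, never overwriting a file that already exists there.
migrate_mosquitto_config() {
    base="${1:-$HOME/IOTstack}"
    src="$base/services/mosquitto"
    dst="$base/volumes/mosquitto/config"
    mkdir -p "$dst"
    for f in mosquitto.conf filter.acl; do
        # Move only when the source exists and the target does not, so that
        # both copies survive if the user needs to compare them later.
        if [ -f "$src/$f" ] && [ ! -e "$dst/$f" ]; then
            mv "$src/$f" "$dst/$f"
        fi
    done
}
```

Running it a second time changes nothing, which is the property a single-use migration tool needs.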
Show that the container starts correctly - post migration
Bring up Mosquitto:
Show Mosquitto is running:
Show the combined effect of the earlier `directoryfix.sh` and launching the container:
Note: the repair is performed by `docker-entrypoint.sh` each time Mosquitto launches or restarts. There is no need for `directoryfix.sh` to make this happen.
Confirm that the migrated `mosquitto.conf` and `filter.acl` were not overwritten by the self-repair process when the container launched:
The markers are still in place so those are the files that came from `~/IOTstack/services`. Take the container down:
Create a "clean slate"
Erase the persistent storage area:
Mosquitto is now in "never run" state.
Bring the container up:
Show Mosquitto is running:
Show that the persistent storage area has been re-initialised correctly:
Confirm that `mosquitto.conf` and `filter.acl` are no longer the migrated versions:
Restart Mosquitto (which causes Mosquitto to flush its internal database to disk):
Confirm that the database has been saved:
Turn on password handling
Enable the pwfile in the config:
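One way to make that edit from the command line is sketched below. It assumes the default config carries a commented-out line of the form `#password_file /mosquitto/pwfile/pwfile`; the function name `enable_password_file` is hypothetical, and on a live system the file is owned by userID 1883 so the call would need `sudo`.

```shell
# Hypothetical helper: uncomment the password_file directive in config file $1.
enable_password_file() {
    conf="$1"
    tmp="$(mktemp)"
    # Strip a leading '#' (and any spaces after it) from the password_file line only.
    sed 's/^#[[:space:]]*\(password_file[[:space:]]\)/\1/' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"   # write back in place, keeping owner and mode
    rm -f "$tmp"
}
```

Other directives, including `allow_anonymous`, are left exactly as they were.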
Restart the container, wait 10 seconds, show container is running:
Note:
Show that a password can be created:
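A minimal sketch of creating a credential, assuming the container is named `mosquitto` and the password file sits at `/mosquitto/pwfile/pwfile` inside it. The wrapper name `add_mqtt_user` and the `DOCKER` override are hypothetical; `mosquitto_passwd -b` is the batch form that takes the password on the command line.

```shell
# Hypothetical wrapper: add or update an MQTT user inside the running container.
# Set DOCKER=echo to print the command instead of running it.
add_mqtt_user() {
    ${DOCKER:-docker} exec mosquitto \
        mosquitto_passwd -b /mosquitto/pwfile/pwfile "$1" "$2"
}
```

For example, `add_mqtt_user hello world` would add user "hello" with password "world". Mosquitto needs a restart, or a reload of its configuration, before a new credential takes effect.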
Notes:
* Previously, the user either had to create a null password file before running this command, or pass the `-c` option to tell Mosquitto to create the file. Now, there is no difference between how the first password is created, and how second and subsequent passwords are created.
* Guaranteeing the presence of a `pwfile`, even if it is an empty file, means Mosquitto will always start properly, irrespective of whether `allow_anonymous` is true or false. If `allow_anonymous` is:
  * `true` and credentials are:
  * `false` and credentials are:
Turn on logging to disk
Enable logging to disk in the config:
Show that the log is now being written:
End of tests.
Wrapping it up
The "multiple identity disorder" goes away if the container is allowed to start as root. That gives it sufficient privileges for the self-repair code to run before the Mosquitto process downgrades its privileges.
The "too many cooks spoil the persistent-storage area" problem goes away if all the paths are under `~/IOTstack/volumes/mosquitto` and all are declared, properly, during the Dockerfile run, so that Docker handles them consistently.
The "pwfile chicken-and-egg" problem goes away if the self-repair code is permitted to guarantee that the `pwfile` is always present.
If there is no need for a `directoryfix.sh` performing repair functions (as distinct from a one-time migration function), there can be no "quirks" and no need for users to remember when or how to run it. Even if `directoryfix.sh` doesn't run when it should in its role as a migration aid, the container will still start and won't go into a restart loop.
The next step
I've been running Mosquitto like this for the last few months, on three RPi4s and one 3B+. It just works. In terms of getting updates, it's more like Node-RED and requires a conscious decision:
Given the chaos that was caused when Mosquitto was updated to require the `listener` and `allow_anonymous` directives, I'd say a bit more friction in the update process was no bad thing.
The next step is a Pull Request to implement these changes for Mosquitto.
In the meantime, I'd appreciate feedback on this proposal.