Support for Kubernetes, ephemeral OTOBO containers #1148
The current Docker support for OTOBO 10.1 relies on the Docker volume /opt/otobo. This volume provides persistence to the Docker-based OTOBO installation. Implicitly the volume is also a means for communicating events, e.g. via the "mtime of a file has changed" mechanism. However, things are different when OTOBO runs under Kubernetes or in another high availability setup. In Kubernetes the default is to have ephemeral PODs, where /opt/otobo is local to each POD.
In this discussion we investigate what has to be considered in a high availability setup, specifically what it means when /opt/otobo is not shared across nodes. The alternative approach, using a shared /opt/otobo, has been discarded. It has been tried with glusterfs, but there were severe performance problems.
Recap of Kubernetes Basics
Just some definitions.
Doing away with the shared and persistent file system
If the data in /opt/otobo were static, then there would be no need for sharing between PODs. So, let's start by making a list of things that are dynamic in /opt/otobo.
Essential features:
Non-essential features:
A special case is the usage of shared memory for the AdminLog frontend module.
A note regarding Redis
Redis is not safe data storage. Everything that has to persist must be put either into the database or into a storage like S3.
Bootstrapping
Some of the approaches below need some data in order to work; for example, the database connection and the location of the shared storage must be known beforehand. Maybe put the essential config into a ConfigMap and pass it to the containers via the environment, as sketched below.
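As an illustration, a minimal sketch of a Kernel/Config.pm that takes its bootstrap settings from the environment, e.g. from a Kubernetes ConfigMap or Secret mapped into the container. The OTOBO_* variable names and the S3 setting name are assumptions made up for this sketch.

```perl
# Hedged sketch of a minimal Kernel/Config.pm that reads its bootstrap
# settings from the environment. The OTOBO_* variable names and the
# S3 setting name are assumptions.
package Kernel::Config;

use strict;
use warnings;
use utf8;

sub Load {
    my $Self = shift;

    # database connection from the environment, with local fallbacks
    $Self->{DatabaseHost} = $ENV{OTOBO_DB_HOST}     // 'db';
    $Self->{Database}     = $ENV{OTOBO_DB_NAME}     // 'otobo';
    $Self->{DatabaseUser} = $ENV{OTOBO_DB_USER}     // 'otobo';
    $Self->{DatabasePw}   = $ENV{OTOBO_DB_PASSWORD} // '';
    $Self->{DatabaseDSN}  = "DBI:mysql:database=$Self->{Database};host=$Self->{DatabaseHost};";

    # location of the shared object storage; the setting name is made up
    $Self->{'Storage::S3::Bucket'} = $ENV{OTOBO_S3_BUCKET} // 'otobo';

    return 1;
}

# keep the usual boilerplate from Config.pm.dist: fall back to the defaults
use parent qw(Kernel::Config::Defaults);

1;
```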
Manual changes to Kernel/Config.pm
In the current use case it is fairly common to make manual adaptations to Kernel/Config.pm. In an environment where containers are started frequently and /opt/otobo is not shared, this becomes less manageable. Some control can be exerted via environment variables, but this isn't very convenient either.
The currently recommended approach is to use locally built images in which Kernel/Config.pm is changed. But is there another, more convenient, option? An alternative is to provide a customisation package that contains an Autoload module. If that becomes the new recommended approach, then it must be documented properly.
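For illustration, a hedged sketch of what such an Autoload module might look like. Files below Kernel/Autoload/ are loaded automatically at startup, so a customisation package can ship code that extends or redefines existing classes without touching Kernel/Config.pm. The module and method names below are made up.

```perl
# Hedged sketch of an Autoload module shipped by a customisation package.
# The module name and the added method are made up for illustration.
package Kernel::Autoload::MyCustomization;

use strict;
use warnings;

use Kernel::System::Ticket;

# add (or redefine) a method on an existing class
sub Kernel::System::Ticket::MyCustomTicketCheck {
    my ( $Self, %Param ) = @_;

    # custom logic would go here
    return 1;
}

1;
```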
*.pm files in Kernel/Config/Files
There are only three files that need to be considered here: the generated ZZZ*.pm files.
Usually there are no other *.pm files in Kernel/Config/Files. An OTOBO package might be doing weird stuff here, but in that case the service needs to be restarted anyways.
These Perl modules in Kernel/Config/Files constitute the major part of the current configuration of OTOBO. Whenever an instance of `Kernel::Config` is created, `Load()` is called for each of these modules, passing the package name and the instance of `Kernel::Config`.
The daemon modules do not check whether the ZZZ*.pm files have changed. Instead they force a reload of these files in each iteration. Console commands are not expected to run for a long time; they simply read the ZZZ*.pm files when starting up.
The method `Load()` is free to do whatever it wants with the instance of `Kernel::Config`. Usually values are merged into the nested data structure. In most cases this could also be done by merging YAML files, but the interface is wide open by design. The *.pm files change fairly frequently; an example are the config adaptations done during unit tests.
This approach works fairly well for multiple servers when Kernel/Config/Files is located on a shared file system. Keeping this proven approach is a goal here.
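A minimal example of such a config module might look as follows. The package name and the settings are made up; the point is the `Load()` interface that receives the package name and the `Kernel::Config` instance and merges values into the nested data structure.

```perl
# Minimal, made-up example of a config module in Kernel/Config/Files.
package Kernel::Config::Files::ZZZExample;

use strict;
use warnings;

sub Load {
    my ( $File, $Self ) = @_;

    # merge a flat value into the config object
    $Self->{'Example::Setting'} = 'some value';

    # merge into a nested data structure
    $Self->{'Example::Nested'}->{SubKey} = [ 'a', 'b' ];

    return 1;
}

1;
```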
Things do become a bit more complicated when Kernel/Config/Files is no longer shared. In the most simple case there is a single source of truth. This reference must be synchronised to each POD before the list of modules is determined. The synchronisation should happen fairly quickly whenever a new instance of `Kernel::Config` is created. It suffices to add the check in `Kernel::Config`, because otobo.psgi creates a new config object for every request and the Daemon creates a new config object for every task. The requirements for the synchronisation are:
The reverse direction must be supported too. Any changes to the files in Kernel/Config/Files must initially be made in the reference. The Daemon and the web server are responsible for ensuring that they work with the newest version of the configuration.
It is not obvious how this can be implemented efficiently. The simplest approach is to store the *.pm files under a specific S3 prefix. Daemon and web server then need to fetch the current versions from S3. This can be done by getting an object listing and comparing object size and modification time.
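A hedged sketch of that idea, using Paws for the S3 access. The bucket name, the prefix, and the naive size-only comparison are assumptions; OTOBO 10.1 also ships its own S3 helper code that could be used instead, and localstack or MinIO would additionally need a custom endpoint configuration.

```perl
#!/usr/bin/env perl
# Hedged sketch: sync the *.pm files from an S3 prefix into the local
# Kernel/Config/Files directory. Bucket, prefix, and the size-only check
# are assumptions for illustration.

use strict;
use warnings;

use Paws;

my $Bucket = 'otobo';                           # assumed bucket name
my $Prefix = 'OTOBO/Kernel/Config/Files/';      # assumed prefix
my $Local  = '/opt/otobo/Kernel/Config/Files';

my $S3 = Paws->service( 'S3', region => $ENV{AWS_REGION} // 'eu-central-1' );

my $Listing = $S3->ListObjectsV2( Bucket => $Bucket, Prefix => $Prefix );

OBJECT:
for my $Object ( @{ $Listing->Contents // [] } ) {

    my ($Basename) = $Object->Key =~ m{([^/]+\.pm)\z} or next OBJECT;
    my $Target = "$Local/$Basename";

    # naive check: fetch only when the local file is missing or the sizes
    # differ; a real implementation would also compare modification times
    my @Stat = stat $Target;
    next OBJECT if @Stat && $Stat[7] == $Object->Size;

    my $Response = $S3->GetObject( Bucket => $Bucket, Key => $Object->Key );

    open my $FH, '>', $Target or die "Can't write $Target: $!";
    print {$FH} $Response->Body;
    close $FH;
}
```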
It is not obvious whether the above approach is the best solution. Some other ideas are:
Kernel::System::Daemon::DaemonModules::SystemConfigurationSyncManager
Note that there already is the Daemon module `Kernel::System::Daemon::DaemonModules::SystemConfigurationSyncManager`. The current understanding is that this module can't help with distributing the configs. The OTOBO Daemon does not, and should not, know which PODs are running.
Loader Files, Minified and Concatenated CSS and JS files
The safest solution is to simply turn off the loader. This can always be done when problems crop up.
If minification alone is enough, then something like https://metacpan.org/release/IDOPEREL/Plack-App-MCCS-1.000000 could be used.
Keeping the current approach to minification is doable too. Currently minified files are written into the file system. This occurs when the server generates HTML content. The browser then requests these files when it renders the HTML pages. These generated files are often served by otobo.psgi, but they can also be served by any reverse proxy.
When running in K8s we have no shared file system. Instead we write the minified files into S3. The storage in S3 is the reference. When a loader file is requested, the following actions take place:
Always serving from the file system has the advantage that the files can be streamed. Streaming is supported by https://metacpan.org/pod/Plack::App::File.
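A minimal Plack sketch of that setup; the document root under /opt/otobo/var/httpd/htdocs, the /otobo-web mount point, and the placeholder app are assumptions, and in a real deployment the OTOBO PSGI application would be mounted instead of the placeholder.

```perl
#!/usr/bin/env perl
# Minimal sketch (app.psgi): serve the generated CSS/JS loader files from the
# local file system via Plack::App::File so that they can be streamed.
# Paths and mount points are assumptions.

use strict;
use warnings;

use Plack::Builder;
use Plack::App::File;

# stand-in for the real OTOBO PSGI application
my $PlaceholderApp = sub {
    return [ 200, [ 'Content-Type' => 'text/plain' ], ["the OTOBO app would be mounted here\n"] ];
};

builder {
    mount '/otobo-web' => Plack::App::File->new( root => '/opt/otobo/var/httpd/htdocs' )->to_app;
    mount '/'          => $PlaceholderApp;
};
```

Run with e.g. `plackup app.psgi` for a quick check.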
For further optimisation the minified files could also be served by the K8s load balancer Ingress directly from S3. Local caching is possible in this case too.
NodeID
The config setting NodeID is set to 1 by default. The intention was that it is set to different values on different nodes.
But currently there is no way to automatically assign NodeIDs to Daemon instances. Furthermore, it is not obvious whether it is really necessary to provide different NodeIDs. Let's list where NodeID is used in OTOBO.
The cache value `DaemonRunning` is kept per NodeID. It is set by the Daemon itself, in the file https://github.com/RotherOSS/otobo/blob/rel-10_1/Kernel/System/Daemon/DaemonModules/SchedulerTaskWorker.pm. It is used by the plugin `DaemonRunning` of the support data collector and by the notification plugin `DaemonCheck`. Another use is in the package manager frontend, https://github.com/RotherOSS/otobo/blob/rel-10_1/Kernel/Modules/AdminPackageManager.pm, where "Upgrade All Packages" can only be done when the Daemon is running.
NodeID is also used in Kernel/Modules/AdminRegistration.pm, but this frontend module is no longer used.
Most of these usages might be broken as they check only the specific node of the web server. But things should be fine when the Daemon runs on any node.
Daemon module SystemConfigurationSyncManager
The NodeID seems to be only validated, but not used.
Daemon module SyncWithS3
The NodeID is currently not used in that plugin. But maybe it should be, for synchronisation purposes.
Task scheduler within the OTOBO Daemon
This is a bit involved. It looks like the Daemon determines which scheduled tasks, or cron tasks, need to be executed and adds these tasks to a task queue. The scheduling is specific to the combination of the process ID of the Daemon and the NodeID. But this assumes that the NodeID differs between hosts. This means that the current approach, where NodeID is always 1, might be broken.
This concerns:
Also, the PID file contains the NodeID. But that should be fine, as the PID file usually lives in /opt/otobo/var/run.
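If distinct NodeIDs turn out to be necessary, one option could be to derive them from the POD name. A hedged sketch, assuming the PODs are managed by a StatefulSet so that the hostname ends in a stable ordinal; the snippet would go into Load() of Kernel/Config.pm.

```perl
# Hedged sketch: derive a per-POD NodeID from the hostname, assuming a
# StatefulSet naming scheme like otobo-daemon-0, otobo-daemon-1, ...
# NodeID must be an integer between 1 and 999.
use Sys::Hostname qw(hostname);

my ($Ordinal) = hostname() =~ m/-(\d+)\z/;
$Self->{NodeID} = defined $Ordinal ? $Ordinal + 1 : 1;
```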
Kernel::System::SysConfig
The NodeID is part of a temporary setting in the database attribute sysconfig_deployment.comments. This is likely not critical.
Kernel::System::Ticket::NumberBase
The NodeID is part of ticket_number_counter.counter_uid. Not critical.
Installing OTOBO packages
In a Kubernetes context, the cleanest way would be to deliver new or updated packages via new Docker images. But the OTOBO way is to use the package definitions in the database. This approach also allows operation in non-Kubernetes settings.
My impression is that up to now cluster support for OTRS has usually been done via a shared file system. There seem to be no readily available solutions for distributing changed files. This implies that we need to roll our own watchdogs. There already is support for detecting changes in the ZZZ*.pm files. Unfortunately the web server and the Daemon do these checks in different ways. It would be nice to unify these checks, as both use cases have basically the same requirements. In both cases it would be nice to have:
There are several options for how package updates are communicated to the watchdogs. One thing that can be tried is to abuse the S3 prefix OTOBO/Kernel/Config/Files and add or update a file like repository_list.json. An update to that file would indicate an added, updated or deleted package.
TODO: What about removed packages?
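For illustration, a hedged sketch of how the content of such a marker file could be generated from the installed packages. The file name, the JSON layout, and the hard-coded paths are assumptions; uploading the result to S3 is left out.

```perl
#!/usr/bin/env perl
# Hedged sketch: build the proposed repository_list.json content from the
# list of installed OTOBO packages. The JSON layout is made up.

use strict;
use warnings;

use lib '/opt/otobo';                  # assumed installation directory
use lib '/opt/otobo/Kernel/cpan-lib';

use Kernel::System::ObjectManager;

local $Kernel::OM = Kernel::System::ObjectManager->new();

my @Packages = $Kernel::OM->Get('Kernel::System::Package')->RepositoryList( Result => 'short' );

my $JSON = $Kernel::OM->Get('Kernel::System::JSON')->Encode(
    Data => [ map { { Name => $_->{Name}, Version => $_->{Version} } } @Packages ],
);

print $JSON, "\n";
```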
Using cron is not really a good option. Usually cron needs to run as root, which is not best Kubernetes practice. Alternatives to cron, like https://github.com/aptible/supercronic or https://github.com/mcuadros/ofelia, are not available as Debian packages.
For every web server and Daemon we probably need a watchdog that checks for OTOBO package updates. Adding the watchdog to Gazelle and otobo.Daemon.pl is kind of hard, so let's go for something like cron. The watchdog could send SIGHUP to the watched daemon. Maybe better: halt the service, update the files, then resume the service. It looks like this can be done with otobo.Daemon.pl; it is not clear whether the same works for the web server.
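A hedged sketch of such a watchdog: it polls a local copy of the marker file and sends SIGHUP to the Daemon when the marker changes. The marker path, the PID file name, and the assumption that the Daemon reinitialises on SIGHUP are all made up for illustration.

```perl
#!/usr/bin/env perl
# Hedged watchdog sketch: poll a marker file that is kept in sync with the
# proposed S3 key and nudge the OTOBO Daemon via SIGHUP when it changes.
# Paths and the PID file name are assumptions.

use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $Marker     = '/opt/otobo/var/tmp/repository_list.json';   # assumed local copy
my $PIDFile    = '/opt/otobo/var/run/Daemon-NodeID-1.pid';    # assumed name
my $LastDigest = '';

while (1) {

    if ( open my $FH, '<', $Marker ) {
        my $Content = do { local $/; <$FH> };
        close $FH;

        my $Digest = md5_hex( $Content // '' );

        if ( $LastDigest && $Digest ne $LastDigest ) {

            # tell the Daemon to reinitialise; alternatively stop it,
            # update the files, and start it again
            if ( open my $PIDFH, '<', $PIDFile ) {
                chomp( my $PID = <$PIDFH> || '' );
                close $PIDFH;
                kill 'HUP', $PID if $PID =~ m/^\d+$/;
            }
        }

        $LastDigest = $Digest;
    }

    sleep 60;
}
```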
When a POD is starting up, we can use initContainers. These containers perform the update actions as defined in bin/docker/entrypoint.sh.
Rolling restart of the PODs should be avoided.
$User.pm files
Regenerating user-specific configurations can be done the same way as for the ZZZ*.pm files. But I suggest disallowing $User.pm files in the initial Kubernetes support.
Articles and attachments
Articles and attachments can be persistently stored in S3.
New communication paths between PODs
In order to keep things simple, no new communication infrastructure should be introduced.
Logging
Investigate and decide how logging should be done. Maybe with Log::Log4perl to a logging server, or by letting Kubernetes pick up the syslog.
Health checks
It might be that K8s does not rely on Docker health checks but instead probes the containers itself. See https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/.
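As an illustration, a hedged sketch of a tiny PSGI endpoint that a Kubernetes httpGet liveness or readiness probe could call. The /health route, the database ping, and the paths are assumptions; OTOBO's own otobo.psgi could of course expose something equivalent instead.

```perl
#!/usr/bin/env perl
# Hedged sketch (health.psgi): report 200 when the database is reachable,
# 503 otherwise. Route and check are assumptions.

use strict;
use warnings;

use lib '/opt/otobo';
use lib '/opt/otobo/Kernel/cpan-lib';

use Plack::Builder;
use Kernel::System::ObjectManager;

my $HealthApp = sub {

    # a fresh object manager per probe keeps the handler self-contained
    local $Kernel::OM = Kernel::System::ObjectManager->new();

    my $OK = $Kernel::OM->Get('Kernel::System::DB')->Ping();

    return $OK
        ? [ 200, [ 'Content-Type' => 'text/plain' ], ["OK\n"] ]
        : [ 503, [ 'Content-Type' => 'text/plain' ], ["database not reachable\n"] ];
};

builder {
    mount '/health' => $HealthApp;
};
```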
Related issues:
Related questions and TODOs
See also
Conclusion
Aim for strategy 1 without a persistent volume. Tackle the critical points one after the other.
Current implementation
Some support for S3 is implemented, but testing is very basic and documentation is non-existent. Here are the beginnings of a HOWTO.
Development and testing are done with localstack, but for production MinIO is recommended. The steps are: