This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Zabbix 6 related features / improvements of the chart #54

Merged
merged 40 commits into from
Jun 26, 2022

Conversation

@sa-ChristianAnton (Contributor) commented May 26, 2022

This pull request adds a number of fundamental improvements and changes to the chart, most of them related to Zabbix 6 features and best practices.

  • Zabbix server now always has "HA Cluster" mode set to "on", making it possible to simply raise replicaCount and run multiple redundant server containers
  • support for DB schema initialization with multiple Zabbix server instances: to prevent several instances of the Zabbix server container image from trying to initialize an empty database with a new schema at the same time, a mechanism comes into action when replicaCount for zabbixserver is higher than 1. Each Zabbix server pod starts with an init container that lets the main Zabbix server container start only once a fully functional DB schema has been detected, and additionally a one-time Job with a single replica takes care of exactly this DB initialization and then ends
  • added support for the Zabbix web service component (for the creation of scheduled reports)
  • added a cronjob to automatically clean orphaned HA nodes (e.g. pods restarted by Kubernetes) out of the HA nodes view
  • easy-to-use pod anti-affinity to make instances of the Zabbix server and Zabbix web frontend run on different cluster nodes where possible
  • converted the Zabbix server from a StatefulSet to a Deployment (it is essentially a stateless application rather than what a StatefulSet is designed for)
  • switched the web frontend from the Apache container to the Nginx container by default
  • added ZBX_SERVER_NAME as an explicit value, so the corresponding env variable no longer has to be added manually as an extraEnv
  • added SAML-related settings (a secret for the certificates, and an appropriate example of the env vars needed)
  • made the tag of all Zabbix component images centrally configurable. This way it is much easier to switch from, say, Ubuntu to Alpine, or to raise the minor version for all images coming from Zabbix
  • added support for an alternative "IngressRoute" CRD object for use with the Traefik reverse proxy
  • added the possibility to set a fixed port value in case the Zabbix web frontend is exposed using NodePort
  • moved the liveness and readiness probes into the zabbixweb subsection of the values
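To give a rough feel for how several of the features above combine, here is a minimal values.yaml sketch. The key names are illustrative assumptions based on the descriptions in this PR, not necessarily the chart's exact structure:

```yaml
# Hypothetical values.yaml excerpt -- key names are assumptions,
# check the chart's actual values.yaml for the real structure.
zabbix_image_tag: ubuntu-6.0-latest   # one central tag for all Zabbix images

zabbixserver:
  replicaCount: 3          # HA cluster mode is always "on", so >1 just works
  podAntiAffinity: true    # spread server pods across cluster nodes

zabbixweb:
  service:
    type: NodePort
    nodePort: 30080        # fixed NodePort if the frontend is exposed this way
  livenessProbe:
    httpGet:
      path: /
      port: 8080

zabbixwebservice:
  enabled: true            # component for scheduled report generation
```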

This pull request implements #38.

If this PR is accepted, please let me know which version we should raise the chart to.

I would also like to suggest implementing the proposed cleanup of the user/password settings as described in my other PR:

  • no credentials within the component subsections anymore
  • the possibility to use either the "all-inclusive" DB credentials secret, plain settings within the values, or settings within the values combined with ONE secret containing only the password for DB access
  • if postgresql.enabled=true, this should then be respected as well

I could do this quite quickly, but did not want to do so without your agreement, because it would be a breaking change.

Additionally, I would like to remove the zabbixproxy part of the chart, as this is already covered by the official chart provided by Zabbix. What do you think?

Regards

Christian

* Zabbix server now always has "HA Cluster" mode set to "on", making it
  possible to easily raise replicaCount and have multiple redundant
  server containers
* added cronjob to autoclean orphaned HA nodes (pods being restarted by
  K8S...)
* easy-to-use Pod anti affinity to make instances of Zabbix Server,
  Zabbix web frontend run on different cluster nodes if possible
* converted zabbix server from statefulset to deployment
* switched from Apache container to Nginx container for web frontend by
  default
* added ZBX_SERVER_NAME as an explicit value, so the corresponding env
  variable no longer has to be added manually as an extraEnv
* Deployment with configurable replicaCount
* automatic configuration of Zabbix Server accordingly
* pod anti-affinity to spread multiple instances across cluster nodes if
  possible and replicaCount > 1
When starting more than one Zabbix server container with valid DB
connection credentials at the same time against an empty database, all of
them would try to connect and install the Zabbix DB schema at the same
time, leading to errors and invalid queries.

This commit implements an init container on the Zabbix Server pods that
not only waits for the DB to be accessible IP-wise, but also for the
schema to be available. This practically deactivates the schema
auto-creation for the Zabbix Server pods.

At the same time, a one-time Job has been added, which uses the Zabbix
Server image with a slightly modified docker-entrypoint.sh in order to
only wait for DB connection and then to install Zabbix DB schema, but
not to prepare config nor start Zabbix server.

With the combination of these two, we can now start off with a multinode
Zabbix HA cluster in Kubernetes.
If the step to create the DB schema takes too long, the wait job that
should prevent the Zabbix server container from starting before the
database is "really" ready exits too early, and the Zabbix server crashes
at start. This commit prevents this error by adding an additional check
that lets the Zabbix server container start only if there are more than 0
users in the users table.
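The wait logic described in these commits could be sketched as an init container along the following lines. This is a simplified illustration, not the chart's actual implementation (which generates the script from a ConfigMap); the image, env names, and query are assumptions:

```yaml
# Illustrative init container: block the Zabbix server container until the
# schema exists AND the "users" table contains at least one row.
initContainers:
  - name: wait-for-db-schema
    image: postgres:14          # provides the psql client
    command:
      - sh
      - -c
      - |
        until psql -h "$DB_SERVER_HOST" -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
              -tAc "SELECT count(*) FROM users;" | grep -qE '^[1-9]'; do
          echo "waiting for Zabbix DB schema..."
          sleep 5
        done
    env:
      - name: DB_SERVER_HOST
        value: zabbix-postgresql   # hypothetical service name
      - name: PGPASSWORD           # psql reads the password from this variable
        valueFrom:
          secretKeyRef:
            name: zabbix-db-password
            key: password
```

Checking for a non-zero row count in `users` (rather than just for the table's existence) is what closes the race described above, since the schema import creates the tables before it finishes populating them.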
- certificate secret
- frontend settings example (via env)
We should avoid exposing any part of the application by accident or by default.
@aeciopires (Collaborator):

  1. What an amazing job you did with the chart, solving the problem of running multiple Zabbix server pods against the same database. Thank you very much for your contribution.

  2. This is a breaking change. I think it should become version 3.0.0. Do you agree?

  3. About zabbix-proxy, I exposed my ideas in this issue.

If we remove zabbix-proxy from here (even though it is optional today), it means that the official chart will be used as a dependency, correct?

The Zabbix SIA team has never contacted us to join forces and create a single chart. For this reason, I am afraid of the official chart evolving technically to the point where using both charts simultaneously becomes impossible for technical reasons, since each developer thinks of different technical solutions. Does that make sense? What do you propose to avoid this kind of problem when using our chart?

  4. Could you run helm-docs to update the documentation?

  5. Could you change the content of the values.yaml file to use zabbix-server with HA?

@sa-ChristianAnton (Contributor, Author):

Thanks @aeciopires for taking my work into account. I am happy to be able to contribute to this project.

I agree, version 3.0.0 sounds great; I will prepare that. Updating the docs etc. is also still open; I will do that within the next few days.

Also, I will streamline the secrets/auth part as proposed.

I have read the other issue and your comment there regarding the Zabbix proxy. Also, the last time I was at Zabbix HQ, I talked to Alexei about this Helm chart, and he handed the link over to the integrations team, with the comment that this might take a while.

Meanwhile, I would suggest we keep the Zabbix proxy component in this chart but change it to be disabled by default. I believe this chart should "just deploy a minimal running Zabbix server instance" out of the box, without the need to change any value at all, and deploying a proxy by default would go against this idea. Nevertheless, I think it is a good idea to keep the Zabbix proxy and continue maintaining it.

Summarizing, I will do:

  • harmonize/unify the DB authentication part
  • Change zabbixproxy.enabled to false in values.yaml
  • Go through docs, examples and update accordingly
  • Raise version to 3.0.0 and run helm-docs

@aeciopires (Collaborator) commented May 29, 2022

Great @sa-ChristianAnton!

I would like to ask you to add your name and contact details to the Chart.yaml file as a maintainer.

Welcome to the team! :-)

Have a nice week.

From now on, all settings related to database access credentials are
managed centrally for the entire chart:

- either by using a "unified_db_secret" containing ALL aspects of the DB
  access configuration, conforming to the format used by the PGO Postgres
  operator,
- or by explicit settings in the "db_access" section of the values.yaml.
  In this case, it is possible to use a secret holding just the DB
  password, which will then be used by all components that apply.

This commit removes all Postgres-related username/password and DB name
settings at the level of each component of the chart, and keeps only the
central ones.
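The two options might look roughly like this in values.yaml. The key names follow the commit message where it states them; the rest of the structure is an illustrative assumption:

```yaml
# Option 1: one unified secret (PGO format) carrying host, port, dbname,
# user and password together.
db_access:
  use_unified_secret: true
  unified_secret_name: zabbixdb-pguser-zabbix   # hypothetical secret name

# Option 2: explicit settings, optionally with a secret that holds
# only the DB password.
# db_access:
#   db_server_host: my-postgres.example.com
#   db_server_port: 5432
#   postgres_user: zabbix
#   postgres_db: zabbix
#   postgres_password_secret: zabbix-db-password  # secret with just the password
```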
Instead, there is now an example in the values.yaml that shows how to
configure ZBX_SERVER_NAME for the frontend using additional env
variables.
@sa-ChristianAnton (Contributor, Author) commented May 30, 2022

I think this should be all.

One open thing I would like to have added, but have not yet: a "Route" object for OpenShift. I will probably add this within the next few days, but if I do not manage to do so before you merge, I will add it later as a new PR.

Thanks for giving me a place in the maintainers list.

Cheers

@aeciopires (Collaborator):

Hi @sa-ChristianAnton!

Feel free to make any desired changes and necessary testing.

As this release will be a breaking change, there is no need to rush.

I would like to ask a big favor:

  1. In the VictoriaMetrics Open Source project (which I also help maintain), a Makefile was implemented to help automate the helm lint and helm-docs commands.

Could you add one in this PR, and reference and document its usage in the CONTRIBUTING.md file?

Source: https://github.com/VictoriaMetrics/helm-charts/blob/master/Makefile

PS: I have not used OpenShift yet, so I will not be able to test it.

Let me know when you think it's good for code review and merge.

Review comments (now resolved) were left on: templates/configmap-zabbix-server-init-waitschema.yaml, templates/deployment-zabbix-server.yaml, templates/cronjob-hanodes-autoclean.yaml, templates/deployment-zabbix-web.yaml, Makefile, and values.yaml.
@sa-ChristianAnton (Contributor, Author):

Actually, regarding the automatic setting of the PostgreSQL host and port when postgresql.enabled=true: I agree this adds quite a few extra lines, because it is used in many places. Do you think it would make sense to put this entire part inside _helpers.tpl and refer to it from all those places?

@sa-ChristianAnton (Contributor, Author):

I committed b386878 to implement one central template in _helpers.tpl that contains all the repeating logic for setting the DB-access-related ENV variables, with a switch that selects the right variable names depending on whether they are set on a Postgres container or a Zabbix component. The implementation is not yet 100% tested but works at a first pre-flight. I think this is a good way to take a lot of complexity out of the chart.
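The idea of the central template can be sketched as follows. This is a simplified sketch, not the actual content of b386878; the real template also handles the unified secret and more variables, and the define/value names here are assumptions:

```yaml
{{/* _helpers.tpl: emit the DB-access env vars once, centrally.
     Postgres containers and Zabbix components expect different
     variable names, so a "context" switch selects the right set. */}}
{{- define "zabbix.db_access_env" -}}
{{- if eq .context "postgres" }}
- name: POSTGRES_USER
  value: {{ .Values.db_access.postgres_user | quote }}
- name: POSTGRES_DB
  value: {{ .Values.db_access.postgres_db | quote }}
{{- else }}
- name: DB_SERVER_HOST
  value: {{ .Values.db_access.db_server_host | quote }}
- name: DB_SERVER_PORT
  value: {{ .Values.db_access.db_server_port | quote }}
- name: POSTGRES_USER
  value: {{ .Values.db_access.postgres_user | quote }}
{{- end }}
{{- end }}
```

Each deployment template would then pull the whole block in with a single line, e.g. `{{ include "zabbix.db_access_env" (dict "context" "zabbix" "Values" .Values) }}`, instead of repeating the env list everywhere.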

But still, special env var names are needed for the "cleanup" cron job.
@sa-ChristianAnton (Contributor, Author):

> I committed b386878 to implement one central template in _helpers.tpl that contains all the repeating logic for setting the DB-access-related ENV variables, with a switch that selects the right variable names depending on whether they are set on a Postgres container or a Zabbix component. The implementation is not yet 100% tested but works at a first pre-flight. I think this is a good way to take a lot of complexity out of the chart.

Tested all scenarios successfully now:

  • standalone installation, setting all DB parameters as values and using a secret just for the password
  • PGO-based DB cluster, using the unified secret for all DB-related settings

All work fine.

@aeciopires (Collaborator):

> Actually, regarding the automatic setting of the PostgreSQL host and port when postgresql.enabled=true: I agree this adds quite a few extra lines, because it is used in many places. Do you think it would make sense to put this entire part inside _helpers.tpl and refer to it from all those places?

From a technical point of view, a single definition in the _helpers.tpl file called from all the other places would be great for code reuse, but it would make the code less intuitive and more difficult for other people to maintain, especially if they are new to Helm. I think we can keep it the way it is. Do you agree?

My comment was more about using db_secrets instead of keeping lines referring to environment variables in the code. But I understand your idea of making it easier for the user.

@aeciopires (Collaborator):

> I committed b386878 to implement one central template in _helpers.tpl that contains all the repeating logic for setting the DB-access-related ENV variables, with a switch that selects the right variable names depending on whether they are set on a Postgres container or a Zabbix component. The implementation is not yet 100% tested but works at a first pre-flight. I think this is a good way to take a lot of complexity out of the chart.

Great idea!

We can test it out and see how it goes. I think the way you implemented it is very good. I also think it's worth adding a comment about this in README.md or CONTRIBUTING.md. What do you think?

@sa-ChristianAnton (Contributor, Author):

>> I committed b386878 to implement one central template in _helpers.tpl that contains all the repeating logic for setting the DB-access-related ENV variables, with a switch that selects the right variable names depending on whether they are set on a Postgres container or a Zabbix component.

> Great idea!

> We can test it out and see how it goes. I think the way you implemented it is very good. I also think it's worth adding a comment about this in README.md or CONTRIBUTING.md. What do you think?

Absolutely. I will figure out some nice text for the README.

With this commit it is now possible to add additional (sidecar)
containers and volumes to each of the components if necessary.

For example, this might be needed if an additional utility has to run
alongside the Zabbix server to pick up "real-time export" data and send
it to a message bus, or for a logging sidecar container that needs to
catch the logs from the main container in an emptyDir volume and ship
them to a central logging system.
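The log-shipping use case from the commit message could look roughly like this. The key names (extraContainers, extraVolumes) and the sidecar image are illustrative assumptions:

```yaml
# Hypothetical values.yaml excerpt: a logging sidecar next to the
# Zabbix server, sharing an emptyDir volume for the log files.
zabbixserver:
  extraContainers:
    - name: log-shipper
      image: fluent/fluent-bit:1.9      # example log forwarder
      volumeMounts:
        - name: server-logs
          mountPath: /var/log/zabbix    # reads what the main container writes
  extraVolumes:
    - name: server-logs
      emptyDir: {}                      # shared between both containers
```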
@aeciopires (Collaborator):

Hi @sa-ChristianAnton!

Thank you for the amazing job you did on this PR. I read all the recent changes. I think we are very close to merging.

I made one more simple comment and I will start the tests in the local environment.

With this change, it is now possible to supply arbitrary runtime
parameters to the PostgreSQL database.

The "max_connections" setting, formerly a setting of its own inside
values.yaml, has been moved into this new structure.
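The new structure might look something like this; the exact key name for the parameter list is an assumption, and the parameters themselves are just examples of standard PostgreSQL settings:

```yaml
# Hypothetical values.yaml excerpt: arbitrary PostgreSQL runtime parameters.
postgresql:
  enabled: true
  extraRuntimeParameters:
    max_connections: 100              # formerly a dedicated chart value
    shared_buffers: 256MB
    log_min_duration_statement: 500   # log queries slower than 500 ms
```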
Further review comments (now resolved) were left on values.yaml, index.yaml, and templates/secret-db-access.yaml.
@aeciopires (Collaborator):

This PR solves issues #60, #58 and #38.

@aeciopires aeciopires merged commit 88034d5 into cetic:master Jun 26, 2022
@sa-ChristianAnton sa-ChristianAnton deleted the zabbix-6-features branch July 4, 2022 16:54