Releases: oracle/coherence-operator
v3.1.4
Coherence Operator Release 3.1.4
Changes:
- Support environment variable expansion in JVM args, program args, and classpath entries. Any value in the format $VAR or ${VAR} in any of those elements of the Coherence yaml will be replaced with the corresponding environment variable value at runtime. These environment variables are taken from the Coherence container in the Coherence cluster Pods, not from the Operator's environment (see the sketch after this list).
- Fixed an issue where a Coherence deployment could not be deleted if all of its Pods failed to start (for example, if every Pod was stuck in an ImagePullBackOff loop).
- Fixed an issue where updating the Coherence yaml to both scale down to one replica and trigger a rolling update could result in data loss even if persistence is enabled. Obviously this scenario is guaranteed to cause complete data loss in cache services where persistence is not enabled, but in clusters with active persistence an upgrade of a single member should not lose data.
- Allow the Operator to be installed without requiring ClusterRoles and ClusterRoleBindings. Whilst this is really not recommended, some customers have asked whether it is possible due to particularly tight corporate security policies. It is now possible to run the Operator without installing cluster-wide RBAC roles, but with some caveats - see the RBAC section of the install documentation.
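For illustration, the sketch below shows what the environment variable expansion might look like in a Coherence resource. The field names used here (env, jvm.args, application.args) and the variable names are assumptions made for the example; the Coherence CRD documentation is authoritative.

```yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  env:
    # Environment variables set on the Coherence container in the Pod
    - name: HEAP_SIZE
      value: "4g"
    - name: APP_MODE
      value: "prod"
  jvm:
    args:
      # ${VAR} form - expanded at runtime from the container's environment
      - "-Xmx${HEAP_SIZE}"
  application:
    args:
      # $VAR form - also expanded at runtime, not from the Operator's environment
      - "--mode=$APP_MODE"
```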
v3.1.3
Coherence Operator Release 3.1.3
This release contains a single bug fix that removes some debug logging left in the readiness probe that the Operator injects into the Coherence containers. The logging caused no functional issues, but it was irritating to see so many of these messages in the Coherence logs.
v3.1.2
Coherence Operator Release 3.1.2
This is a minor bug fix release.
Changes
- The Operator now looks at multiple Node labels to determine the site and rack values for a Coherence member. Previously a single label was used, which may not work on older k8s versions that still use the deprecated topology labels. This could cause Coherence services to fail to reach site safe (see the sketch after this list).
- Cleaned up confusion around the multiple different Grafana dashboards available.
- Added Grafana dashboards that support the Coherence Micrometer Prometheus metric name format. These dashboards are for use with applications that use the Coherence Micrometer integration released with Coherence CE 20.12.
- Added a troubleshooting FAQ to the documentation. This is intended to be a growing guide to troubleshooting the deployment of Coherence clusters in k8s.
- The default readiness probe no longer waits for DefaultCacheServer to start; this can optionally be re-enabled with a system property. The feature was originally added for a customer where the first readiness probe was executing too early, but it is not required by most applications and it is simpler to adjust the readiness probe timings. Waiting for DefaultCacheServer will also not always work, especially if using the new bootstrap API released with Coherence CE 20.12.
- Cleaned up some documentation errors.
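For reference, the sketch below shows the two generations of Kubernetes topology labels on a Node. Which of these labels the Operator inspects, and how they map to site and rack, is described in the documentation; the mapping suggested in the comments here is an assumption.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-node-1
  labels:
    # Current topology labels (newer Kubernetes versions) - typically used for site/rack
    topology.kubernetes.io/region: us-east-1
    topology.kubernetes.io/zone: us-east-1a
    # Deprecated labels still present on older Kubernetes versions
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1a
```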
v3.1.1
Coherence Operator Release 3.1.1
Today we have released version 3.1.1 of the Coherence Operator. This contains a single bug fix on top of 3.1.0, albeit a very important one.
⚠️ Deprecation of 3.1.0 ⚠️
We are deprecating 3.1.0, which should not be used due to a break in compatibility with previous 3.0.x versions.
Changes
An issue came to light soon after the release of v3.1.0: the CRD name had changed slightly and subtly from coherence.coherence.oracle.com to coherences.coherence.oracle.com, which was enough to break transparent upgrades from previous 3.0.x versions to 3.1.0. Initially we thought that the work-around of manually deleting the previous 3.0.x CRD would be sufficient. It soon became clear that this was totally impractical, as deleting a CRD causes all of the Coherence deployments created from that CRD to also be deleted, again breaking transparent upgrades.
For that reason we have changed the CRD name back to coherence.coherence.oracle.com in version 3.1.1, obviously making it incompatible with 3.1.0 but compatible with 3.0.x. We recommend customers skip 3.1.0 entirely and upgrade straight to 3.1.1. If you have installed 3.1.0, you must manually delete the coherences.coherence.oracle.com CRD before installing 3.1.1, which will again delete any clusters that are running.
Version 3.1.1 is backwards compatible with 3.0.x, and installing 3.1.1 will not affect clusters already running from a previous 3.0.x release; just uninstall 3.0.x and install 3.1.1. We now have tests in the CI build to verify this, which should hopefully stop this sort of issue occurring in future.
v3.1.0
Coherence Operator v3.1.0
🚫 THIS RELEASE IS DEPRECATED - DO NOT USE 🚫
This release is deprecated because it caused the name of the CRD to change slightly from coherence.coherence.oracle.com to coherences.coherence.oracle.com (the first coherence is now plural). The work-around for this was to delete the existing CRD, but that would cause all Coherence clusters that had been deployed with the previous CRD to also be deleted, which is obviously totally impractical.
This version of the Coherence Operator is otherwise compatible with previous 3.0.* versions; there should be no breaking changes, and Coherence yaml used with 3.0.* versions should work with 3.1.0.
Changes in Operator 3.1.0
Project Restructure
The biggest change from our perspective was the move to the final 1.0.0 release of the Operator SDK. Just before that release the Operator SDK team made big changes to their project, removing a lot of things and basically switching to using Kubebuilder for a lot of the code generation and configuration. This meant that we had to do a bit of reorganization of the code and project layout. The Operator SDK also removed its test framework, which we had made extensive use of in our suite of end-to-end integration tests. Some things became simpler with using Kubebuilder, but we still had to do work to refactor our tests. This is all of course transparent to Coherence Operator users, but was a sizeable piece of work for us.
Deployment
The change to using Kubebuilder, and using the features it provides, has meant that we have changed the default deployment options of the Coherence Operator. The recommended way to deploy the Coherence Operator with 3.1 is to deploy a single instance of the operator into a Kubernetes cluster and that instance monitors and manages Coherence resources in all namespaces. This is a change from previous versions where an instance of the operator was deployed into a namespace and only monitored that single namespace, meaning multiple instances of the operator could be deployed into a Kubernetes cluster.
There are various reasons why the new model is a better approach. The Coherence CRDs are deployed (or updated) by the Operator when it starts. In Kubernetes a CRD is a cluster scoped resource, so there can only be a single instance of any version of a CRD. We do not update the version of our CRD with every Operator release - we are currently at v1. This means that if two different versions of the Coherence Operator had been deployed into a Kubernetes cluster the version of the CRD deployed would only match one of the operators (typically the last one deployed) and this could lead to subtle bugs or issues due to version mis-matches. The second reason is due to version 3.1 of the operator introducing admission web-hooks (more on that below). Like CRDs, admission web-hooks are also really a cluster scoped resource so having multiple web-hooks deployed for a single CRD may cause issues.
It is possible to deploy the Coherence Operator with a list of namespaces to monitor instead of monitoring all namespaces, and hence it is possible to deploy multiple operators monitoring different namespaces; we just would not advise this.
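The sketch below illustrates one way the watched namespaces might be restricted, assuming the Operator follows the common Operator SDK / Kubebuilder convention of reading a WATCH_NAMESPACE environment variable; the variable name and install mechanism here are assumptions, so use the install documentation for the real configuration.

```yaml
# Sketch only: a cut-down Operator Deployment assuming a WATCH_NAMESPACE
# environment variable where an empty value means "watch all namespaces".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coherence-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: coherence-operator
  template:
    metadata:
      labels:
        app: coherence-operator
    spec:
      containers:
        - name: operator
          image: container-registry.oracle.com/middleware/coherence-operator:3.1.0
          env:
            - name: WATCH_NAMESPACE
              value: "payments,inventory"   # comma-delimited list of namespaces to watch
```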
Admission Web-Hooks
Version 3.1 of the operator introduced the use of admission web-hooks. In Kubernetes an admission web-hook can be used for mutating a resource (typically applying defaults) and for validating a resource. The Coherence Operator uses both of these: we apply default values to some fields, and we also validate fields when a Coherence resource is created or updated. In previous versions of the operator it was possible to see issues caused by creating a Coherence resource with invalid values in some fields, for example altering a persistent volume when updating, setting invalid NodePort values, etc. These errors were not detected until after the Coherence resource had been accepted by Kubernetes and a StatefulSet or Service was created and subsequently rejected by Kubernetes, causing errors in the Operator's reconcile loop. With a validation web-hook a Coherence resource with invalid values will not even make it into Kubernetes.
Kubernetes Autoscaler
Back in version 3.0 of the operator we supported the scale sub-resource, which allowed scaling of a Coherence deployment using built-in Kubernetes scale commands, such as kubectl scale. In version 3.1 we have taken this further with a full end-to-end example of integrating a Coherence cluster into the Kubernetes Horizontal Pod Autoscaler and showing how to scale a cluster based on metrics produced by Coherence. This allows a Coherence cluster to grow as its resource requirements increase, for example as heap use increases. This is by no means an excuse not to do any capacity planning for your applications, but it does offer a useful way to use your Kubernetes resources on demand.
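As a rough illustration, an HPA can target a Coherence resource through the scale sub-resource. The sketch below assumes a custom metric exposed through your own Prometheus adapter configuration; the metric name is hypothetical, and the full end-to-end example in the documentation shows the real setup.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: storage-hpa
spec:
  scaleTargetRef:
    # The HPA scales the Coherence resource itself via its scale sub-resource
    apiVersion: coherence.oracle.com/v1
    kind: Coherence
    name: storage
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Pods
      pods:
        metric:
          name: heap_memory_usage_after_gc_pct   # hypothetical metric from a Prometheus adapter
        target:
          type: AverageValue
          averageValue: "80"
```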
Graceful Cluster Shutdown
As a resilient data store, Coherence handles Pods leaving the cluster by recovering the lost data from backup and re-balancing the cluster. This is all great and exactly what we need, but not necessarily when we just want to stop the whole cluster at once: Pods will not all die together, and those left will be working hard to recover data as other Pods leave the cluster. If a Coherence resource is deleted from Kubernetes (or if it is scaled down to a replica count of zero) the Coherence Operator will now suspend all storage enabled cache services in that deployment before the Pods are stopped. This allows for a more controlled cluster shut-down and subsequent recovery when the cluster is brought back up.
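As a minimal illustration, scaling a Coherence resource down to zero replicas (here via the standard replicas field in the spec) triggers this suspend-before-stop behaviour; the resource name is hypothetical.

```yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  # Scaling to zero causes the Operator to suspend all storage enabled
  # cache services in this deployment before the Pods are stopped.
  replicas: 0
```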
Spring Boot Image Support
Spring Boot is a popular framework that we have big plans for in upcoming Coherence CE releases. One feature of Spring Boot is the way it packages an application into a jar, and then how Spring Boot builds images from the application jar. This could lead to problems trying to deploy those types of application using the Coherence Operator. The simplest way to package a Spring Boot application into an image for use by the Coherence Operator is to use JIB. The JIB Gradle or Maven plugins will properly package a Spring Boot application into an image that just works out of the box with the Coherence Operator.
Spring Boot images built using the latest Spring Boot Gradle or Maven plugins use Cloud Native Buildpacks to produce an image. The structure of these images and how they are run is quite different to a simple Java application. There are pros and cons with this, but as a popular framework and tooling it is important that the Coherence Operator can manage Coherence applications built and packaged this way. With version 3.1 of the operator these images can be managed with the addition of one or two extra fields in the Coherence resource yaml.
Finally, if you really wish to put your Spring Boot fat-jar into an image (and there are reasons why this is not recommended) then the Coherence resource has configuration options that will allow this to work too.
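A hedged sketch of those extra fields is shown below; the application.type value and the image name are assumptions for illustration, and the Spring Boot section of the Operator documentation describes the exact fields required for buildpacks images and fat-jar images.

```yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: spring-app
spec:
  image: my-registry/my-spring-boot-app:1.0.0   # hypothetical Spring Boot application image
  application:
    # Assumed field telling the Operator that this image packages a
    # Spring Boot application rather than a plain Java application.
    type: spring
```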
Tested on Kubernetes 1.19
With the recent release of Kubernetes 1.19 we have added this to our certification test suite. We now test the Coherence Operator on all Kubernetes versions from 1.12 to 1.19 inclusive.
v3.0.2
This is a minor point release of the Coherence Operator.
Fixes
- Fixed an issue where the Operator continually re-reconciled the StatefulSet for a Coherence deployment if persistence was enabled using PVCs.
Notes When Using Persistence or Configuring VolumeClaimTemplates
One of the limitations of a StatefulSet (which is used to control the Pods of a Coherence deployment) is that certain fields are effectively read-only once the StatefulSet has been created. One of these is the VolumeClaimTemplates array. This means that the Coherence Operator will not attempt to change a VolumeClaimTemplate for a StatefulSet once the StatefulSet has been created, even if a change to the Coherence deployment yaml should have caused a change. For example, enabling and then later disabling persistence will not cause the persistence VolumeClaimTemplate to be removed from the StatefulSet; and vice versa, enabling persistence as an update to a running deployment will fail to add the VolumeClaimTemplate.
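In practice this means persistence should be decided before the deployment is first created. The sketch below shows roughly what enabling active persistence with a PVC might look like; the persistence field names here are assumptions, so check the Coherence CRD documentation for the exact spec.

```yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  replicas: 3
  coherence:
    persistence:
      # Enable active persistence backed by a PVC; because the resulting
      # VolumeClaimTemplate is read-only on the StatefulSet, set this up
      # before the deployment is first created.
      mode: active
      persistentVolumeClaim:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```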
Images
Images can be pulled from Oracle Container Registry (credentials are not required to pull the Operator images).
docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.2
docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.2-utils
v3.0.1
This is a minor point release of the Coherence Operator.
Fixes
- Fixed an issue where the Operator continually re-reconciled the StatefulSet for a Coherence deployment if persistence was enabled using PVCs.
Changes
Note: As of this release the Docker images for the Operator are no longer published to Docker Hub.
Images can now be pulled from Oracle Container Registry (credentials are not required to pull the Operator images).
docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.1
docker pull container-registry.oracle.com/middleware/coherence-operator:3.0.1-utils
v3.0.0
Operator 3.0.0 is a significant new version of the Coherence Operator.
Docs: https://oracle.github.io/coherence-operator/docs/3.0.0
Changes
This release is very different to the previous 2.x release, with a new simpler CRD, and is not backwards compatible.
Version 3.0.0 and 2.x can co-exist in the same K8s cluster.
The concept of Clusters and Roles has gone and been replaced by a single Coherence CRD. When a Coherence cluster is made up of multiple roles, each of these is now deployed and managed as a separate Coherence resource in k8s.
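For illustration, a minimal Coherence resource representing one role might look like the sketch below; the cluster field name and image name are assumptions for the example, and two roles of the same cluster would simply be two such resources sharing the same cluster name.

```yaml
apiVersion: coherence.oracle.com/v1
kind: Coherence
metadata:
  name: storage
spec:
  cluster: my-cluster               # cluster name shared by all roles of the cluster
  replicas: 3
  image: my-registry/my-app:1.0.0   # single image containing both Coherence and application code
```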
The reason for a new major release was so that we could completely remove the internal use of the Operator SDK Helm controller and instead reconcile all of the k8s resources in our own controller. This gives us full control over what gets reconciled and how we perform updates and merges, and makes maintaining backwards compatibility for future releases simpler.
There is a converter utility in the assets section of the release below that can convert v2 yaml to v3 yaml. The only caveat is that Operator 3.0.0 expects only a single image to be specified that contains both Coherence and any application code.
See the docs on creating applications https://oracle.github.io/coherence-operator/docs/3.0.0/#/applications/010_overview
The converter takes a single command line parameter, which is the name of the file to convert, and outputs the converted yaml to stdout.
For example:
converter my-cluster.yaml
v2.1.1
Changes
- Allow the scheme to be specified for the fluentd logging configuration. When deploying the Operator Helm chart with an Elasticsearch server that requires https, the scheme can now be specified (see the sketch after this list). See https://oracle.github.io/coherence-operator/docs/2.1.1/#/logging/030_own
- As an alternative to using host/port/scheme when specifying the Elasticsearch endpoint that fluentd should be configured to use, it is now possible to specify the full host URL (or a comma-delimited list of URLs). See https://oracle.github.io/coherence-operator/docs/2.1.1/#/logging/030_own
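A hedged sketch of what these Helm chart values might look like is shown below; the value names (elasticsearchEndpoint and its keys) are assumptions for illustration only, and the logging documentation linked above has the real chart values.

```yaml
# Assumed Helm values fragment for the Operator chart.
elasticsearchEndpoint:
  host: es.example.com
  port: 9200
  scheme: https     # new: the scheme can now be specified for servers requiring https
  # Alternatively (also new), a full URL or comma-delimited list of URLs:
  # url: "https://es-1.example.com:9200,https://es-2.example.com:9200"
```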
NOTE
If upgrading from earlier releases of the Operator into a k8s namespace where there are already existing Coherence clusters configured with Fluentd enabled, then due to limitations in the way that the Operator uses Helm internally this release will cause a rolling upgrade of those existing clusters. Existing Coherence clusters that do not have Fluentd enabled will not be affected.
v2.1.0
NOTE
We are aware that this version of the Operator contains changes to the CRD that make it incompatible with previous versions. This means that if version 2.1.0 of the Operator is installed into a k8s namespace that already contains CoherenceCluster instances deployed with a previous version, it causes error messages in the Operator and the existing clusters can no longer be controlled by the Operator. The only solution is to remove and re-create the affected clusters.
New Features
- Add the ability to specify role start-up dependencies for a cluster. Roles can be configured to start after other roles (see the sketch after this list). https://oracle.github.io/coherence-operator/docs/2.1.0/#/clusters/035_role_startup_ordering
- Allow roles to be configured to not be part of the cluster's WKA list. https://oracle.github.io/coherence-operator/docs/2.1.0/#/about/05_cluster_discovery
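The sketch below gives a rough idea of how these two features might appear in a CoherenceCluster resource; the startQuorum and excludeFromWKA field names are assumptions for illustration, so rely on the two documentation links above for the real fields.

```yaml
apiVersion: coherence.oracle.com/v1
kind: CoherenceCluster
metadata:
  name: my-cluster
spec:
  roles:
    - role: storage
      replicas: 3
    - role: proxy
      replicas: 2
      startQuorum:              # assumed field: start this role only after the storage role
        - role: storage
      coherence:
        excludeFromWKA: true    # assumed field: keep this role out of the cluster's WKA list
```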