ConcensusStore does not maintain status in relation to managed StatefulSet #1

Open
ghost opened this issue Jan 24, 2023 · 1 comment

Comments


ghost commented Jan 24, 2023

My initial comment needs improvement; I'll post an update with better details shortly.

@ghost ghost changed the title volumeClaimTemplate should be immutable ConcensusStore does not maintain status in relation to managed StatefulSet Jan 24, 2023

ghost commented Jan 24, 2023

Overview

After a new ConsensusStore resource is created, the state of its managed StatefulSet does not appear to be reflected in the ConsensusStore's status.

For example, create this ConsensusStore:

cat <<EOF | kubectl apply -f -
apiVersion: consensus.atomix.io/v1beta1
kind: ConsensusStore
metadata:
  name: my-consensus-store
spec:
  replicas: 3
  groups: 30
  volumeClaimTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      storageClass: "standard"
      resources:
        requests:
          storage: 2Gi
EOF

Over the first few minutes of its life, the status looks like this:

% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   0/3     1s


% kubectl get ConsensusStores,StatefulSets        
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   2/3     15s


% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   NotReady

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   3/3     21s


% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   Ready

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   3/3     2m58s

When I submitted atomix/atomix.github.io#26, I also saw what appear to be unhandled conditions during startup:

2023-01-24T14:47:17.815Z        INFO    github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/cluster.go:440  Reconcile raft protocol service
2023-01-24T14:47:17.815Z        INFO    github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/cluster.go:485  Reconcile raft protocol headless service
2023-01-24T14:47:17.815Z        ERROR   github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1 v1beta1/cluster.go:184  Pod "my-consensus-store-0" not foundReconcile MultiRaftCluster
github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1.(*MultiRaftClusterReconciler).Reconcile
        github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1/cluster.go:184
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:121
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:320
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:273
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.12.1/pkg/internal/controller/controller.go:234

Also, the controller does not seem to update its status when it cannot apply changes to the StatefulSet, as in this example:

% kubectl get ConsensusStores,StatefulSets
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   Ready

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   3/3     13m


% cat <<EOF | kubectl apply -f -
apiVersion: consensus.atomix.io/v1beta1
kind: ConsensusStore
metadata:
  name: my-consensus-store
spec:
  replicas: 3
  groups: 30
  volumeClaimTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      storageClass: "standard"
      resources:
        requests:
          storage: 2Gi
EOF
consensusstore.consensus.atomix.io/my-consensus-store configured

% kubectl get ConsensusStores
NAME                 STATUS
my-consensus-store   Ready

The controller cannot actually update this field on the StatefulSet, yet the controller logs show no errors from the attempted update. Only the following is logged:

2023-01-24T17:45:50.169Z	INFO	github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1	v1beta1/store.go:88	Reconcile ConsensusStore
2023-01-24T17:45:50.169Z	INFO	github.com/atomix/consensus-storage/controller/pkg/controller/consensus/v1beta1	v1beta1/store.go:99	Reconcile raft protocol stateful set

If you try to change a StatefulSet's volumeClaimTemplates manually with kubectl, you get this error:

The StatefulSet "web" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

Possible Improvements

  1. In the following output, the StatefulSet is not ready because desired_replicas != available_replicas. In that case the ConsensusStore could report a status of at least Pending.
% kubectl get ConsensusStore,StatefulSet
NAME                                                    STATUS
consensusstore.consensus.atomix.io/my-consensus-store   

NAME                                  READY   AGE
statefulset.apps/my-consensus-store   0/3     83m
  2. Report the failure somewhere when an update to the StatefulSet fails.
    Updating the status, creating an event, and logging an error would probably be enough to help users out.
