Task controller for multi-pod jobs in cass-operator #243
Conversation
…gle pod runner. Fix a bug in the httphelper when fetching jobStatus
//+kubebuilder:rbac:groups=control.k8ssandra.io,namespace=cass-operator,resources=cassandratasks/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=control.k8ssandra.io,namespace=cass-operator,resources=cassandratasks/finalizers,verbs=update

Do we need to repeat this? It's already in cassandradatacenter_controller.go.
Good question. My guess is no, but I think it is good to leave it because 1) it provides documentation and 2) it avoids unnecessary coupling to cassandradatacenter_controller.go.
// It is represented in RFC3339 form and is in UTC.
// The completion time is only set when the job finishes successfully.
// +optional
CompletionTime *metav1.Time `json:"completionTime,omitempty"`
"The completion time is only set when the job finishes successfully."

Is this correct? If I understand reconcileEveryPodTask correctly, it looks like the job should eventually reach a completion state even if it has failed.
No, that text is copied over from Job. I don't know yet how to approach this: should it be "CompletionTime + Failed count" to notice that the job failed, or how else would one check that it will no longer be executed (there are cases where we won't retry)?
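A minimal sketch of the "CompletionTime + Failed count" idea, purely for illustration; the Succeeded/Failed counters and helper methods below are assumptions, not the fields in this PR:

package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CassandraTaskStatus is a sketch only; the counter fields are assumed.
type CassandraTaskStatus struct {
	// CompletionTime is set once the task will no longer be executed,
	// regardless of whether it succeeded or failed.
	CompletionTime *metav1.Time `json:"completionTime,omitempty"`
	Succeeded      int          `json:"succeeded,omitempty"`
	Failed         int          `json:"failed,omitempty"`
}

// Finished reports whether the task reached a terminal state.
func (s *CassandraTaskStatus) Finished() bool {
	return s.CompletionTime != nil
}

// FinishedSuccessfully distinguishes success from "completed with failures".
func (s *CassandraTaskStatus) FinishedSuccessfully() bool {
	return s.CompletionTime != nil && s.Failed == 0
}

With that split, CompletionTime can keep meaning "will not be executed again" while the Failed counter answers whether the run actually succeeded.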
…nd NodeMgmtClient initialization
Trying to write down missing features:
None critical, but something I had planned.
)

func createDatacenter(dcName, namespace string) func() {
	return func() {
Why does this need to be wrapped in a function?
Because of the way ginkgo works (Describe("...", func() {...})), and that's how it was previously used (until I refactored a bit to do it in the BeforeSuite instead of inside the test).
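For context, a minimal sketch of the ginkgo pattern being described; the spec and datacenter names are placeholders and the setup body is elided:

package controllers_test

import (
	. "github.com/onsi/ginkgo"
)

// createDatacenter returns a closure instead of doing the work immediately,
// so the setup only runs when ginkgo executes the node it is handed to
// (BeforeSuite, BeforeEach, or a call inside a test body).
func createDatacenter(dcName, namespace string) func() {
	return func() {
		// ... create the CassandraDatacenter and wait for it to become ready ...
	}
}

var _ = Describe("Execute jobs against all pods", func() {
	// Nothing happens at tree-construction time; the closure runs per spec.
	BeforeEach(createDatacenter("dc1", "default"))

	It("runs a task against every pod", func() {
		// ... create a CassandraTask and assert on its status ...
	})
})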
]
}`

var jobDetailsCompleted = `{"submit_time":"1638545895255","end_time":"1638545895255","id":"%s","type":"Cleanup","status":"COMPLETED"}`
When I first saw this I wondered whether there should be separate job details for the rebuild, but looking again at reconcileEveryPodTask I see that it only checks the status. Is type actually used anywhere?
Not by cass-operator. But if the management-api had a "getAllJobs" endpoint, you could build an external tool to monitor the jobs from the pods.
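For reference, a minimal sketch of parsing that job-details payload; the struct below is derived from the test fixture above and is not the httphelper's actual type:

package example

import (
	"encoding/json"
	"fmt"
)

// jobDetails mirrors the JSON in the jobDetailsCompleted fixture above.
type jobDetails struct {
	SubmitTime string `json:"submit_time"`
	EndTime    string `json:"end_time"`
	ID         string `json:"id"`
	Type       string `json:"type"`
	Status     string `json:"status"`
}

func main() {
	payload := `{"submit_time":"1638545895255","end_time":"1638545895255","id":"123","type":"Cleanup","status":"COMPLETED"}`

	var details jobDetails
	if err := json.Unmarshal([]byte(payload), &details); err != nil {
		panic(err)
	}
	// reconcileEveryPodTask only inspects the status; type is informational.
	fmt.Println(details.Type, details.Status)
}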
	return taskKey, task
}

var _ = Describe("Execute jobs against all pods", func() {
Thanks for writing these tests. This is good stuff. There are probably more scenarios we could test, but they could make for good follow up PRs, particularly for someone else looking to get more familiar with the code.
There are more scenarios, but writing them as envtests is difficult. Something like "prevent this from running concurrently" / "allow concurrent runs".
	pod.Annotations = make(map[string]string)
}

if podJobId, found := pod.Annotations[podJobIdAnnotation]; found {
I have been testing your branch with my k8ssandra-operator branch and hit some issues. I am testing adding a new DC to an existing cluster. I first tested with a single-node DC. I grepped Cassandra's logs to confirm that the rebuild operation ran. Yay for that!
I then decided to try my test with 2 nodes. When the new DC, dc2, comes up in this case, cleanup runs first. When cleanup finishes, the job id and job status annotations are still present on the pods. The rebuild task is created and reported as finished, but no rebuild operations are actually performed. This is because at line 354 we check for the job id annotation. It is already present from cleanup, so we assume that the job is done.
Once I realized what was going on, I thought I might have a quick fix by simply adding a statement to delete the job id annotation at line 384. Unfortunately it leads to additional problems.
This situation made it abundantly clear that there needs to be an integration test for this, i.e., two tasks run in succession.
Removing both the job id and job status annotations when the job completes won't work either because on a subsequent reconciliation we won't be able to determine if the job has already run on that pod. I think that storing the completed pods in the CassandraTask status would make this easier.
I think I'll handle this the same way as Kubernetes handles affinity status in annotations (json-serialized job information).
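A minimal sketch of that approach, one annotation per task UID holding JSON-serialized job state; the key and field names follow the annotations shown later in this thread, while the helper functions themselves are illustrative:

package example

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// jobStatus is the payload stored per task on each pod.
type jobStatus struct {
	JobID      string `json:"jobId"`
	JobStatus  string `json:"jobStatus"`
	JobHandler string `json:"jobHandler"`
}

// taskAnnotation keys the job state by the CassandraTask UID so two tasks
// running against the same pod never clash.
func taskAnnotation(taskUID string) string {
	return fmt.Sprintf("control.k8ssandra.io/job-%s", taskUID)
}

func setJobStatus(pod *corev1.Pod, taskUID string, status jobStatus) error {
	if pod.Annotations == nil {
		pod.Annotations = make(map[string]string)
	}
	data, err := json.Marshal(status)
	if err != nil {
		return err
	}
	pod.Annotations[taskAnnotation(taskUID)] = string(data)
	return nil
}

func getJobStatus(pod *corev1.Pod, taskUID string) (jobStatus, bool, error) {
	var status jobStatus
	raw, found := pod.Annotations[taskAnnotation(taskUID)]
	if !found {
		return status, false, nil
	}
	err := json.Unmarshal([]byte(raw), &status)
	return status, true, err
}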
If we update TaskConfig to include the task name, then we might be able to add a check to see whether the job id and job status are set but belong to a task other than the current one; in that case we would know we can proceed with execution. I can try to test this out today.
Well, I did get it to run twice in the test at least with my approach. The current commit is a bit hacky, but I don't have the energy to make it pretty tonight - I'll continue tomorrow. But it seems to pass the test with two tasks in the same datacenter at least.
Thanks for pushing the commit. I tested and it didn't resolve the issue.
Hmm, I can't see them clashing at least on the annotations part:
Annotations: control.k8ssandra.io/job-4d7dc3a5-68c4-40cc-9136-cd9b3daac215:
{"jobId":"448f3370-2aab-42de-8262-257911379814","jobStatus":"COMPLETED","jobHandler":"management-api"}
control.k8ssandra.io/job-93af73b5-12aa-446e-9e12-720a85227e66:
{"jobId":"964845be-a7dd-4c1c-977c-08988d8dbf1c","jobStatus":"COMPLETED","jobHandler":"management-api"}
I also see that the cleanup endpoint is called multiple times. Hmm..
INFO [nioEventLoopGroup-2-1] 2021-12-13 10:04:59,643 Cli.java:617 - address=/10.244.1.2:35656 url=/api/v0/metadata/versions/features status=200 OK
INFO [nioEventLoopGroup-2-2] 2021-12-13 10:04:59,646 Cli.java:617 - address=/10.244.1.2:35658 url=/api/v1/ops/keyspace/cleanup status=202 Accepted
INFO [nioEventLoopGroup-2-1] 2021-12-13 10:04:59,670 Cli.java:617 - address=/10.244.1.2:35660 url=/api/v0/metadata/versions/features status=200 OK
INFO [nioEventLoopGroup-2-2] 2021-12-13 10:04:59,672 Cli.java:617 - address=/10.244.1.2:35662 url=/api/v0/ops/executor/job status=200 OK
INFO [nioEventLoopGroup-2-1] 2021-12-13 10:04:59,708 Cli.java:617 - address=/10.244.1.2:35664 url=/api/v0/metadata/versions/features status=200 OK
INFO [nioEventLoopGroup-2-2] 2021-12-13 10:05:03,608 Cli.java:617 - address=/10.244.4.1:37512 url=/api/v0/probes/readiness status=200 OK
INFO [nioEventLoopGroup-2-1] 2021-12-13 10:05:09,670 Cli.java:617 - address=/10.244.1.2:35666 url=/api/v0/metadata/versions/features status=200 OK
This was with live pods, third task to execute.
func (r *CassandraTaskReconciler) activeTasks(ctx context.Context, dc *cassapi.CassandraDatacenter) ([]api.CassandraTask, error) {
	var taskList api.CassandraTaskList
	matcher := client.MatchingLabels(utils.MergeMap(dc.GetDatacenterLabels(), map[string]string{taskStatusLabel: activeTaskLabelValue}))
	if err := r.Client.List(ctx, &taskList, client.InNamespace(dc.Namespace), matcher); err != nil {
This always returns an empty result set because there are no datacenter labels on a CassandraTask. You need to remove the datacenter labels, and then I would filter the results based on the .Spec.Datacenter field.
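A minimal sketch of that suggestion as a variation of the snippet above; it reuses the identifiers from the excerpt and assumes Spec.Datacenter is an object reference naming the target datacenter:

func (r *CassandraTaskReconciler) activeTasks(ctx context.Context, dc *cassapi.CassandraDatacenter) ([]api.CassandraTask, error) {
	var taskList api.CassandraTaskList
	// Match only on the task status label; CassandraTasks carry no datacenter labels.
	matcher := client.MatchingLabels(map[string]string{taskStatusLabel: activeTaskLabelValue})
	if err := r.Client.List(ctx, &taskList, client.InNamespace(dc.Namespace), matcher); err != nil {
		return nil, err
	}

	tasks := make([]api.CassandraTask, 0, len(taskList.Items))
	for _, task := range taskList.Items {
		// Keep only the tasks that target this datacenter.
		if task.Spec.Datacenter.Name == dc.Name {
			tasks = append(tasks, task)
		}
	}
	return tasks, nil
}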
	}
	return ctrl.Result{RequeueAfter: 1 * time.Second}, nil
} else if details.Status == podJobError {
	// Log the error, move on
When the job finishes, whether it succeeds or fails, the job annotation never gets removed from the pod, so you end up with an accumulation of job annotations, e.g.:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    control.k8ssandra.io/job-0f003348-33db-4272-bfcd-790ad27785cf: '{"id":"0fc51790-c522-4f74-9d87-a93a08d252af","status":"COMPLETED","handler":"management-api"}'
    control.k8ssandra.io/job-91f0112a-5cf6-4e65-b889-54d6676d2d8d: '{"id":"c38a54e5-5a57-4d2e-9067-7957370466cf","status":"COMPLETED","handler":"management-api"}'
    control.k8ssandra.io/job-b7946ffd-2456-4544-bab8-c6f314c6fcc4: '{"id":"8b04c4f2-632b-429d-b678-aeeefe5489a6","status":"COMPLETED","handler":"management-api"}'
  creationTimestamp: "2021-12-13T18:29:52Z"
  generateName: test-dc2-default-sts-
We should be removing those once it is safe to do so.
Since the annotation refers to the CassandraTask uid, do you think it might provide more clarity to change the prefix to task, e.g., control.k8ssandra.io/task-b7946ffd-2456-4544-bab8-c6f314c6fcc4?
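A minimal sketch of what that cleanup could look like once it is safe to prune; the helper names are illustrative and assume the task's own status already records which pods completed:

package example

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// jobAnnotationPrefix matches the per-task annotations discussed above.
const jobAnnotationPrefix = "control.k8ssandra.io/job-"

// pruneJobAnnotation removes the annotation belonging to a single finished task.
func pruneJobAnnotation(pod *corev1.Pod, taskUID string) {
	delete(pod.Annotations, jobAnnotationPrefix+taskUID)
}

// pruneAllJobAnnotations sweeps away every accumulated job annotation, e.g.
// after all tracked tasks for the datacenter have completed.
func pruneAllJobAnnotations(pod *corev1.Pod) {
	for key := range pod.Annotations {
		if strings.HasPrefix(key, jobAnnotationPrefix) {
			delete(pod.Annotations, key)
		}
	}
}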
I actually need to add those annotations (so that all resources have those annotations).
  // We weren't resuming from a stopped state, so we must be growing the
- // size of the rack
+ // size of the rack and this isn't the initialization stage
Currently (in master) the scaling up condition does get set to true initially, before the datacenter is ready. Are there any scenarios in which the Ready condition could be false and scaling up would happen? I assume no, but want to be sure. If it is possible, would it be better/safer to handle this in the CheckClearActionConditions function? The function could be updated to only call cleanupAfterScaling after the Initialized condition is true.
No, there's no such thing. This is the only place where ScalingUp is set (and I don't think it should be set when the cluster is starting).
My main point is that ScalingUp is set before the datacenter becomes ready, or more precisely, before the Initialized condition is set. What is the rationale for changing that existing behavior?
Because creating a new cluster is not scaling up an existing one. I don't feel like these two should share the same status - especially if we have a different process for both cases.
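A minimal sketch of the distinction being argued here; the helper is illustrative and assumes the datacenter's Initialized condition can be read via GetConditionStatus:

// shouldMarkScalingUp is a sketch only: creating a brand-new cluster is not
// "scaling up" an existing one, so ScalingUp is set only after initialization.
func shouldMarkScalingUp(dc *api.CassandraDatacenter, desiredSize, currentSize int32) bool {
	initialized := dc.GetConditionStatus(api.DatacenterInitialized) == corev1.ConditionTrue
	return initialized && desiredSize > currentSize
}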
…e task from the targeted Datacenter
…cy issue with cached gets from api-server, add CassandraTask checks to scale_up e2e tests. Also, a small example CassandraTask
// TrackedTasks tracks the tasks for completion that were created by the cass-operator
// +optional
TrackedTasks []corev1.ObjectReference `json:"trackedTasks,omitempty"`
Since you have added the DC labels to the CassandraTask, is this necessary anymore? I see where it is used in reconcile_racks.go, but that could be replaced with a query to fetch the CassandraTask. There are a couple of benefits. First, it eliminates, and therefore simplifies, some code. Secondly, it cuts down on requeues, which is big. The status updates for a task will trigger a lot of unnecessary requeues, not only for cass-operator but also for k8ssandra-operator.
Status updates should not trigger a reconcile. That's a bug in the k8ssandra-operator if that happens and means the watcher is missing a version check.
TBH I forgot about GenerationChangedPredicate when I wrote my comment. Status updates won't trigger a reconcile for cass-operator, but they will for k8ssandra-operator since the latter does not use a GenerationChangedPredicate for its CassandraDatacenter watch. This is by design though. k8ssandra-operator doesn't use GenerationChangedPredicate since it does not manage the CassandraDatacenter. We do want status updates to trigger reconciliation in k8ssandra-operator. Aside from requeues (or lack thereof), I would still be in favor of removing TrackedTasks since the information is already available through the CassandraTask.
That information isn't actually available, since TrackedTasks only includes tasks generated by the cass-operator, while fetching with labels would give back all the tasks generated by anyone. At that point, I would need to know which ones were created before the operation I ran in the cass-operator and which after. I don't have that information at the moment.
// TODO Add conditions also
if err = r.Client.Status().Update(ctx, &cassTask); err != nil {
	return ctrl.Result{}, err
At one point I tested with a bad management-api image, which resulted in an error when taskConfig.AsyncFunc is called. The failure was due to the RPC endpoint not being properly exposed. The error happens against the first pod; subsequent pods are never tried. At some point the job should be marked failed. k8ssandra-operator ends up in a requeue loop waiting for the job to complete, but it never does.
It seems my reply never got here, odd. Well, the process catches failed jobs and exits with them correctly, and cass-operator should kill pods that are not healthy. However, you hit an edge case where the management-api is simply not working correctly but cass-operator is not killing it.
…unts to executed pod counts
What this PR does:
Runs jobs serially on every pod that is part of the Datacenter, using either the async or the sync functions (depending on the version of the Cassandra container).
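A minimal sketch of that per-pod flow as it is discussed in this review; the helper names (startJob, getJobDetails) and the exact control flow are assumptions, not the implementation:

// reconcileEveryPodTask sketch: run the job on one pod at a time, using the
// per-pod annotation to remember progress between reconciliations.
func (r *CassandraTaskReconciler) reconcileEveryPodTask(ctx context.Context, pods []corev1.Pod, taskUID string) (ctrl.Result, error) {
	for idx := range pods {
		pod := &pods[idx]
		if pod.Annotations == nil {
			pod.Annotations = make(map[string]string)
		}

		// If this task already started a job on the pod, poll its status.
		if podJobId, found := pod.Annotations[podJobIdAnnotation]; found {
			details, err := getJobDetails(ctx, pod, podJobId)
			if err != nil {
				return ctrl.Result{}, err
			}
			switch details.Status {
			case "COMPLETED":
				continue // done here, move on to the next pod
			case podJobError:
				// Log the error, move on
				continue
			default:
				// Still running; check again shortly.
				return ctrl.Result{RequeueAfter: 1 * time.Second}, nil
			}
		}

		// No job on this pod yet: start one (async or sync depending on the
		// management-api features) and requeue until it finishes.
		if err := startJob(ctx, pod, taskUID); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{RequeueAfter: 1 * time.Second}, nil
	}
	// Every pod has completed; the task can be marked finished.
	return ctrl.Result{}, nil
}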
Which issue(s) this PR fixes:
Fixes #
Checklist