This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

Kafka Source still shown as ready when topic is deleted #760

Open
steven0711dong opened this issue Jul 13, 2021 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)

Comments

@steven0711dong
Contributor

Describe the bug
When a user deletes the topic, the KafkaSource CR status shows an error, but the overall status is still shown as Ready.

Expected behavior
The overall Ready status should become not ready and show the precise error.
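As a rough illustration of the expected behavior, the sketch below derives an overall Ready condition from a set of dependent conditions, so that a failed topic check surfaces in Ready together with its reason. The `Condition` type, the `overallReady` helper, and the `TopicAvailable` / `TopicNotFound` names are hypothetical and are not taken from the KafkaSource code.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

// overallReady returns a Ready condition that is true only when every
// dependent condition is true. The first failing dependent condition
// supplies the reason and message, so the precise error is visible at
// the top level.
func overallReady(deps []Condition) Condition {
	for _, c := range deps {
		if !c.Status {
			return Condition{
				Type:    "Ready",
				Status:  false,
				Reason:  c.Reason,
				Message: fmt.Sprintf("%s: %s", c.Type, c.Message),
			}
		}
	}
	return Condition{Type: "Ready", Status: true}
}

func main() {
	// Hypothetical condition names, for illustration only.
	deps := []Condition{
		{Type: "Deployed", Status: true},
		{
			Type:    "TopicAvailable",
			Status:  false,
			Reason:  "TopicNotFound",
			Message: "topic my-topic does not exist",
		},
	}
	ready := overallReady(deps)
	fmt.Printf("Ready=%v reason=%s message=%s\n", ready.Status, ready.Reason, ready.Message)
}
```

With this shape, deleting the topic would flip the dependent condition to false on the next reconciliation, and Ready would follow it instead of staying true.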

To Reproduce
Create a KafkaSource, make sure it is ready and receiving events, and then delete the topic.

Knative release version


@steven0711dong steven0711dong added the kind/bug Categorizes issue or PR as related to a bug. label Jul 13, 2021
@lionelvillard
Contributor

@devguyio @matzew @travis-minke-sap how do you handle this scenario in the channel implementations? I wonder if there is something we could leverage.

@travis-minke-sap
Contributor

Yeah, interesting...

We (distributed channel) generally haven't done anything special to detect / correct such external manual deletion of Kafka Topics. We have just assumed that the KafkaChannel CRD is the "owner" of the Topic, and that users are expected not to mess with them out-of-band.

Without trying it out... I would assume that the Receiver / Dispatcher would start logging errors and that if the controller restarted it would recreate the Topic. Not sure of the Status in the interim but it probably isn't handled accurately.

@ntx-ben

ntx-ben commented Jul 16, 2021

I've also noticed that deleting a KafkaChannel results in the eventing-kafka-channel-controller crashing (using v0.24.1):

{"level":"info","ts":"2021-07-16T21:35:28.537Z","logger":"eventing-kafka-channel-controller","caller":"kafkachannel/dispatcher.go:110","msg":"Successfully Finalized Dispatcher Deployment","knative.dev/pod":"eventing-kafka-channel-controller-84d8fd46d7-spxf4","knative.dev/controller":"knative.dev.eventing-kafka.pkg.channel.distributed.controller.kafkachannel.Reconciler","knative.dev/kind":"messaging.knative.dev.KafkaChannel","knative.dev/traceid":"fac8ea5e-4d12-4877-86b7-09df575bcc75","knative.dev/key":"default/my-kafka-channel","Channel":"default/my-kafka-channel"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1a6fcb3]
goroutine 148 [running]:
knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel.(*Reconciler).deleteTopic(0xc000639400, 0x22b1d98, 0xc000cf7020, 0xc0002681b0, 0x18, 0xc0002681b0, 0x18)
    knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel/topic.go:137 +0x93
knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel.(*Reconciler).finalizeKafkaTopic(0xc000639400, 0x22b1d98, 0xc000cf7020, 0xc0001bb040, 0x0, 0x0)
    knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel/topic.go:82 +0x31e
knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel.(*Reconciler).FinalizeKind(0xc000639400, 0x22b1d98, 0xc000cf7020, 0xc0001bb040, 0x0, 0x0)
    knative.dev/eventing-kafka/pkg/channel/distributed/controller/kafkachannel/reconciler.go:177 +0x645
knative.dev/eventing-kafka/pkg/client/injection/reconciler/messaging/v1beta1/kafkachannel.(*reconcilerImpl).Reconcile(0xc000639540, 0x22b1d98, 0xc000cf6ed0, 0xc000268108, 0x18, 0xc00057e2f8, 0x22b1d98)
    knative.dev/eventing-kafka/pkg/client/injection/reconciler/messaging/v1beta1/kafkachannel/reconciler.go:259 +0x1011
knative.dev/pkg/controller.(*Impl).processNextWorkItem(0xc00073a600, 0xc00059f700)
    knative.dev/pkg@v0.0.0-20210622173328-dd0db4b05c80/controller/controller.go:531 +0x5e4
knative.dev/pkg/controller.(*Impl).RunContext.func3(0xc000262020, 0xc00073a600)
    knative.dev/pkg@v0.0.0-20210622173328-dd0db4b05c80/controller/controller.go:468 +0x53
created by knative.dev/pkg/controller.(*Impl).RunContext
    knative.dev/pkg@v0.0.0-20210622173328-dd0db4b05c80/controller/controller.go:466 +0x1a5

@travis-minke-sap
Contributor

> I've also noticed that deleting a KafkaChannel results in the eventing-kafka-channel-controller crashing (using v0.24.1):

I made a very quick attempt at reproducing this using Strimzi but didn't see the same behavior. If I understand correctly the error is caused by deleting a KafkaChannel whose backing Kafka Topic has already been deleted? Generally the logic should handle this case as a no-op (meaning... the topic should be deleted and it doesn't exist so there's nothing to do).

The actual failure above is most likely that the Reconciler's Kafka AdminClient is nil when it's trying to delete the Topic. The Reconciler re-creates the AdminClient on every reconciliation loop (no re-use due to Sarama client timeout issues). Earlier in the logs, do you see an error with "Failed To Create Kafka AdminClient"? If so, can you provide that info?

If you can reproduce the panic, we're definitely interested in understanding and fixing it - probably should create a separate Issue detailing the reproduction steps, etc... thanks!

@ntzlqx

ntzlqx commented Jul 19, 2021

We encountered the same...I am able to replicate if I nuke the entire kafka cluster...the Admin Client then fails with the above

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2021
@lionelvillard
Contributor

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2021
@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 17, 2022
@lionelvillard
Contributor

/remove-lifecycle stale
/triage accepted

@knative-prow-robot knative-prow-robot added triage/accepted Issues which should be fixed (post-triage) and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 19, 2022