Deletion of subscriptions leaves orphaned subscribers in the channel implementation's spec #6636

Open
seongjae-min opened this issue Dec 8, 2022 · 8 comments · Fixed by #6670
Labels
area/channels kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)

Comments

@seongjae-min
Contributor

seongjae-min commented Dec 8, 2022

Describe the bug

  1. Create a channel and several subscriptions that all subscribe to that channel.
  2. Delete the subscriptions.
  3. The subscriptions are deleted, but some of their subscriber entries intermittently remain in the physical channel's manifest.

func (r *Reconciler) patchSubscription(ctx context.Context, namespace string, channel *eventingduckv1.Channelable, sub *v1.Subscription) (bool, error) {
	after := channel.DeepCopy()
	if sub.DeletionTimestamp.IsZero() {
		r.updateChannelAddSubscription(after, sub)
	} else {
		r.updateChannelRemoveSubscription(after, sub)
	}
	patch, err := duck.CreateMergePatch(channel, after)
	if err != nil {
		return false, err
	}
	// If there is nothing to patch, we are good, just return.
	// Empty patch is {}, hence we check for that.
	if len(patch) <= 2 {
		return false, nil
	}
	resourceClient, err := eventingduck.ResourceInterface(r.dynamicClientSet, namespace, channel.GroupVersionKind())
	if err != nil {
		logging.FromContext(ctx).Warnw("Failed to create dynamic resource client", zap.Error(err))
		return false, err
	}
	patched, err := resourceClient.Patch(ctx, channel.GetName(), types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		logging.FromContext(ctx).Warnw("Failed to patch the Channel", zap.Error(err), zap.Any("patch", patch))
		return false, err
	}
	logging.FromContext(ctx).Debugw("Patched resource", zap.Any("patch", patch), zap.Any("patched", patched))
	return true, nil
}

It seems like:

  1. Deleting several subscriptions makes eventing-controller run the finalizer for each of them.
  2. eventing-controller handles those finalizers concurrently.
  3. patchSubscription uses the PATCH API, so the write is not checked against the channel's resource version.
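
The interleaving described above can be sketched with a small in-memory simulation. This is an illustration only, not the controller's code: `channelSpec`, `mergePatchSubscribers`, and `simulateRace` are hypothetical names, and the sketch assumes the patch carries the whole subscribers list, which is how a JSON merge patch treats array fields (the array in the patch replaces the stored one).

```go
package main

import "fmt"

// channelSpec is a hypothetical, simplified stand-in for a channel's spec:
// only the UIDs of its subscribers.
type channelSpec struct {
	Subscribers []string
}

// removeSubscriber returns a copy of subs without uid.
func removeSubscriber(subs []string, uid string) []string {
	out := make([]string, 0, len(subs))
	for _, s := range subs {
		if s != uid {
			out = append(out, s)
		}
	}
	return out
}

// mergePatchSubscribers mimics what a JSON merge patch does to a list field:
// the list in the patch replaces the stored list, with no version check.
func mergePatchSubscribers(stored *channelSpec, patched []string) {
	stored.Subscribers = append([]string(nil), patched...)
}

// simulateRace replays one possible interleaving of two finalizer workers.
func simulateRace() []string {
	stored := &channelSpec{Subscribers: []string{"sub-a", "sub-b", "sub-c"}}

	// Both workers read the channel before either of them has written.
	seenByWorker1 := append([]string(nil), stored.Subscribers...)
	seenByWorker2 := append([]string(nil), stored.Subscribers...)

	// Worker 1 finalizes sub-a, worker 2 finalizes sub-b.
	patch1 := removeSubscriber(seenByWorker1, "sub-a") // [sub-b sub-c]
	patch2 := removeSubscriber(seenByWorker2, "sub-b") // [sub-a sub-c]

	mergePatchSubscribers(stored, patch1)
	mergePatchSubscribers(stored, patch2) // silently undoes worker 1's removal
	return stored.Subscribers
}

func main() {
	fmt.Println(simulateRace()) // prints [sub-a sub-c]: sub-a is orphaned
}
```

Which subscriber survives depends on write order, which would match the "randomly remains" symptom.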

Expected behavior
When a subscription is deleted, the entry with the same UID is removed from the subscribers list in the physical channel's spec.

To Reproduce

https://gist.github.com/WoWsj/3630deaa315fbc70043449231a4eaa1d

Knative release version

v1.8.2

Additional context

None

Questions

Is there a specific reason to use Patch rather than Update?

I think using Update with RetryOnConflict is a more appropriate way to enforce resource-version checks in this case.
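
A minimal sketch of the idea, assuming an in-memory `store` that rejects writes based on a stale version (all type and function names here are hypothetical). In a real controller the retry loop would typically be `retry.RetryOnConflict` from `k8s.io/client-go/util/retry` wrapped around a Get and an Update; the sketch hand-rolls it so it runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict plays the role of the API server's 409 Conflict response.
var errConflict = errors.New("conflict: stale resource version")

// channel is a hypothetical, simplified channel object.
type channel struct {
	ResourceVersion int
	Subscribers     []string
}

// store is an in-memory stand-in for the API server.
type store struct{ current channel }

// Get returns a deep copy of the stored object.
func (s *store) Get() channel {
	c := s.current
	c.Subscribers = append([]string(nil), s.current.Subscribers...)
	return c
}

// Update accepts c only if it was derived from the latest stored version.
func (s *store) Update(c channel) error {
	if c.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	c.ResourceVersion++
	c.Subscribers = append([]string(nil), c.Subscribers...)
	s.current = c
	return nil
}

// updateWithRetry re-reads the object and re-applies mutate until the write
// is accepted or maxAttempts is exhausted. It returns the attempts used.
func updateWithRetry(s *store, maxAttempts int, mutate func(*channel)) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		c := s.Get()
		mutate(&c)
		err := s.Update(c)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errConflict) {
			return attempt, err
		}
	}
	return maxAttempts, errConflict
}

// remove returns a copy of subs without uid.
func remove(subs []string, uid string) []string {
	out := make([]string, 0, len(subs))
	for _, s := range subs {
		if s != uid {
			out = append(out, s)
		}
	}
	return out
}

// demo removes sub-a while a second worker removes sub-b in between the
// first worker's read and write, forcing exactly one conflict.
func demo() ([]string, int, error) {
	s := &store{current: channel{ResourceVersion: 1, Subscribers: []string{"sub-a", "sub-b", "sub-c"}}}
	interfered := false
	attempts, err := updateWithRetry(s, 5, func(c *channel) {
		if !interfered {
			interfered = true
			_, _ = updateWithRetry(s, 5, func(o *channel) { o.Subscribers = remove(o.Subscribers, "sub-b") })
		}
		c.Subscribers = remove(c.Subscribers, "sub-a")
	})
	return s.Get().Subscribers, attempts, err
}

func main() {
	subs, attempts, err := demo()
	fmt.Println(subs, attempts, err) // prints [sub-c] 2 <nil>
}
```

The first write is rejected because it was computed from a stale read; the retry recomputes the removal from the fresh object, so both removals survive.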

@seongjae-min seongjae-min added the kind/bug Categorizes issue or PR as related to a bug. label Dec 8, 2022
@pierDipi
Member

Hi @WoWsj, thanks for reporting. I think we should move to Update to avoid this problem.

Do you have capacity for a contribution?

/triage accepted

@knative-prow knative-prow bot added the triage/accepted Issues which should be fixed (post-triage) label Dec 16, 2022
@pierDipi
Member

/area channel

@knative-prow

knative-prow bot commented Dec 16, 2022

@pierDipi: The label(s) area/channel cannot be applied, because the repository doesn't have them.

In response to this:

/area channel


@seongjae-min
Contributor Author

@pierDipi I see. I will make a PR to change patch logic to Update. 😄

@l-qing

l-qing commented Jan 1, 2023

Great!
I also faced a similar problem recently.

As a temporary workaround, I set the subscription controller's Concurrency to 1 (the default is 2). 😁
Of course, once it uses Update, it can support more concurrency.

impl := controller.NewContext(ctx, rec, controller.ControllerOptions{WorkQueueName: ctrTypeName, Logger: logger})

if options.Concurrency == 0 {
	options.Concurrency = DefaultThreadsPerController
}

var (
	// DefaultThreadsPerController is the number of threads to use
	// when processing the controller's workqueue. Controller binaries
	// may adjust this process-wide default. For finer control, invoke
	// Run on the controller directly.
	// TODO rename the const to Concurrency and deprecated this
	DefaultThreadsPerController = 2
)
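
To illustrate why a concurrency of 1 sidesteps the race inside a single controller process, here is a rough standard-library sketch of a worker pool draining a queue. `maxInFlight` and `defaultThreadsPerController` are hypothetical names that only mirror the snippet above; this is not the knative.dev/pkg implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// defaultThreadsPerController mirrors the default of 2 quoted above.
const defaultThreadsPerController = 2

// maxInFlight drains keys with the given number of workers and reports the
// largest number of reconciles that overlapped in time.
func maxInFlight(keys []string, concurrency int) int {
	if concurrency == 0 {
		concurrency = defaultThreadsPerController
	}
	queue := make(chan string, len(keys))
	for _, k := range keys {
		queue <- k
	}
	close(queue)

	var (
		mu       sync.Mutex
		inFlight int
		peak     int
		wg       sync.WaitGroup
	)
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				mu.Lock()
				inFlight++
				if inFlight > peak {
					peak = inFlight
				}
				mu.Unlock()

				// The read-modify-write against the channel would happen here.

				mu.Lock()
				inFlight--
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	keys := []string{"sub-a", "sub-b", "sub-c"}
	fmt.Println(maxInFlight(keys, 1)) // prints 1: reconciles never overlap
}
```

With one worker, no two subscription reconciles in that process can interleave their reads and writes. The workaround does not cover several controller replicas writing to the same channel.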

@l-qing

l-qing commented Jan 3, 2023

> @pierDipi I see. I will make a PR to change patch logic to Update. 😄

Do you still plan to do this PR?

@seongjae-min
Contributor Author

@l-qing I think I can make a PR this week. I had no time to handle this issue recently 😭 . Is it an urgent thing for you?

@l-qing

l-qing commented Jan 3, 2023

> @l-qing I think I can make a PR this week. I had no time to handle this issue recently 😭 . Is it an urgent thing for you?

It's great! Looking forward to your good news.👍
I was just asking. I won't have time to deal with this for the next two weeks either. 😂

knative-prow bot pushed a commit that referenced this issue Jan 25, 2023
Fixes #6636 

In the `Subscription` reconcile loop, the physical channel is updated
with `PATCH`, which can leave the channel out of sync with its
subscriptions.

To have the resource version checked so that conflicting writes are
detected, change to `Update` with RetryOnConflict.

## Proposed Changes

- Change Patch to Update
- Use RetryOnConflict
- Change the condition that is set when sync fails

# Open Questions
Some users may already have orphaned subscribers because of this bug,
and this PR does not clean those up. What should we do?

Co-authored-by: Pierangelo Di Pilato <pierangelodipilato@gmail.com>
@pierDipi pierDipi reopened this Feb 6, 2023
knative-prow bot pushed a commit that referenced this issue Feb 6, 2023
…6670) (#6724)

This reverts commit 4d6e1fc.

It has the side effect of dropping channel spec fields, so even
immutable fields are dropped, hence channels will fail to get updated.

We need to re-evaluate the approach to fix the original issue:
#6636. Patch will always have edge cases that lead to the original
bug because subscriptions are reconciled independently from each other
(and potentially by multiple controller replicas), so update is the
only way of having concurrency control at the resource level, but we
should make sure that we're preserving unknown fields when updating
channelables.
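
One way this kind of field loss can happen is a round trip through a typed struct that only knows part of the spec. The sketch below is illustrative and makes no claim about the exact code path in the reverted commit: `duckSpec`, `implSpecificField`, and both helper functions are hypothetical. It contrasts the typed round trip with an edit on a generic map, which leaves fields it does not understand untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// duckSpec is a hypothetical typed view that only knows about subscribers.
type duckSpec struct {
	Subscribers []string `json:"subscribers,omitempty"`
}

// roundTripTyped decodes raw into duckSpec, removes uid, and re-encodes.
// Any field that duckSpec does not declare is lost on the way back.
func roundTripTyped(raw []byte, uid string) ([]byte, error) {
	var spec duckSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return nil, err
	}
	kept := make([]string, 0, len(spec.Subscribers))
	for _, s := range spec.Subscribers {
		if s != uid {
			kept = append(kept, s)
		}
	}
	spec.Subscribers = kept
	return json.Marshal(spec)
}

// updatePreservingUnknown edits only the "subscribers" key of a generic
// document, so every other field is written back unchanged.
func updatePreservingUnknown(raw []byte, uid string) ([]byte, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	if subs, ok := doc["subscribers"].([]interface{}); ok {
		kept := make([]interface{}, 0, len(subs))
		for _, s := range subs {
			if str, isString := s.(string); isString && str == uid {
				continue
			}
			kept = append(kept, s)
		}
		doc["subscribers"] = kept
	}
	return json.Marshal(doc)
}

func main() {
	raw := []byte(`{"implSpecificField":"keep-me","subscribers":["sub-a","sub-b"]}`)

	typed, _ := roundTripTyped(raw, "sub-a")
	fmt.Println(string(typed)) // prints {"subscribers":["sub-b"]}

	kept, _ := updatePreservingUnknown(raw, "sub-a")
	fmt.Println(string(kept)) // prints {"implSpecificField":"keep-me","subscribers":["sub-b"]}
}
```

An Update sends the whole object, so anything dropped during decoding is also dropped on the server. A merge patch avoids this because it only sends the changed keys, which is the trade-off the revert message describes.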
knative-prow bot pushed a commit that referenced this issue Feb 6, 2023
…rce version (#6670) (#6725)

This reverts commit
4d6e1fc.

pierDipi added a commit to pierDipi/eventing that referenced this issue May 17, 2023
…rce version (knative#6670) (knative#6725)

This reverts commit
knative@4d6e1fc.

openshift-merge-robot pushed a commit to openshift-knative/eventing that referenced this issue May 17, 2023
* [release-1.9] Format Go code (knative#6706)

This is an automated cherry-pick of knative#6702

Signed-off-by: Knative Automation <automation@knative.team>
Co-authored-by: Knative Automation <automation@knative.team>

* [release-1.9] Revert "Change subscription patch logic to ensure resource version (knative#6670) (knative#6725)

This reverts commit
knative@4d6e1fc.

* [release-1.9] Scheduler doesn't reschedule vpods that are scheduled on unschedulable pods (knative#6730)

This is an automated cherry-pick of knative#6726

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* [release-1.9] Improve scheduler logging for state and pending vpods (knative#6734)

This is an automated cherry-pick of knative#6729

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* [release-1.9] Add function to check if a broker resource is `NotReady` (knative#6738)

This is an automated cherry-pick of knative#6737

Signed-off-by: Christoph Stäbler <cstabler@redhat.com>
Co-authored-by: Christoph Stäbler <cstabler@redhat.com>

* [release-1.9] Extract scheduler config in a dedicated struct instead of many parameters (knative#6740)

This is an automated cherry-pick of knative#6736

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* [release-1.9] Exclusive access to tracing flag for upgrade prober (knative#6768)

This is an automated cherry-pick of knative#6767

```release-note

```

Co-authored-by: Martin Gencur <mgencur@redhat.com>

* [release-1.9] Upgrade to latest dependencies (knative#6775)

bumping pkg -dprotaso

/cc knative/eventing-writers
/assign knative/eventing-writers

Produced by: knative-sandbox/knobots/actions/update-deps

Signed-off-by: Knative Automation <automation@knative.team>

---------

Signed-off-by: Knative Automation <automation@knative.team>
Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Christoph Stäbler <cstabler@redhat.com>
Co-authored-by: Knative Prow Robot <knative-prow-robot@google.com>
Co-authored-by: Knative Automation <automation@knative.team>
Co-authored-by: Christoph Stäbler <cstabler@redhat.com>
Co-authored-by: Martin Gencur <mgencur@redhat.com>