Implement ReadOnlyInterface in IMC dispatcher #4675

Merged
merged 1 commit into from
Dec 22, 2020

antoineco
Contributor

@antoineco antoineco commented Dec 22, 2020

Fixes #4638

Proposed Changes

  • Implement ReadOnlyInterface in the IMC dispatcher.
    Ensures any running dispatcher can handle events for all InMemoryChannels, and not only the ones in the bucket it is leading.

After scaling the dispatcher to 5 replicas, I sent 100 events/s for 1 min to a Channel using vegeta and got a 100% success rate, versus ~20% before this change:

Requests      [total, rate, throughput]         6000, 100.02, 100.00
Duration      [total, attack, wait]             59.999s, 59.99s, 8.99ms
Latencies     [min, mean, 50, 90, 95, 99, max]  2.123ms, 6.119ms, 4.795ms, 10.404ms, 13.405ms, 27.218ms, 70.894ms
Bytes In      [total, mean]                     0, 0.00
Bytes Out     [total, mean]                     12288000, 2048.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:6000
Error Set:

Release Note

:bug: Allow scaling the in-memory Channel dispatcher without disabling high availability.

Docs

Ensures any running dispatcher can handle events for all
InMemoryChannels, and not only the ones in the bucket it is leading.
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Dec 22, 2020
@knative-prow-robot knative-prow-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 22, 2020
Comment on lines +62 to +64
if err := r.reconcile(ctx, imc); err != nil {
	return err
}
Contributor Author

This is pretty much all this PR does: move the reconciliation logic to a (*Reconciler).reconcile() method and call it from both ReconcileKind() and ObserveKind().

@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-eventing-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/reconciler/inmemorychannel/dispatcher/inmemorychannel.go | 90.9% | 89.8% | -1.1

@codecov

codecov bot commented Dec 22, 2020

Codecov Report

Merging #4675 (66860d0) into master (9643baf) will decrease coverage by 0.00%.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #4675      +/-   ##
==========================================
- Coverage   81.07%   81.07%   -0.01%     
==========================================
  Files         291      291              
  Lines        8212     8216       +4     
==========================================
+ Hits         6658     6661       +3     
- Misses       1153     1154       +1     
  Partials      401      401              
Impacted Files | Coverage Δ
...iler/inmemorychannel/dispatcher/inmemorychannel.go | 86.66% <83.33%> (-0.66%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@zhongduo
Contributor

Thanks for finding the issue. This PR does not change the existing logic, but simply implements the interface with existing reconcile logic. It looks safe to me.
/lgtm
/hold
in case you want a second opinion.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 22, 2020
@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 22, 2020
@antoineco
Contributor Author

@zhongduo thanks for the review! Do you happen to know how I can re-run the failed test? It seems unrelated.

@zhongduo
Contributor

/retest

@zhongduo
Contributor

> @zhongduo thanks for the review! Do you happen to know how I can re-run the failed test? It seems unrelated.

It is not uncommon for the downstream test itself to have problems. But IIRC it is not a required build for merging.

Member

@pierDipi pierDipi left a comment

/kind bug

Can we have a regression test?

@knative-prow-robot knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 22, 2020
@antoineco
Contributor Author

@pierDipi would a unit test suffice? I don't see e2e tests for the IMC.

@pierDipi
Member

I see them here:

go_test_e2e -timeout=30m -parallel=20 ./test/e2e \
-brokerclass=MTChannelBasedBroker \
-channels=messaging.knative.dev/v1beta1:Channel,messaging.knative.dev/v1beta1:InMemoryChannel,messaging.knative.dev/v1:Channel,messaging.knative.dev/v1:InMemoryChannel \
-sources=sources.knative.dev/v1beta1:ApiServerSource,sources.knative.dev/v1alpha2:ContainerSource,sources.knative.dev/v1beta1:PingSource,sources.knative.dev/v1beta2:PingSource,sources.knative.dev/v1:ApiServerSource,sources.knative.dev/v1:ContainerSource \

I think even running existing E2E tests with the IMC dispatcher scaled up should be enough.

@antoineco
Contributor Author

Ahh, thanks for the pointer 👍

@antoineco
Contributor Author

antoineco commented Dec 22, 2020

@pierDipi e2e tests seem to be generic and run the same test suite for all channel types if I'm not mistaken. There is no existing test in which I could scale just the IMC dispatcher, and I can't scale channel dispatchers of any type because that would definitely break something.

Besides, those tests send 1 event and ensure it arrived in the expected shape. To test this particular bug, I need to send a larger number of events, which changes the way things are being asserted.

For a useful regression test I would first have to come up with a completely new test suite it seems, and that sounds like a topic for a much bigger PR.

@vaikas
Contributor

vaikas commented Dec 22, 2020

This looks like a great surgical change to make things better. As far as having regression tests, that would certainly be great, but I agree with @antoineco that it should probably be in a follow-up PR. It's maybe something that we can tackle as part of #3590, since this would alleviate it as well?

As far as the downstream failing test, it most definitely seems unrelated, as it has something to do with dupes in vegeta testing:

panic: gob: registering duplicate types for "*vegeta.Result": *vegeta.Result != *vegeta.Result

goroutine 1 [running]:
encoding/gob.RegisterName(0x214e6f0, 0xe, 0x22ab140, 0xc0003be460)
	/opt/hostedtoolcache/go/1.15.6/x64/src/encoding/gob/type.go:820 +0x777
encoding/gob.Register(0x22ab140, 0xc0003be460)
	/opt/hostedtoolcache/go/1.15.6/x64/src/encoding/gob/type.go:874 +0x15f
github.com/tsenart/vegeta/v12/lib.init.0()
	/home/runner/work/eventing/eventing/src/knative.dev/eventing-kafka/vendor/github.com/tsenart/vegeta/v12/lib/results.go:22 +0xad
FAIL	knative.dev/eventing-kafka/test/test_images/kafka_performance	0.037s

/lgtm
/approve

@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: antoineco, vaikas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 22, 2020
@pierDipi
Member

👍

/unhold

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 22, 2020
@knative-prow-robot knative-prow-robot merged commit cceb3e9 into knative:master Dec 22, 2020
@antoineco antoineco deleted the imc-dispatcher-ro-interface branch December 23, 2020 12:25
Development

Successfully merging this pull request may close these issues.

Leader election prevents scaling of IMC dispatcher, causes dropped events
6 participants