Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

kubernetes/test: fix slow tests #2019

Merged
merged 2 commits into from
Nov 10, 2020
Merged

Conversation

shashankram
Copy link
Member

@shashankram shashankram commented Nov 10, 2020

Description:
The tests had synchronization issues resulting from missing and blocking
event notifications.

The missing events were a result of the tests
not waiting till the namespace used in the test is monitored, which
in turn causes the caches for the other Kubernetes resources to
be filtered out as unmonitored. The result is that the resource is
only marked as monitored upon the next resync from k8s informers
(currently every 30s). This is fixed by ensuring an async wait
on the namespace to be monitored before waiting on k8s events for
resources belonging to the namespace.

Since unbuffered channels are used for announcements, it is better
to read as soon as data is available rather than to batch reads,
which results in writes being blocked. This is also addressed in
the tests.

This change speeds up by the test by over 60s to 90s when the events
are missed which was happening often.

It also removes some unnecessary code in the tests.

Resolves #2014

Before this change:

ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.779s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.769s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.825s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.787s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.791s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.794s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	60.869s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.797s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.769s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	90.796s

After this change:

ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.123s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.103s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.099s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.092s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.102s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.117s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.099s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.140s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.096s
ok  	github.com/openservicemesh/osm/pkg/kubernetes	1.094s

Affected area:

  • New Functionality [ ]
  • Documentation [ ]
  • Install [ ]
  • Control Plane [ ]
  • CLI Tool [ ]
  • Certificate Management [ ]
  • Networking [ ]
  • Metrics [ ]
  • SMI Policy [ ]
  • Security [ ]
  • Tests [X]
  • CI System [ ]
  • Performance [ ]
  • Other [ ]

Please answer the following questions with yes/no.

  • Does this change contain code from or inspired by another project? If so, did you notify the maintainers and provide attribution?
    No

The tests had synchronization issues resulting from missing and blocking
event notifications.

The missing events were a result of the tests
not waiting till the namespace used in the test is monitored, which
in turn causes the caches for the other Kubernetes resources to
be filtered out as unmonitored. The result is that the resource is
only marked as monitored upon the next resync from k8s informers
(currently every 30s). This is fixed by ensuring an async wait
on the namespace to be monitored before waiting on k8s events for
resources belonging to the namespace.

Since unbuffered channels are used for announcements, it is better
to read as soon as data is available rather than to batch reads,
which results in writes being blocked. This is also addressed in
the tests.

This change speeds up by the test by over 60s to 90s when the events
are missed which was happening often.

Resolves openservicemesh#2014
@shashankram shashankram requested a review from a team as a code owner November 10, 2020 20:19
eduser25
eduser25 previously approved these changes Nov 10, 2020
pkg/kubernetes/client_test.go Outdated Show resolved Hide resolved
@codecov-io
Copy link

Codecov Report

Merging #2019 (cdb5e36) into main (3456647) will decrease coverage by 0.21%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2019      +/-   ##
==========================================
- Coverage   56.72%   56.51%   -0.22%     
==========================================
  Files         141      142       +1     
  Lines        5706     5768      +62     
==========================================
+ Hits         3237     3260      +23     
- Misses       2466     2505      +39     
  Partials        3        3              
Impacted Files Coverage Δ
pkg/kubernetes/event_handlers.go 68.42% <0.00%> (-15.79%) ⬇️
cmd/osm-controller/osm-controller.go 28.64% <0.00%> (-2.76%) ⬇️
pkg/reconciler/mutatingwebhook/reconcile.go 82.14% <0.00%> (ø)
pkg/envoy/route/config.go 96.03% <0.00%> (+0.79%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3456647...cdb5e36. Read the comment docs.

@shashankram shashankram merged commit de80157 into openservicemesh:main Nov 10, 2020
@shashankram shashankram deleted the fix-test branch November 10, 2020 20:59
draychev pushed a commit to draychev/osm that referenced this pull request Nov 19, 2020
The tests had synchronization issues resulting from missing and blocking
event notifications.

The missing events were a result of the tests
not waiting till the namespace used in the test is monitored, which
in turn causes the caches for the other Kubernetes resources to
be filtered out as unmonitored. The result is that the resource is
only marked as monitored upon the next resync from k8s informers
(currently every 30s). This is fixed by ensuring an async wait
on the namespace to be monitored before waiting on k8s events for
resources belonging to the namespace.

Since unbuffered channels are used for announcements, it is better
to read as soon as data is available rather than to batch reads,
which results in writes being blocked. This is also addressed in
the tests.

This change speeds up by the test by over 60s to 90s when the events
are missed which was happening often.

Resolves openservicemesh#2014
draychev pushed a commit to draychev/osm that referenced this pull request Nov 19, 2020
The tests had synchronization issues resulting from missing and blocking
event notifications.

The missing events were a result of the tests
not waiting till the namespace used in the test is monitored, which
in turn causes the caches for the other Kubernetes resources to
be filtered out as unmonitored. The result is that the resource is
only marked as monitored upon the next resync from k8s informers
(currently every 30s). This is fixed by ensuring an async wait
on the namespace to be monitored before waiting on k8s events for
resources belonging to the namespace.

Since unbuffered channels are used for announcements, it is better
to read as soon as data is available rather than to batch reads,
which results in writes being blocked. This is also addressed in
the tests.

This change speeds up by the test by over 60s to 90s when the events
are missed which was happening often.

Resolves openservicemesh#2014
draychev pushed a commit to draychev/osm that referenced this pull request Dec 14, 2020
The tests had synchronization issues resulting from missing and blocking
event notifications.

The missing events were a result of the tests
not waiting till the namespace used in the test is monitored, which
in turn causes the caches for the other Kubernetes resources to
be filtered out as unmonitored. The result is that the resource is
only marked as monitored upon the next resync from k8s informers
(currently every 30s). This is fixed by ensuring an async wait
on the namespace to be monitored before waiting on k8s events for
resources belonging to the namespace.

Since unbuffered channels are used for announcements, it is better
to read as soon as data is available rather than to batch reads,
which results in writes being blocked. This is also addressed in
the tests.

This change speeds up by the test by over 60s to 90s when the events
are missed which was happening often.

Resolves openservicemesh#2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make pkg/kubernetes tests faster
4 participants