Streaming refresh for kubernetes using watches #271
Conversation
Is there a common pattern here (and from other streaming providers) that can be extracted to a Mixin or a base class?
Yeah, probably; this is similar to the vmware and kubevirt streaming refreshes. The BaseManager::Refresher class would probably even work: check if ems.supports_streaming_refresh? and call out to methods to set up the threads/etc...
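For illustration, a rough sketch of what that shared branch point might look like; only ems.supports_streaming_refresh? comes from the discussion above, the other method names are hypothetical:

# Hypothetical sketch of a branch point in a shared Refresher/base class;
# refresh_via_streaming and refresh_via_queue are illustrative names.
def refresh(ems)
  if ems.supports_streaming_refresh?
    refresh_via_streaming(ems) # set up the watch threads and drain their notices
  else
    refresh_via_queue(ems)     # existing queued full/targeted refresh path
  end
end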
Checked commits agrare/manageiq-providers-kubernetes@f32af19~...f638a37 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
app/models/manageiq/providers/kubernetes/container_manager/streaming_refresh_mixin.rb
def start_watch_threads
  _log.info("#{log_header} Starting watch threads...")

  entity_types.each do |entity_type|
hm I remember there was an issue that the watch gets disconnected every hour or so? Is that still the case @cben?
So we would need logic to check if the watch died?
afaik yes, ManageIQ/kubeclient#273, not solved.
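Until that kubeclient issue is resolved, the watch thread itself could reconnect when the server drops it. A minimal sketch, assuming kubeclient's options-hash watch API and the connection/resource_versions/finish/queue objects used in this mixin:

# Sketch: reconnect a dropped watch, resuming from the last resourceVersion seen.
def watch_entity(entity_type)
  until finish.value
    stream = connection.send("watch_#{entity_type}", :resource_version => resource_versions[entity_type])
    stream.each do |notice|
      resource_versions[entity_type] = notice.object.metadata.resourceVersion
      queue.push(notice)
    end
    # .each returning means the server closed the connection; loop and reconnect
  end
end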
def save_resource_versions(inventory)
  entity_types.each do |entity_type|
    resource_versions[entity_type] = inventory.collector.send(entity_type).resourceVersion
Going forward, we should probably store this in some table?
The point would be to not have full refresh and streaming coupled, since we want to start streaming right away and not wait half an hour (or hours) for the full refresh.
Can we get the 'whole collection resourceVersion' from watches? We should add a new table where we store the latest resource versions from the watches.
Hopefully, we should be able to check that the last 'whole collection resourceVersion' is still available in the watches, so we can avoid a full refresh even when the worker is restarted.
So we would do a full refresh only if we detect a gap.
Let's first get this working well with streaming only after full refresh.
Can we get the 'whole collection resourceVersion' from watches?
Not directly, but the last received version of an individual object is the thing to use as the collection version to watch from.
We'll need to figure out "version skew" between components: when all parts are busy and the queue is not empty, do we want the last received version / last dequeued version / last persisted version?
(Not an issue here, as save_resource_versions is only called in a quiet moment after a full refresh.)
So we would do a full refresh only if we detect a gap.
k8s says that with etcd3 they only keep history ~5min back, so not sure this matters much; if we were down, we'll frequently have a gap...
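A hedged sketch of that resume-or-full-refresh idea; last_persisted_resource_version and full_refresh are made-up names, and the 410 Gone check reflects how Kubernetes reports an expired resourceVersion:

# Sketch: resume a watch from a stored resourceVersion, falling back to a
# full refresh when the server says that version is gone (i.e. a gap).
def resume_or_full_refresh(entity_type)
  version = last_persisted_resource_version(entity_type) # e.g. read from a new table
  return full_refresh if version.nil?

  connection.send("watch_#{entity_type}", :resource_version => version).each do |notice|
    if notice.type == "ERROR" && notice.object&.code == 410
      return full_refresh # history no longer reaches back to `version`
    end
    queue.push(notice)
  end
end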
Looks like a great start 👍
I have a few comments that we should solve in follow-up PRs.
Still reviewing, I like what I've seen so far...
I like how the full flow is visible in do_work_streaming_refresh ❤️
An in-process queue is simpler in so many ways than MiqQueue... :)
@@ -26,6 +30,10 @@ def supports_metrics?
  endpoints.where(:role => METRICS_ROLES).exists?
end

def streaming_refresh_enabled?
  Settings.ems_refresh[emstype.to_sym]&.streaming_refresh
end
nit: can you move this nearer supports :streaming_refresh (or inline it)?
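For reference, inlining it might look roughly like this, assuming the block form of ManageIQ's SupportsFeatureMixin in use at the time:

# Sketch: inline the settings check into the supports declaration.
supports :streaming_refresh do
  unless Settings.ems_refresh[emstype.to_sym]&.streaming_refresh
    unsupported_reason_add(:streaming_refresh, "Streaming refresh is not enabled in settings")
  end
end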
Ok, so it looks fine overall, I am going to merge.
We'll fix the pending issues in follow-ups. (I am gonna build the service catalog refresh on this.)
until finish.value
  watch_stream.each { |notice| queue.push(notice) }
end
This until is ineffective. watch_stream.each {...} may run infinitely! In practice it sometimes stops, but on the order of an hour, not the 10sec that stop_watch_threads hopes for...
And if it stops, you can't restart the same watch_stream; you need a new start_watch connection.
You could move the check inside the each, something like this:
watch_stream.each do |notice|
queue.push(notice)
break if finish.value
end
(untested, not sure kubeclient cleans up correctly with break)
However, this too only stops after the next watch notice!
You could call watch_stream.finish from stop_watch_threads to stop faster.
It's a violent kludge (it closes the underlying http connection, letting each crash and hopefully rescue 💥) but IMO good enough.
Yeah I noticed that this always ends up exiting only after the timeout and killing the threads.
I pulled the logic from KubeVirt https://github.com/ManageIQ/manageiq-providers-kubevirt/blob/master/app/models/manageiq/providers/kubevirt/infra_manager/refresh_worker/runner.rb#L215-L219 so someone should tell them 😄
not sure kubeclient cleans up correctly with break
Probably no worse than the process exiting out from under the threads because they failed to join! 😆
However, this too only stops after the next watch notice!
Yeah, I tried moving to this approach with my last PR, but if no notices were delivered it wasn't much better. You could definitely argue that on a production system changes would arrive often enough that this wouldn't be an issue, though.
You could call watch_stream.finish from stop_watch_threads to stop faster.
I'll give this a try, thanks! Definitely sounds like the best bad approach.
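A sketch of what that could look like; the watch_streams/watch_threads hashes keyed by entity type are assumptions, only the finish flag and Kubeclient's WatchStream#finish come from the thread above:

# Sketch: signal the loops, then force each blocked .each to return by
# closing its HTTP connection, and give the threads a bounded join.
def stop_watch_threads
  finish.value = true                 # assumes the flag exposes a writer (e.g. Concurrent::AtomicBoolean)
  watch_streams.each_value(&:finish)  # WatchStream#finish closes the connection, so .each returns
  watch_threads.each_value { |thread| thread.join(10) }
end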
def save_resource_versions(inventory)
  entity_types.each do |entity_type|
    resource_versions[entity_type] = inventory.collector.send(entity_type).resourceVersion
Q: how does this line work? What is, say, collector.pods.resourceVersion?
AFAICT, collector.pods will return an array. Did you mean send(entity_type).last.resourceVersion?
we should use the resource version of the collection, not of an individual object, right?
Ah, my bad, I initially thought collector is a Kubernetes::Inventory::Collector::Watches and only later realized it's the full refresh one. I see, the get_pods result is not just an array, it has .resourceVersion 👍
Yeah, I want to try to compartmentalize the collection resource_version logic in the collector, so that this could do collector.pods_resource_version. The full collector would just use the resourceVersion from the collection, and the watches collector would populate it from the last entity's resourceVersion. Right now this logic is in this mixin, which IMO is less than ideal but works for now. It is on my list of refactorings :)
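As a rough illustration of that refactoring idea (the method bodies below are hypothetical; only collector.pods_resource_version is named above):

# In the full-refresh collector: the collection returned by get_pods
# carries the resourceVersion directly.
def pods_resource_version
  pods.resourceVersion
end

# In the watches collector: take the resourceVersion of the last received entity.
def pods_resource_version
  pods.last&.metadata&.resourceVersion
end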
Add the ability to use watches in the RefreshWorker in place of the standard queued full/targeted refreshes.
Tasks still pending:
- Since there is no indication if a watch started with resource_version => "0" has "completed", we need to do an initial full refresh, get the resourceVersion of the collection, then start the watch.
- At present this will only add/edit entities using the TargetedCollection persister.

Depends on: ManageIQ/manageiq#17531
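For orientation, a hedged sketch of the flow those tasks describe; only do_work_streaming_refresh, save_resource_versions, start_watch_threads and the targeted persister are named in this PR, the other helpers are illustrative:

# Sketch: initial full refresh, then stream changes via watches.
def do_work_streaming_refresh
  unless @initial_refresh_done
    inventory = full_refresh            # 1. initial full refresh
    save_resource_versions(inventory)   # 2. remember each collection's resourceVersion
    start_watch_threads                 # 3. start watching from those versions
    @initial_refresh_done = true
  end

  # 4. drain the notices pushed by the watch threads and persist them
  #    with the targeted persister
  notices = []
  notices << queue.pop until queue.empty?
  persist_targeted(notices) unless notices.empty?
end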