Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique EmsRefresh.refresh targets if there are over 1,000 targets #16432

Merged
merged 1 commit into from
Nov 27, 2017

Conversation

agrare
Copy link
Member

@agrare agrare commented Nov 9, 2017

RFC on if we should put back the unique check for ems_refresh targets

cc @Fryguy @Ladas

https://bugzilla.redhat.com/show_bug.cgi?id=1516401

Copy link
Contributor

@Ladas Ladas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the biggest problem might be in OpenShift provider, but:

  1. We queue only DELETED events now, which are smaller
  2. We disable the InventoryCollectorWorker queuing if the Refresh worker dies

That means we should always be at quite low amount of targets queued, then the uniq per queuing is not that bad.


What this solves though is, that we just saw how refresh worker failure could have lead to having 2M targets queued, where it's just like 400 unique targets possible. The 2M targets were around 4GB in memory, killing all priority, Generic and Refresh workers that met it.

cc @cben

@agrare
Copy link
Member Author

agrare commented Nov 9, 2017

So I tried queueing 2 million targets for amazon as a test:

ems = ManageIQ::Providers::Amazon::CloudManager.first
EmsRefresh.queue_refresh(2_000_000.times.map { ems })

The resulting queue item had ["msg_data", "<104000008 bytes of binary data>"] so around 100MB of data.

The RefreshWorker grew from 392MB to 1.2GB when dequeueing it.

@agrare agrare changed the title [WIP] Add back unique check for ems_refresh targets Add back unique check for ems_refresh targets Nov 21, 2017
@agrare agrare removed the wip label Nov 21, 2017
@agrare
Copy link
Member Author

agrare commented Nov 21, 2017

@Fryguy we saw this on a 5.9.0.9 appliance so the watch collector was disabled. I still think there is something else going on to allow this to happen (refresh worker should have dequeued this) but this certainly isn't helping matters.

@agrare agrare force-pushed the add_back_ems_refresh_unique_target branch 2 times, most recently from 7ce5298 to 4e7b2c2 Compare November 21, 2017 17:59
@agrare
Copy link
Member Author

agrare commented Nov 21, 2017

Okay @Fryguy this will uniq the target list every 1000 targets. 1000 ManagerRefresh::Targets marshal dumped is 32 KiB

@agrare agrare changed the title Add back unique check for ems_refresh targets Unique EmsRefresh.refresh targets if there are over 1,000 targets Nov 21, 2017
@@ -20,6 +20,10 @@ def self.debug_trace
Settings.ems_refresh[:debug_trace]
end

def self.max_targets_uniq_threshold
Settings.ems_refresh[:max_targets_uniq_threshold] || 1_000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer Settings.ems_refresh.max_targets_uniq_threshold and get rid of the || 1000, because we will always have a value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that note, why make it configurable at all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, right, maybe this config doesn't make that much of a sense

@agrare agrare force-pushed the add_back_ems_refresh_unique_target branch from 4e7b2c2 to 3543058 Compare November 21, 2017 22:28
Copy link
Contributor

@Ladas Ladas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@agrare agrare force-pushed the add_back_ems_refresh_unique_target branch from 3543058 to d100b56 Compare November 22, 2017 14:59
@miq-bot
Copy link
Member

miq-bot commented Nov 22, 2017

Checked commit agrare@d100b56 with ruby 2.3.3, rubocop 0.47.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. 👍

@agrare
Copy link
Member Author

agrare commented Nov 22, 2017

Linked to the new bug to track just this issue
@Fryguy can you take another look?

@Fryguy Fryguy merged commit fdb09c2 into ManageIQ:master Nov 27, 2017
@Fryguy Fryguy added this to the Sprint 74 Ending Nov 27, 2017 milestone Nov 27, 2017
@agrare agrare deleted the add_back_ems_refresh_unique_target branch November 27, 2017 17:08
simaishi pushed a commit that referenced this pull request Nov 27, 2017
Unique EmsRefresh.refresh targets if there are over 1,000 targets
(cherry picked from commit fdb09c2)

https://bugzilla.redhat.com/show_bug.cgi?id=1517962
@simaishi
Copy link
Contributor

Gaprindashvili backport details:

$ git log -1
commit d20081cd1cd829e03289734521bc37de6186b6a1
Author: Jason Frey <fryguy9@gmail.com>
Date:   Mon Nov 27 12:03:04 2017 -0500

    Merge pull request #16432 from agrare/add_back_ems_refresh_unique_target
    
    Unique EmsRefresh.refresh targets if there are over 1,000 targets
    (cherry picked from commit fdb09c20fdb778fb3ef0238044b2ce907f43e1ff)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1517962

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants