Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate multiple refresher workers for Amazon #15711

Merged
merged 15 commits into from
Aug 29, 2017

Conversation

juliancheal
Copy link
Member

@juliancheal juliancheal commented Aug 2, 2017

Attempting to consolidate multiple Amazon Refresher workers into a single refresher worker.

This involves letting MiqQueue.peek return queue messages for more than one queue per worker. And, workers that want to handle messages for more than one queue need to return a list of queue names to MiqServer when prefetching queue messages on each heartbeat.

@miq-bot miq-bot added the wip label Aug 2, 2017
@chessbyte chessbyte requested a review from Fryguy August 2, 2017 16:37
@blomquisg blomquisg assigned Fryguy and agrare and unassigned blomquisg and Fryguy Aug 3, 2017
@blomquisg blomquisg changed the title [WIP] This is experimental. [WIP] Consolidate multiple refresher workers Aug 3, 2017
@juliancheal
Copy link
Member Author

juliancheal commented Aug 3, 2017

| Medium | MiqQueue | MiqQueue.peek | SQL Injection | Possible SQL injection near line 276:

MiqQueue.where([("    state = 'ready'
AND (zone IS NULL OR zone = ?)\n 
AND (role IS NULL OR role IN (?))
AND (server_guid IS NULL OR server_guid = ?)
AND (deliver_on IS NULL OR deliver_on <= ?)
AND (priority <= ?)" + +self.where_queue_name(conditions[:queue_name].is_a?(Array))

https://github.com/juliancheal/manageiq/blob/d1490c51f9880dc2f418dfdcaf727e96f64296dd/app/models/miq_queue.rb#L262

@juliancheal
Copy link
Member Author

This PR relies on ManageIQ/manageiq-providers-amazon#280

@juliancheal
Copy link
Member Author

Getting the issue where when a queue job is created for ManageIQ::Providers::Amazon::CloudManager it's queue_name in miq_queue table is ["ems_1", "ems_3", "ems_2"] and not ems_1.

This isn't happening for Amazon::NetworkManager or ::StorageManager they are both being created in this table with the correct queue_names in this case, ems_2 and ems_3.

/cc @blomquisg, @Fryguy.

return nil unless name == "ems"
id.to_i
if queue_name.kind_of?(Array)
queue_names = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ems_ids instead of queue_names since this block is returning a list of ems_ids

IO.for_fd(socket).close
end
end
end

def queue_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a quick comment for the queue_name accessor overrides. Something like "the queue_name might be an array for some workers, so serialize the queue name to JSON"

@juliancheal juliancheal changed the title [WIP] Consolidate multiple refresher workers Consolidate multiple refresher workers for Amazon Aug 8, 2017
@juliancheal
Copy link
Member Author

@miq-bot remove_label wip

@miq-bot miq-bot removed the wip label Aug 8, 2017
id.to_i
if queue_name.kind_of?(Array)
ems_ids = []
ems_ids.each do |ems_id|
Copy link
Member

@blomquisg blomquisg Aug 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be queue_name instead of ems_ids here ... I couldn't figure out why this was always returning an empty array.

Also, this isn't converting the id to an int.

Though, leaning towards smaller methods, maybe something like this is better anyway:

def parse_ems_id(queue_name)
  return nil if queue_name.blank?                                                                                                                                    
  name, id = queue_name.split("_")                                                                                                                                   
  return nil unless name == "ems"                                                                                                                                    
  id.to_i
end

def ems_id_from_queue_name(queue_name)
  queue_name.kind_of?(Array) ? queue_name.collect { |q| parse_ems_id(q) }.flatten : parse_ems_id(queue_name)
end

This way, the parsing of the ems_id is done in one place one way, not in two places in the if and else block, slightly differently.

This feels redundant (having two separate calls to parse_ems_id), but we need to be able to return arrays when appropriate and single values when appropriate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh what a silly mistake!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really like your foo.collect methods :)

@blomquisg
Copy link
Member

With the change I suggested, I now see these workers:

worker_type queue_name status
ManageIQ::Providers::Amazon::CloudManager::EventCatcher ems_1 started
ManageIQ::Providers::Amazon::CloudManager::RefreshWorker ["ems_1", "ems_3", "ems_2"] started
MiqEventHandler ems started
MiqGenericWorker generic started
MiqScheduleWorker started

And I saw this in the log:

[----] I, [2017-08-08T19:16:40.283480 #32273:2af621ed5110]  INFO -- : MIQ(ManageIQ::Providers::Amazon::CloudManager::Refresher#refresh) Refreshing all targets...
.....
[----] I, [2017-08-08T19:16:55.814041 #32273:2af621ed5110]  INFO -- : MIQ(ManageIQ::Providers::Amazon::CloudManager::Refresher#refresh) Refreshing all targets...Complete
.....
[----] I, [2017-08-08T19:16:55.837777 #32273:2af621ed5110]  INFO -- : MIQ(ManageIQ::Providers::Amazon::StorageManager::Ebs::Refresher#refresh) Refreshing all targets...
.....
[----] I, [2017-08-08T19:16:57.545894 #32273:2af621ed5110]  INFO -- : MIQ(ManageIQ::Providers::Amazon::StorageManager::Ebs::Refresher#refresh) Refreshing all targets...Complete
.....
[----] I, [2017-08-08T19:16:57.561174 #32273:2af621ed5110]  INFO -- : MIQ(ManageIQ::Providers::Amazon::NetworkManager::Refresher#refresh) Refreshing all targets...
.....
[----] I, [2017-08-08T19:17:01.645353 #32273:2af621ed5110]  INFO -- : MIQ(ManageIQ::Providers::Amazon::NetworkManager::Refresher#refresh) Refreshing all targets...Complete

Showing all refreshes being handle by the same worker PID!

Copy link
Contributor

@Ladas Ladas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL injection warning is a bit weird. Otherwise this starts to look great, awesome job @blomquisg & @juliancheal :-)

@Ladas
Copy link
Contributor

Ladas commented Aug 16, 2017

@juliancheal @blomquisg so, what is missing here?

The SQL injection warning seems like a false positive, the ? should sanitize all values passed. We can maybe try "{string1}/n{string2}" instead of string1 + string2? Or any other idea to make it go away? :-)

@miq-bot
Copy link
Member

miq-bot commented Aug 17, 2017

This pull request is not mergeable. Please rebase and repush.

Julian Cheal added 2 commits August 21, 2017 15:47
Also added a comment on to why we're overriding queue_name.
To be re-copped another day.
@juliancheal
Copy link
Member Author

@agrare all unrelated cops removed 😢

@agrare
Copy link
Member

agrare commented Aug 21, 2017

@juliancheal thanks!

@juliancheal
Copy link
Member Author

Looks like Breakman is unhappy again, it seemed ok earlier. I'll try @Ladas' fix.

@juliancheal
Copy link
Member Author

@Ladas
Copy link
Contributor

Ladas commented Aug 21, 2017

@juliancheal ok, a stupid solution: can you try to define MIQ_QUEUE_GET_ARRAY and the same for peek, to see if it passes as 1 query and the issue is the concatenation?

Then the best solution would be to rewrite it to arel query I suppose, it's horrible to pass array like this. Or, we can put there a TODO to write AREL and unify those queries. :-D

@juliancheal
Copy link
Member Author

@Ladas I'm not sure I understand this bit:

@juliancheal ok, a stupid solution can you try to define MIQ_QUEUE_GET_ARRAY and the same for peek, to see if it passes as 1 query and the issue is the concatenation?

Totally would rather rewrite this to use Arel query, @blomquisg and I discussed that, but thought best not to bite off more than we can chew in this PR :)

Then the best solution would be to rewrite it to arel query I suppose, it's horribly to pass array like this. Or, we can put there a TODO to write AREL and unify those queries. :-D

@Ladas
Copy link
Contributor

Ladas commented Aug 21, 2017

@juliancheal to write 2 full queries MIQ_QUEUE_GET and MIQ_QUEUE_GET_ARRAY, 1 having miq_queue = ? and another miq_queue IN (?), the same for the other. It's super ugly, but it will help us rule out that the query is wrong and there is a false positive because we are using + to concat 2 queries.

This isn't a tidy replacement, but will be replaced by an Arel query
at some point.
@juliancheal
Copy link
Member Author

juliancheal commented Aug 22, 2017

@Ladas, @Fryguy, @agrare, @blomquisg so it seems that concatenating the sql query was causing Brakeman to freakout. I've replaced it with two queries and that still seems to work, and also makes Brakeman happy.

I think this PR is good to go. What do y'all think?

AND (role IS NULL OR role IN (?))
AND (server_guid IS NULL OR server_guid = ?)
AND (deliver_on IS NULL OR deliver_on <= ?)
AND (priority <= ?)
AND queue_name in (?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbrock is there a perf issue with always using IN? I think PG was smart enough to optimize that if it's e.g queue_name IN ('generic')

@@ -4,6 +4,10 @@ class BaseManager < ExtManagementSystem

include Inflector::Methods

has_many :child_managers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter where the asscociation is defined?
I need the :child_managers here too, but for orchestrate_destroy, which is implemented in ExtManagementSystem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh fancy. Having child_managers on EMS seems to make more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree child_managers at the ems level is better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove this then? And then just wait until @durandom's PR is merged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliancheal or you can put in a PR to just add :child_managers

Copy link
Member Author

@juliancheal juliancheal Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#15889 child manager PR.

So all managers can now find their children.
@juliancheal
Copy link
Member Author

Of course this is now going to break until #15889 is merged. 🍰

@miq-bot
Copy link
Member

miq-bot commented Aug 24, 2017

Checked commits juliancheal/manageiq@80e7848~...67905a3 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
6 files checked, 7 offenses detected

app/models/miq_queue.rb

app/models/miq_worker.rb

@agrare
Copy link
Member

agrare commented Aug 24, 2017

@juliancheal dependent PR merged

@juliancheal
Copy link
Member Author

@agrare you are a gentleman and a scholar.

@agrare
Copy link
Member

agrare commented Aug 24, 2017

return nil if queue_name.blank?
name, id = queue_name.split("_")
return nil unless name == "ems"
id.to_i
end

def ems_id_from_queue_name(queue_name)
queue_name.kind_of?(Array) ? queue_name.collect { |q| parse_ems_id(q) }.flatten : parse_ems_id(queue_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliancheal might Array(queue_name).collect ... make this simpler? Would just be one case then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean, if queue_name isn't an Array turn in into one anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually now that I think of it I like this better since 90% of the time it'll be not an array :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, we will need only IN query for it, but I think we can do that. Seems like PG is smart enough to convert IN with 1 element array into = (look at the index cond below, same applies for seq. scan)

   (1.0ms)  EXPLAIN ANALYZE SELECT "container_images".* FROM "container_images" WHERE (id IN (1710695))
 => [["Index Scan using container_images_pkey on container_images  (cost=0.43..8.45 rows=1 width=685) (actual time=0.026..0.027 rows=1 loops=1)"], ["  Index Cond: (id = 1710695)"], ["Planning time: 0.220 ms"], ["Execution time: 0.076 ms"]] 
2.3.0 :009 > ActiveRecord::Base.connection.query("EXPLAIN ANALYZE #{ContainerImage.where('id=1710695').to_sql}")
   (1.0ms)  EXPLAIN ANALYZE SELECT "container_images".* FROM "container_images" WHERE (id=1710695)
 => [["Index Scan using container_images_pkey on container_images  (cost=0.43..8.45 rows=1 width=685) (actual time=0.026..0.028 rows=1 loops=1)"], ["  Index Cond: (id = 1710695)"], ["Planning time: 0.147 ms"], ["Execution time: 0.076 ms"]]

@agrare
Copy link
Member

agrare commented Aug 29, 2017

This looks good to me, @juliancheal is there anything else that this depends on or is it okay to merge?

@juliancheal
Copy link
Member Author

This one needs merging, ManageIQ/manageiq-providers-amazon#280 but I don't know which one first tbh.

Copy link
Member

@agrare agrare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agrare agrare merged commit bf8f9a8 into ManageIQ:master Aug 29, 2017
@agrare agrare added this to the Sprint 68 Ending Sep 4, 2017 milestone Aug 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants