fleetd:registry: improve Units() performance remove extra loops when fetching Jobs list from etcd #1515
@jonboulle If it's green and no one objects, then it's probably OK to merge. One point: I split it into two patches, since the second one removes the sorting logic completely and lets etcd alone do it. There may be corner cases where the etcd and fleet sorting orders differ that I didn't check; if you think it's worth it, I'll do that later. By the way, I was also going to add goroutines there so that we fetch the job keys and unit keys at the same time, but didn't, since I'm not sure whether that would add load on etcd later under heavy load or many queries. Thank you!
@tixxdz I wonder: have you checked whether the first patch alone actually has any measurable performance impact? I have a hard time believing that just reshuffling the loops like that would have a significant effect... Avoiding the sort, OTOH, is a good catch :-)
uMap := make(map[string]*job.Unit)
// make it at least size of Nodes
uMap := make(map[string]bool, len(res.Node.Nodes))
units := make([]job.Unit, 0)
As you are giving the map an initial capacity, why not give the slice one as well?
The map is only used locally while collecting; the slice is not.
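A minimal sketch of the capacity question, using a hypothetical `node` type in place of the real etcd response nodes. The map is scratch space that never leaves the function, while the slice is returned to the caller, so a capacity hint on the slice trades some possibly unused capacity (when nodes are filtered out) for avoiding regrowth during `append`.

```go
package main

import "fmt"

// node is a hypothetical stand-in for an etcd response node.
type node struct {
	Key string
	Dir bool
}

// leafKeys returns the keys of non-directory nodes, in input order,
// skipping duplicates. The local map gets len(nodes) as a size hint so it
// does not need to grow while filling. The returned slice gets the same
// value as a capacity hint so append never reallocates; if some nodes are
// filtered out, the slice just keeps a little unused capacity.
func leafKeys(nodes []node) []string {
	seen := make(map[string]bool, len(nodes))
	keys := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if n.Dir || seen[n.Key] {
			continue
		}
		seen[n.Key] = true
		keys = append(keys, n.Key)
	}
	return keys
}

func main() {
	nodes := []node{{"/a", false}, {"/dir", true}, {"/b", false}}
	keys := leafKeys(nodes)
	fmt.Println(keys, len(keys), cap(keys)) // [/a /b] 2 3
}
```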
Can you show some numbers or a little more detail? I find it hard to see how sorting 500 strings takes 3 seconds... what else has changed?
Actually, you are right: we tested this with anton, but together with the reverted patch from PR #1512, which was the one taking all the RPC+I/O time. Without it, the results are pretty much the same, except when reconcile is triggered; at that point the new version is spared those extra loops. Since this is a simple patch, I just timed fleetctl while reconcile was triggered, without checking fleetd in depth:
Old: real 0m0.721s
New: real 0m0.665s
66d31fd to 93d3630
@jonboulle @antrik I simplified the patch. If there are no objections, it will be pushed after the 0.12 release. Thanks!
LGTM, but do we already have a test covering this?
How many units we get back is already covered by scheduling_tests that uses
Update: the functional test for this will depend on the new functionality introduced in #1544, which gets the units via list-unit-files, so we can make sure the list is sorted. I could write it myself, but it would just duplicate the helpers already being added there. Thanks!
To be more precise: the functional test should create several units from a template, wait for them to show up, sort them, and compare.
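The test described above boils down to comparing the listed order against a sorted list of the submitted names. A sketch, assuming the names come from a hypothetical `hello@NN.service` template and that the listed names have already been parsed out of the list-unit-files output (that parsing is not shown, and the helper names here are made up):

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// expectedSorted builds the unit names a test would submit from a
// template, then sorts them so they can be compared with the order in
// which the unit list comes back.
func expectedSorted(n int) []string {
	names := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		names = append(names, fmt.Sprintf("hello@%02d.service", i))
	}
	sort.Strings(names)
	return names
}

// sameOrder reports whether the listed units match the expected list
// element by element, including order.
func sameOrder(listed, expected []string) bool {
	return reflect.DeepEqual(listed, expected)
}

func main() {
	exp := expectedSorted(3)
	// In a real test this would come from list-unit-files.
	listed := []string{"hello@01.service", "hello@02.service", "hello@03.service"}
	fmt.Println(sameOrder(listed, exp))
}
```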
93d3630 to 264e30f
// Combine units
var units []string
for i := 1; i <= 20; i++ {
unit := fmt.Sprintf("fixtures/units/hello@%d.service", i)
nit: since the unit numbers go up to two digits, hello@%02d.service would be ideal.
done
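Why the zero padding matters: assuming the unit list is ordered lexically by key, unpadded numbers put hello@10 before hello@2. A small sketch comparing both formats (the `names` helper is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// names generates n unit names with the given format and returns them
// sorted lexically, the way a key-ordered listing would return them.
func names(format string, n int) []string {
	out := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		out = append(out, fmt.Sprintf(format, i))
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(names("hello@%d.service", 12)[:4])
	fmt.Println(names("hello@%02d.service", 12)[:4])
}
```

With `%d`, the first four names are `hello@1.service`, `hello@10.service`, `hello@11.service`, `hello@12.service`; with `%02d` they are `hello@01.service` through `hello@04.service`, matching numeric order.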
Improve the Units() call when fetching and processing Jobs and Units from etcd by removing unnecessary extra loops. This also changes the logic slightly: the old code always stored the last matched name with the new Unit, but since Job keys are not expected to contain two units with the same name, this should be fine.
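A sketch of the single-pass idea, using hypothetical `jobNode` and `unit` types and a made-up `/fleet/job/<name>` key layout rather than fleet's real registry structures. It keeps the first unit seen for each name instead of the last; as the commit message notes, that only differs from the old behavior if two job keys map to the same name, which is not expected.

```go
package main

import (
	"fmt"
	"path"
)

// jobNode is a hypothetical, simplified view of one job entry from etcd:
// Key is the job's key and Value the serialized unit.
type jobNode struct {
	Key   string
	Value string
}

// unit is a hypothetical, trimmed-down unit record.
type unit struct {
	Name     string
	Contents string
}

// unitsOnePass builds the unit list in a single loop. The name is the
// last path element of the key, and a local map skips repeated names, so
// the first occurrence wins. With unique names, first-wins and last-wins
// produce the same list.
func unitsOnePass(nodes []jobNode) []unit {
	seen := make(map[string]bool, len(nodes))
	units := make([]unit, 0, len(nodes))
	for _, n := range nodes {
		name := path.Base(n.Key)
		if seen[name] {
			continue
		}
		seen[name] = true
		units = append(units, unit{Name: name, Contents: n.Value})
	}
	return units
}

func main() {
	nodes := []jobNode{
		{"/fleet/job/a.service", "[Service]\nExecStart=/bin/true"},
		{"/fleet/job/b.service", "[Service]\nExecStart=/bin/false"},
	}
	for _, u := range unitsOnePass(nodes) {
		fmt.Println(u.Name)
	}
}
```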
…d units
This test makes sure that fleetd returns an ordered list of units from the Units() call, through the list-unit-files command.
264e30f to 9963e8b
}
sortable.Sort()

var inUnits []string
This is weird; why do you create another slice?
Just to reflect how the units are sorted and where they get sorted, since inUnits are submitted in counter order, which does not reflect how units are sorted and returned by etcd and Go.
And to compare them afterwards with DeepEqual().
Basically, I also moved the sorting logic from the fleetd side of Units() into these lines of the functional test, to ensure that Units() returns the same result without it.
I'm not quite sure I follow, but what I meant is that you can go directly to a slice: http://play.golang.org/p/xhsmv-XHSK
I see ;-D, thank you, next time for sure ;-)
Optimize the Units() call when fetching and processing Jobs and Units from etcd.
We tested this with 500 services on 3 nodes: getting the list of units from fleet and etcd with the fleetctl client previously took ~4 s and now takes ~0.9 s.
This frees fleet from running extra sorting logic and two extra loops in the same process. Note that we already ask etcd to sort the Job units for us, so I removed the sorting from fleet; etcd appears to use the same sorting order.