Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

fleetd:registry: improve Units() performance remove extra loops when fetching Jobs list from etcd #1515

Merged

Conversation

tixxdz
Copy link
Contributor

@tixxdz tixxdz commented Mar 22, 2016

Optimize Units() call when we fetch and process Jobs and Units from etcd.

We tested this with 500 services on 3 nodes where previously it would take the fleetctl client to get the list of units from fleet and etcd ~4secs to now ~0.9secs.

Actually this will discharge fleet from doing extra sorting logic and two extra loops at the same process. One thing to note is that we request etcd to sort the Job Units for us, so I removed it from fleet, and it seems it is the same sorting logic.

@tixxdz
Copy link
Contributor Author

tixxdz commented Mar 22, 2016

@jonboulle if it's green! and no one objects then it's probably ok to merge it.

One point: I made it into 2 patches since the second one will remove the sorting logic completely, and lets only etcd do it. Maybe there are corner cases of both etcd and fleet sorting algorithms which I didn't check, if you think it's worth it! I'll do it later.

Btw I was going also to add goroutines there so we fetch job keys and unit keys at the same time, but didn't... Not sure it may add load later on etcd if you do lot of load or queries...

Thank you!

@tixxdz tixxdz self-assigned this Mar 22, 2016
@antrik
Copy link
Contributor

antrik commented Mar 22, 2016

@tixxdz I wonder, have you checked whether the first patch alone actually has any measurable performance impact? I have a hard time believing that just reshuffling the loops like that would have a significant effect...

Avoiding the sort OTOH is a good catch :-)

uMap := make(map[string]*job.Unit)
// make it at least size of Nodes
uMap := make(map[string]bool, len(res.Node.Nodes))
units := make([]job.Unit, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you are giving the map an initial capacity, why not the slice as well?...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the map is local collected, the slice is not.

@jonboulle
Copy link
Contributor

We tested this with 500 services on 3 nodes where previously it would take the fleetctl client to get the list of units from fleet and etcd ~4secs to now ~0.9secs.

Can you show some numbers or a little more detail? I find it hard to see how a 500-string sort is taking 3 seconds... what else has changed?

@tixxdz
Copy link
Contributor Author

tixxdz commented Mar 23, 2016

Can you show some numbers or a little more detail? I find it hard to see how a 500-string sort is taking 3 seconds... what else has changed?

Actually you are right, we tested this with anton, but with the reverted patch of this PR: #1512 which was the one taking all the RPC+I/O time.

Anyway it's pretty same results unless reconcile is triggered then at that time the new version will be discharged from those extra loops.

Since this is a simple patch just timing fleetctl when reconcile is triggered, no need to check fleetd in depth.

Old one:
real 0m1.621s
user 0m0.026s
sys 0m0.013s

real 0m0.721s
user 0m0.021s
sys 0m0.012s

New one:
real 0m1.155s
user 0m0.024s
sys 0m0.028s

real 0m0.665s
user 0m0.025s
sys 0m0.012s

@tixxdz tixxdz force-pushed the tixxdz/fleetd-quick-units-optimisations-v1 branch from 66d31fd to 93d3630 Compare April 11, 2016 08:26
@tixxdz
Copy link
Contributor Author

tixxdz commented Apr 11, 2016

@jonboulle @antrik patch made simple, if no objections it will be pushed after 0.12 release, thanks!

@jonboulle
Copy link
Contributor

LGTM but do we already have a test covering this?

@jonboulle jonboulle added this to the v0.13.0 milestone Apr 11, 2016
@tixxdz
Copy link
Contributor Author

tixxdz commented Apr 11, 2016

LGTM but do we already have a test covering this?

How many units we get back is already covered by scheduling_tests that uses fleetctl list-unit-files to check the matching number, now for the order I'm not sure that we have a cover test, we assume that etcd returns it ordered. Will write a quick one for it then. Thanks for the comment.

@tixxdz
Copy link
Contributor Author

tixxdz commented Apr 13, 2016

Update: the functional test of this will depend on the new functionality that will be introduced in #1544 to get the units with list-unit-files and make sure that it's sorted. I can write it my self but that will be just a duplicate where we are already adding the appropriate helpers. Thanks!

@tixxdz
Copy link
Contributor Author

tixxdz commented Apr 13, 2016

To be more precise on this one: functional test should create several units from a template then wait for them to show up, get them sorted and compare.

// Combine units
var units []string
for i := 1; i <= 20; i++ {
unit := fmt.Sprintf("fixtures/units/hello@%d.service", i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: as the number of units needs 2 digits, hello@%02d.service would be ideal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Djalal Harouni added 2 commits April 18, 2016 11:56
Improve Units() call when we fetch and process Jobs and Units from
etcd. Remove extra unnecessary loops.

We also change the code logic a little bit, since it was always storing
the last matched name with the new Unit, but since Job keys do not expect to
have two units with the same name, this should be ok.
…d units

This test makes sure that it fleetd returned an ordered list of units
from Units() call through list-unit-files command.
@tixxdz tixxdz force-pushed the tixxdz/fleetd-quick-units-optimisations-v1 branch from 264e30f to 9963e8b Compare April 18, 2016 09:59
@tixxdz tixxdz merged commit c9ba807 into coreos:master Apr 18, 2016
}
sortable.Sort()

var inUnits []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is weird, why do you create another slice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this just to reflect how units are sorted and where sorted! since inUnits are sent using a counter which does not reflect how units are sorted and returned from etcd and golang.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and to follow up with DeepEqual()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically I also moved the logic from the Units() fleetd side to put it here in these lines of the functional test to ensure that Units() returns the same result without these lines....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure I follow. But I meant, you can just go directly to a slice: http://play.golang.org/p/xhsmv-XHSK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see ;-D, thank you next time fore sure ;-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants