/fleet/unit excessive read requests to Etcd #1257

mwitkow · 2015-06-18T16:26:41Z

We're running etcd 2.1-alpha1 with a 5-node core cluster, and the rest of the nodes (hundreds) run etcd in proxy mode.

As per the graphs in etcd-io/etcd#3001, we're seeing a storm of recursive GETs coming from the local fleet on each of the nodes in the cluster. We're running about 1200 units on the cluster.

We've captured a TCP dump (available on request) showing that fleet made ~6000 HTTP GET requests in about 120 s. We're not running any fleetctl commands on the host, so I assume these come from the local fleetd.

Most of these are GET requests to individual /v2/keys/_coreos.com/fleet/unit/<unit_id> in consistent, recursive mode.

Is there some reason that prevents fleet from doing one big recursive request on /v2/keys/_coreos.com/fleet/unit/ instead of doing an individual request per unit? Can you point us at the code that does that?
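To make the difference concrete, here is a minimal Go sketch of the two request shapes. The key prefix comes from the dump; the endpoint (`127.0.0.1:2379`, assumed to be the local etcd proxy) and the helper names are hypothetical, and consistency options are left out. It only builds URLs and never talks to etcd.

```go
package main

import (
	"fmt"
	"net/url"
)

// Hypothetical endpoint: assumed to be the local etcd proxy on each node.
const etcdBase = "http://127.0.0.1:2379"

// Key prefix taken from the requests seen in the TCP dump.
const unitPrefix = "/v2/keys/_coreos.com/fleet/unit/"

// perUnitURLs builds one GET URL per unit, the pattern observed in the dump.
func perUnitURLs(unitIDs []string) []string {
	urls := make([]string, 0, len(unitIDs))
	for _, id := range unitIDs {
		urls = append(urls, etcdBase+unitPrefix+url.PathEscape(id)+"?recursive=true")
	}
	return urls
}

// recursiveURL builds the single request that returns the whole unit directory.
func recursiveURL() string {
	q := url.Values{}
	q.Set("recursive", "true")
	return etcdBase + unitPrefix + "?" + q.Encode()
}

func main() {
	ids := []string{"a1b2", "c3d4", "e5f6"} // hypothetical unit IDs
	fmt.Println(len(perUnitURLs(ids)), "requests per sweep vs 1:")
	fmt.Println(recursiveURL())
}
```

With ~1200 units, the per-unit shape means ~1200 requests per sweep per node, while the recursive shape stays at one request regardless of the unit count.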

The text was updated successfully, but these errors were encountered:

mwitkow · 2015-06-18T16:48:40Z

So the only place that calls /unit/<unit_id> in recursive mode seems to be
registry/job.go:257
func (r *EtcdRegistry) getUnitFromObjectNode(node *etcd.Node) (*job.Unit, error)
which is only called in
registry/job.go:151
func (r *EtcdRegistry) dirToUnit(dir *etcd.Node) (*job.Unit, error)

This is called in two places:
registry/job.go:133
func (r *EtcdRegistry) Unit(name string) (*job.Unit, error)
(which I assume is called for a single unit fetch)
or in
registry/job.go:90
func (r *EtcdRegistry) Units() ([]job.Unit, error)

The latter matches the symptoms we see in the TCP dump (see the highlighted sorted /job request).
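A minimal Go sketch of the read pattern this call chain implies, assuming (from the trace above, not verified against the fleet source) that Units() reads the job directory once and then resolves each job's unit body with its own GET. The `store` type and its methods are hypothetical in-memory stand-ins that count simulated reads.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the etcd key space.
type store struct {
	jobs  map[string]string // job name -> unit hash referenced by its object node
	units map[string]string // unit hash -> unit file contents
	reads int               // number of simulated GET requests
}

// getJobsRecursive simulates one recursive GET on the job directory.
func (s *store) getJobsRecursive() map[string]string {
	s.reads++
	return s.jobs
}

// getUnit simulates one GET on /unit/<hash>.
func (s *store) getUnit(hash string) (string, bool) {
	s.reads++
	body, ok := s.units[hash]
	return body, ok
}

// unitsPerJob lists the jobs once, then fetches each unit body separately,
// so the number of reads grows as 1 + N.
func unitsPerJob(s *store) (map[string]string, error) {
	jobs := s.getJobsRecursive()
	out := make(map[string]string, len(jobs))
	for name, hash := range jobs {
		body, ok := s.getUnit(hash)
		if !ok {
			return nil, fmt.Errorf("job %q references missing unit %q", name, hash)
		}
		out[name] = body
	}
	return out, nil
}

func main() {
	s := &store{jobs: map[string]string{}, units: map[string]string{}}
	for i := 0; i < 1200; i++ {
		hash := fmt.Sprintf("h%d", i)
		s.jobs[fmt.Sprintf("app-%d.service", i)] = hash
		s.units[hash] = "[Service]\nExecStart=/bin/true\n"
	}
	units, err := unitsPerJob(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(units), "units in", s.reads, "reads") // 1200 units in 1201 reads
}
```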

wuqixuan · 2015-06-29T12:33:10Z

Currently, the design is that every agent periodically gets all unit files from etcd. If there are many units in etcd and many agents, the number of unit HTTP GET requests becomes huge.
I think we need to implement a cache mechanism on the agent side: the agent fetches the latest state from etcd only when a unit changes, and otherwise sends no unit-file GET request to etcd.
@bcwaldon @jonboulle @crawford, what do you think? If needed, I can help implement it to improve the performance of the etcd server.
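One possible shape for such a cache, as a hedged Go sketch: it assumes a unit's body is stored under a key derived from its contents (the /unit/<unit_id> keys look like that, but this is an assumption), so an entry fetched once never goes stale and only previously unseen keys cost a GET. `unitCache` and `fetchFunc` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for a single GET of /unit/<hash> (hypothetical).
type fetchFunc func(hash string) (string, error)

// unitCache remembers unit bodies by hash. Assumption: the body stored
// under a given hash never changes, so cached entries never go stale.
type unitCache struct {
	mu     sync.Mutex
	byHash map[string]string
	fetch  fetchFunc
}

func newUnitCache(f fetchFunc) *unitCache {
	return &unitCache{byHash: make(map[string]string), fetch: f}
}

// Get returns the cached body and calls fetch only on a miss.
// The lock is held during fetch, which serializes misses for simplicity.
func (c *unitCache) Get(hash string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if body, ok := c.byHash[hash]; ok {
		return body, nil
	}
	body, err := c.fetch(hash)
	if err != nil {
		return "", err
	}
	c.byHash[hash] = body
	return body, nil
}

func main() {
	calls := 0
	c := newUnitCache(func(hash string) (string, error) {
		calls++
		return "[Service]\nExecStart=/bin/true\n", nil
	})
	for i := 0; i < 3; i++ { // three periodic sweeps over the same unit
		c.Get("abc123")
	}
	fmt.Println("store reads:", calls) // store reads: 1
}
```

This sketch never evicts entries; a real agent-side cache would also need to drop hashes that no job references any more.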

mwitkow · 2015-06-29T12:46:42Z

#1260
It does a single recursive read of all units in the registry.Units() call, instead of making one request per unit.
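A rough Go sketch of the batched idea, not the code from #1260: one recursive read of the job directory, one recursive read of the unit directory, and an in-memory join. The `store` type and its methods are hypothetical in-memory stand-ins that count simulated reads.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the etcd key space.
type store struct {
	jobs  map[string]string // job name -> unit hash referenced by its object node
	units map[string]string // unit hash -> unit file contents
	reads int               // number of simulated GET requests
}

// getJobsRecursive simulates one recursive GET on the job directory.
func (s *store) getJobsRecursive() map[string]string {
	s.reads++
	return s.jobs
}

// getUnitsRecursive simulates one recursive GET on the unit directory.
func (s *store) getUnitsRecursive() map[string]string {
	s.reads++
	return s.units
}

// unitsBatched resolves every job's unit body with two reads in total,
// joining jobs to unit bodies in memory instead of issuing a GET per unit.
func unitsBatched(s *store) (map[string]string, error) {
	jobs := s.getJobsRecursive()
	bodies := s.getUnitsRecursive()
	out := make(map[string]string, len(jobs))
	for name, hash := range jobs {
		body, ok := bodies[hash]
		if !ok {
			return nil, fmt.Errorf("job %q references missing unit %q", name, hash)
		}
		out[name] = body
	}
	return out, nil
}

func main() {
	s := &store{jobs: map[string]string{}, units: map[string]string{}}
	for i := 0; i < 1200; i++ {
		hash := fmt.Sprintf("h%d", i)
		s.jobs[fmt.Sprintf("app-%d.service", i)] = hash
		s.units[hash] = "[Service]\nExecStart=/bin/true\n"
	}
	units, err := unitsBatched(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(units), "units in", s.reads, "reads") // 1200 units in 2 reads
}
```

The trade-off is that each sweep transfers every unit body, including ones no job references, but the request count stays constant instead of growing with the number of units.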

jonboulle · 2015-10-12T23:55:06Z

Fixed by #1376. Thanks for the patch!

mischief added the dependency/etcd label Jun 25, 2015

jonboulle closed this as completed Oct 12, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

/fleet/unit excessive read requests to Etcd #1257

/fleet/unit excessive read requests to Etcd #1257

mwitkow commented Jun 18, 2015

mwitkow commented Jun 18, 2015

wuqixuan commented Jun 29, 2015

mwitkow commented Jun 29, 2015

jonboulle commented Oct 12, 2015

/fleet/unit excessive read requests to Etcd #1257

/fleet/unit excessive read requests to Etcd #1257

Comments

mwitkow commented Jun 18, 2015

mwitkow commented Jun 18, 2015

wuqixuan commented Jun 29, 2015

mwitkow commented Jun 29, 2015

jonboulle commented Oct 12, 2015