This repository has been archived by the owner on Jan 30, 2020. It is now read-only.
We're running etcd 2.1-alpha1 with a 5-node core cluster and the rest of the nodes (hundreds) in etcd proxy configuration.
As shown in the graphs in etcd-io/etcd#3001, we're seeing a storm of recursive GETs coming from the local fleet daemon on each of the nodes in the cluster. We're running about 1200 units on the cluster.
We've captured a TCP dump (available on request) showing that fleet made ~6000 HTTP GET requests in about 120s. We're not running any fleetctl commands on the host, so I assume this is the local fleetd.
Most of these are GET requests to individual /v2/keys/_coreos.com/fleet/unit/<unit_id> in consistent, recursive mode.
Is there some reason that prohibits fleet from doing one big recursive request on /v2/keys/_coreos.com/fleet/unit/, rather than doing an individual request per unit? Can you point us at the code that does that?
So the only place that calls /unit/<unit_id> in recursive mode seems to be
registry/job.go:257 func (r *EtcdRegistry) getUnitFromObjectNode(node *etcd.Node) (*job.Unit, error)
which is only called in
registry/job.go:151 func (r *EtcdRegistry) dirToUnit(dir *etcd.Node) (*job.Unit, error)
this is called in two places:
registry/job.go:133 func (r *EtcdRegistry) Unit(name string) (*job.Unit, error)
(which I assume is called for a single unit fetch)
or in
registry/job.go:90 func (r *EtcdRegistry) Units() ([]job.Unit, error)
which does match the symptoms we see in the tcp dump (see highlighted /job sorted request)
Currently, the design is that every agent periodically fetches all unit files from etcd. With many units in etcd and many agents, the number of unit HTTP GET requests becomes huge.
I think we need to implement a cache on the agent side: the agent would fetch a unit from etcd only when that unit has changed, and otherwise would not send a unit file HTTP GET request at all. @bcwaldon @jonboulle @crawford, what do you think? If needed, I can help implement it to reduce the load on the etcd server.
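A rough sketch of what such an agent-side cache could look like follows. Everything here is hypothetical and not taken from fleet: `unitCache`, `cachedUnit` and the `fetch` callback are illustrative names. It also assumes the agent learns each unit's current etcd `modifiedIndex` cheaply (for example from a directory listing or a watch) and uses it to decide whether the cached copy is still valid.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedUnit pairs a unit body with the etcd modifiedIndex it was read at.
type cachedUnit struct {
	modifiedIndex uint64
	body          string
}

// unitCache is a hypothetical agent-side cache; it is not fleet code.
type unitCache struct {
	mu    sync.Mutex
	units map[string]cachedUnit
	fetch func(id string) (string, error) // performs the real GET against etcd
}

func newUnitCache(fetch func(id string) (string, error)) *unitCache {
	return &unitCache{units: make(map[string]cachedUnit), fetch: fetch}
}

// Get returns the cached body when modifiedIndex matches the cached entry,
// and calls fetch only when the unit is unknown or has changed.
// The lock is held across fetch, so concurrent lookups are serialized.
func (c *unitCache) Get(id string, modifiedIndex uint64) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if u, ok := c.units[id]; ok && u.modifiedIndex == modifiedIndex {
		return u.body, nil
	}
	body, err := c.fetch(id)
	if err != nil {
		return "", err
	}
	c.units[id] = cachedUnit{modifiedIndex: modifiedIndex, body: body}
	return body, nil
}

func main() {
	fetches := 0
	cache := newUnitCache(func(id string) (string, error) {
		fetches++ // stands in for one HTTP GET to etcd
		return "contents of " + id, nil
	})

	cache.Get("u1", 7) // miss: fetches from etcd
	cache.Get("u1", 7) // hit: same modifiedIndex, no request
	cache.Get("u1", 8) // unit changed: fetches again
	fmt.Println("fetches:", fetches)
}
```

In this sketch an unchanged unit costs no request after the first read, so a periodic pass over mostly unchanged units would send far fewer GETs than refetching every unit each time.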