Router failure performance #269

@tock-ibm

Description

When a router fails while it is connected to a batcher that is the primary, we see a lot of forwarding from the secondaries to the primary. This can go on forever and is very inefficient.

Possible solutions:

Bucket-cutting resolution: 1st-strike timeout (TO) / 10

When the mempool is full, don't throw away the message; wait and retry inserting it into the mempool. This matters especially in broadcast, where there is no way to inform the client.

Change the term when there are too many first strikes (or on a similar trigger), and complain about it

Forward whole buckets instead of single requests

Randomize bucket close times so that the secondaries don't all hit the 1st-strike TO at the same time

Router failure detector in the batcher: if the router fails, abdicate from the primary role
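
Two of the ideas above, retrying a mempool insert instead of dropping the message and randomizing timeouts so the secondaries don't fire together, can be sketched roughly as follows. This is only an illustration in Python, not the project's code: `BoundedMempool`, `insert_with_retry`, `jittered_timeout` and all parameter values are hypothetical names and numbers chosen for the example.

```python
import random
import time
from collections import deque


class BoundedMempool:
    """Hypothetical fixed-capacity request pool (stand-in for the real mempool)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: deque = deque()

    def try_insert(self, request: bytes) -> bool:
        """Report a full pool to the caller instead of silently dropping."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(request)
        return True

    def pop(self) -> bytes:
        """Remove the oldest request, freeing one slot."""
        return self._items.popleft()


def insert_with_retry(pool, request, max_attempts=5, base_delay=0.01,
                      sleep=time.sleep) -> bool:
    """Retry a rejected insert with exponential backoff.

    Returns True once the request is accepted, or False after
    max_attempts rejections so the caller can decide what to do next.
    """
    for attempt in range(max_attempts):
        if pool.try_insert(request):
            return True
        if attempt < max_attempts - 1:
            # Assumed policy: double the wait after every rejection.
            sleep(base_delay * (2 ** attempt))
    return False


def jittered_timeout(base_timeout: float, spread: float = 0.2,
                     rng=random) -> float:
    """Pick a timeout uniformly in [base*(1-spread), base*(1+spread)].

    Giving each secondary its own jittered value makes it unlikely that
    all of them reach the first-strike timeout at the same instant.
    """
    return base_timeout * rng.uniform(1.0 - spread, 1.0 + spread)
```

The backoff policy and the 20% spread are arbitrary here; the real values would have to be tuned against the first-strike timeout and the bucket-cutting resolution.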
