Improve Legion Lord election algorithm #100
Fix situations where nodes/members have the same valor: the initial legion subsystem specs leave the case of nodes with the same valor undefined. The new implementation will also set a UUID on each node (if uuid support is available). When 2 (or more) nodes with the same valor pop up, the one with the higher UUID (compared with uuid_compare()) wins (see the sketch below the list).
Quorum: it is a pretty decent way to reduce split-brain problems. Each legion will expect (by default) a quorum of 1, which means nodes will not vote for the election of a lord. A quorum value higher than 1 will enable vote mode: each node will send an additional announce with its chosen lord, and when a potential new lord receives at least Q (quorum) announces for itself it will become the new lord.
Multiple lords? Are multiple lords useful in some specific context?
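A minimal sketch of the same-valor tie-break from the first point above, assuming libuuid is available; the struct and function names here are illustrative, not the actual uWSGI internals:

```c
#include <stdint.h>
#include <uuid/uuid.h>   /* libuuid: uuid_t, uuid_generate(), uuid_compare() */

/* illustrative node descriptor, not the real struct uwsgi_legion_node */
struct legion_node {
    uint64_t valor;
    uuid_t uuid;         /* generated once at startup with uuid_generate() */
};

/* returns 1 if candidate 'a' beats candidate 'b' in the election */
static int node_wins(const struct legion_node *a, const struct legion_node *b) {
    if (a->valor != b->valor)
        return a->valor > b->valor;              /* higher valor wins */
    return uuid_compare(a->uuid, b->uuid) > 0;   /* same valor: higher UUID wins */
}
```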
The quorum you described means that I need to have N >= quorum nodes with …? There is one great feature in uWSGI that I use, statelessness: only the FastRouter nodes are "static", all backends can be moved around, added and removed without touching a single line of configuration (and it should be easy to use multicast for subscriptions, so I could make those more dynamic as well). So I am not a fan of putting … Maybe quorum could be implemented as the minimal number of joined nodes (like with cman and other tools)? So that if I use …
BTW - please add a legion API so that plugins can interact with it easily. Right now, if I don't want to reimplement (which is a fancy term for copy&paste) all …
Quorum means that a lord is not elected until Q nodes agree on it. This is required for avoiding (well, reducing) split brain, and it will be optional (by default no quorum is needed). Regarding statelessness, the relevant "new part" is the one allowing you to NOT specify the valor of a node. The valor will be somewhat random so you do not need to worry about it, and you can be sure only one node per legion will run "things".
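A rough sketch of the vote counting this implies (the structures and names are made up for illustration): each node announces the lord it voted for, and a candidate only promotes itself once at least Q announces name it.

```c
#include <string.h>

/* hypothetical view of a single announce received by this node */
struct lord_vote {
    char voter_uuid[37];    /* UUID string of the voting node */
    char chosen_uuid[37];   /* UUID string of the lord it voted for */
};

/* return 1 when at least 'quorum' nodes voted for 'my_uuid'
 * (quorum = 1 keeps the old behaviour: no real voting needed) */
static int quorum_reached(const struct lord_vote *votes, int n_votes,
                          const char *my_uuid, int quorum) {
    int count = 0, i;
    for (i = 0; i < n_votes; i++) {
        if (!strcmp(votes[i].chosen_uuid, my_uuid))
            count++;
    }
    return count >= quorum;
}
```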
Regarding the API, you only need to add the legion name to your options. In my mind a --legion-cron would look something like this: --legion-cron foobar -1 -1 -1 -1 -1 my_command which means: run the cron only if the node is the lord of the 'foobar' legion. In your plugin you would have a check at the top of run_cron(mycron) (see the sketch below).
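A sketch of how a plugin could consume such an API once the legion name is attached to its options; uwsgi_legion_i_am_the_lord() and the simplified cron struct are assumptions here, not a confirmed interface:

```c
#include <uwsgi.h>   /* assumed to declare uwsgi_legion_i_am_the_lord() */

/* simplified stand-in for the plugin's cron structure */
struct my_cron {
    char *legion;    /* legion name, NULL when the cron is not legion-bound */
    char *command;   /* command to run */
};

static void run_cron(struct my_cron *c) {
    /* legion-bound crons only run on the node that is currently the lord */
    if (c->legion && !uwsgi_legion_i_am_the_lord(c->legion))
        return;
    /* ... spawn c->command as a normal cron job here ... */
}
```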
Great, everything is clear now.
This is pretty much what I was toying with in https://github.com/prymitive/uwsgi/blob/clustered_crons/plugins/clustered_crons/clustered_crons.c
Yes, I think you will be able to simplify it a lot as soon as the API is ready.
Could legion nodes also advertise the ip:port of their uWSGI sockets to other nodes? This way a new node would join the legion cluster, and once it has joined and the cluster is quorate it could ask some other node (preferably the master) for a cache sync. This would allow for cache syncing without the need to use …
This is part of the "legion scroll" system. You will find references to it in the current code. Basically you can add a "blob": --legion-scroll = mylegion "foobar=test,youraddr=127.0.0.1:4040" It is still incomplete but will be ready for sure in time for 1.5, as it is the base for the upcoming clustered-emperor support.
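A hedged sketch of how such a configuration could look once the scroll system is complete; the legion line follows the format used elsewhere in the legion subsystem, while the address, valor and secret are made-up values:

```ini
[uwsgi]
; join the legion: name, multicast/unicast address, valor, encryption algo:secret
legion = mylegion 225.1.1.1:4242 98 bf-cbc:hello
; attach an arbitrary blob ("scroll") other nodes can read, e.g. our own address
legion-scroll = mylegion foobar=test,youraddr=127.0.0.1:4040
```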
Ok, scroll will be a great addition. Could we also consider allowing a user/plugin to provide some code that dynamically alters a node's valor? Without the ability to alter the valor, we might end up with all tasks on a single node. If those tasks are memory or CPU heavy then we might kill that node.
Altering the valor is pretty easy (all of the math is done from scratch every time in the legion subsystem, so you can trick it in various ways). Something like uwsgi_legion_set/inc/dec_valor(struct uwsgi_legion *legion, uint64_t new_valor); should be good (and without the need for locking, as we are changing a single numeric value).
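A minimal sketch of what those helpers could look like; the struct layout shown is only an illustrative subset and these functions are a proposal from the comment above, not existing uWSGI API:

```c
#include <stdint.h>

/* illustrative subset of struct uwsgi_legion */
struct uwsgi_legion {
    char *legion;      /* legion name */
    uint64_t valor;    /* this node's current valor */
};

/* plain assignments: no locking needed, only a single numeric value changes */
void uwsgi_legion_set_valor(struct uwsgi_legion *legion, uint64_t new_valor) {
    legion->valor = new_valor;
}

void uwsgi_legion_inc_valor(struct uwsgi_legion *legion, uint64_t amount) {
    legion->valor += amount;
}

void uwsgi_legion_dec_valor(struct uwsgi_legion *legion, uint64_t amount) {
    legion->valor -= amount;
}
```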
Is "legion scroll" going to be in 1.9 or later? Once scroll is ready and there is a way to sync uWSGI cache from legion master node, than we will have basic legion crons plugin. I'm not rushing or anything, I currently use redis script for locking jobs and I won't move from it anytime soon, but others might find that useful and start if so, we will get few extra legion testers ;) |
It should be (eventually it will be in 1.9.1).
I don't think there are any issues with the current legion implementation; the per-run uid feature is present and working, and scroll is ready and working. Closing as resolved.
There are 2 things that qualify for this issue that I would like to discuss, so I'll reopen it to keep the discussion in one place. I'm thinking about replacing keepalived with legion for virtual IP management on 2 of my fastrouter nodes. I was thinking about what could go wrong and, as always, split brain can occur (as with all clustering tools when there are only 2 nodes), so maybe we could add legion node(s) that can't become The Lord and use quorum=2 (or bigger, with multiple arbiters). What this would give us: if one node can't communicate with the other, currently there is no way to check whether the issue is on the local or the remote node. But if the local node can talk to the arbiter nodes and the quorum is reached, then we can decide who should become the lord. The second thing that could help is health checks: run something, and if it fails lower the valor of the local node or prevent it from becoming The Lord. Example (sketched below):
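A purely illustrative sketch of such a health check, meant to run inside a mule so no process is spawned per check; every name here (the getter, the setter, the check itself, the valor values) is hypothetical:

```c
#include <stdint.h>
#include <unistd.h>

/* assumed to exist, following the valor helpers proposed earlier in the thread */
struct uwsgi_legion;
extern void uwsgi_legion_set_valor(struct uwsgi_legion *legion, uint64_t new_valor);
extern struct uwsgi_legion *uwsgi_legion_get_by_name(const char *name); /* hypothetical */

/* hypothetical local health check: returns 0 when the node looks healthy */
static int local_health_check(void) {
    /* e.g. probe the fastrouter port, check load average, ping a backend, ... */
    return 0;
}

/* loop meant to be run from a mule */
void legion_health_loop(void) {
    struct uwsgi_legion *l = uwsgi_legion_get_by_name("mylegion");
    for (;;) {
        if (local_health_check() != 0)
            uwsgi_legion_set_valor(l, 0);    /* check failed: give up lordship */
        else
            uwsgi_legion_set_valor(l, 98);   /* healthy: restore the normal valor */
        sleep(10);
    }
}
```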
+1 for the arbiter, I will work on this soon after 1.9.7. Regarding the dynamic valor, I am thinking about changing it directly from the apps, so you can have a mule making all the checks you need without spawning a process every time.
The support for arbiters has been added. Just add nodes with a valor of 0 (they will be reported in the logs as "arbiter" instead of "node").
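A hedged config sketch of the arbiter setup described above; the legion line format is the one used in the legion subsystem, while the quorum option name and all values are assumptions for illustration:

```ini
[uwsgi]
; ordinary candidate node: non-zero valor, can become the lord
legion = mylegion 225.1.1.1:4242 98 bf-cbc:hello
; require 2 nodes to agree before electing a lord (option name assumed)
legion-quorum = mylegion 2

; on a separate arbiter instance: valor 0, counts for the quorum but never becomes lord
; legion = mylegion 225.1.1.1:4242 0 bf-cbc:hello
```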
Changing the valor from mules is OK, one can add checks there with the uwsgi timer or cron API, but I think there is a use case for more direct manipulation of a node's valor. Example: I need to do some maintenance on the lord node and I would prefer to move the lord to another node before stopping the lord's uWSGI instance (in mongodb I would call stepDown()). I would need to talk to the local instance, probably over the uWSGI socket (?), it would then set the valor to 30, and that would trigger a new election (or we need to force an update in the legion cluster). This would allow me to stop the lord node more gracefully.
We could also add something like stepDown() into uwsgi_reload(): when I need to stop the lord, I would like it to first pass its lord status to another node as gracefully as possible, so that in the virtual IP case there is as little downtime as possible. I'm not sure what happens when a uWSGI instance joined to a legion dies/reloads; does it send any message to the other nodes that it should be marked as dead? Or does it just die while the other nodes think it's up until legion-tolerance is exceeded? It looks like the second case, which has pros and cons:

- we can quickly reload an instance without the other nodes in the legion noticing it (in case valors differ and runtime uuids are not used for lord election) - in such a case the lord title stays on the reloaded node
- if the node does not reload correctly (an upgrade that went wrong) it will not come back up, so if it was the lord we need to wait until the other nodes notice that it is down

I think it's just a matter of (optionally) sending an announce with valor = 0 when we need to reload; we can make it an option so the user can pick the desired behavior. (?)
Ok, I think it is time to deal with it. What to do when an instance is reloaded (or stopped)? I suppose we need to allow users to choose what to do for each legion. In some cases it could be good to leave ASAP, in others it would be better to wait for the whole cluster to choose the best lord.
Ok, I think we can close it. The latest commit sends a "death announce" whenever the server exits or is reloaded. There is no need to allow the user to choose, as the uuid of the node is changed on startup, so effectively the old node is always lost.
The first purpose of the legion subsystem was IP takeover. In fact the protocol is very similar to the CARP one.
During the "future of the uWSGI clustering" discussions/RFC it emerged that a better master ('lord' in legion subsystem jargon) election system is needed.
Inspirations are the Paxos algorithm and the redis-sentinel implementation. The objectives are reducing split-brain cases and increasing reliability.