
RFC: more advanced app clustering #28

Closed
prymitive opened this issue Oct 22, 2012 · 15 comments

Comments

@prymitive
Contributor

Most apps running production code are deployed on several nodes, and there are many cases where some kind of coordination between those nodes is required (for example, to ensure that only one app at a time does some specific work); most of the time this is done using a database (by creating locks or writing some metadata). uWSGI can be used to run extra code (like daemons or crons), but AFAIK it lacks any kind of cluster-wide cooperation. If I want to run a cron task in my cluster that should only run on a single node at any given time, then I need to ensure that no other node will execute it. This can be achieved either by:

  • adding the cron entry to only one node's config - but in case of node failure I need to manually move this cron task to another node
  • using some wrapper that creates a lock and executes the task, or skips execution if the lock is already present - this provides HA but requires setting up additional clustering software (like corosync or similar tools for reliable locking, or just redis if high reliability isn't required)

I believe that we can add clustered and coordinated cron execution across a cluster of nodes without too much work. Pieces required:

  • cluster locking, so that uWSGI can create a lock across the cluster; we could also call this remote locking and use redis with the SETNX command to provide locks (or maybe even create a locking subsystem with pluggable backends: redis, mongodb, whatever); such locks could be added to the API

  • with cluster-wide locks we can promote one node to cluster master; in the case of redis that would be the first node that got the lock (locks should expire, and the master should update them and reset the TTL before they expire)

  • with one node being master we can coordinate which node will execute which task, but now the master needs the list of all cron commands; since cron settings don't need to be the same on all nodes, we should be able to define cron tasks in a few different ways:

    • --cron %(interval) %(cmd) - the cron will only be executed on the local node; it won't use cluster lock features
    • --cluster-cron %(interval) %(cmd) - the cron task will be executed on any node that contains this option

    Every node should communicate with the master and send it the list of defined --cluster-cron options.
    The master would then set up the schedule for the next round of each cron execution; pseudo-metadata:

    crons {
      'mycmd1': {last_run: 123456,  last_node: 192.168.1.10:3000, next_run: 12445, next_node: 192.168.1.111:3555, last_rcode: 0}
    }
    
  • now each node would read the metadata from the master every minute and check if it should run any command; the master should provide a sensible cron distribution algorithm so that a single node will not be hammered

  • besides cron tasks, we could add a cluster-wide singleton lock for daemon execution, so we could run a daemon that should only be executed on a single node and, if this node dies, have it restarted on another node (for example celery beat - the periodic task scheduler for celery tasks)

This is the general idea; it might not be the best one, but it provides a starting point. Questions to be answered first:

  • Should such functionality be included in uWSGI itself? Right now I'm using a python script wrapper for my cron tasks; it uses redis to lock task execution to a single node, but I would like to move this into uWSGI itself.
  • I don't think that uWSGI should provide advanced cluster tools; I would opt for using remote databases like redis or mongodb for providing locks, storing metadata and all other cluster-wide information that should be stored reliably. Is this the right direction?

I wanted to discuss this before I start hacking any code, all comments are welcome.
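The "wrapper that creates a lock and executes the task, or skips execution" pattern described above can be sketched as follows. This is a minimal, hypothetical model: `FakeRedis` is an in-memory stand-in for a real redis connection (a real wrapper would issue SETNX/EXPIRE against the server), and the key name and TTL are invented for illustration.

```python
import time

class FakeRedis:
    """In-memory stand-in for redis SETNX-with-TTL semantics."""
    def __init__(self):
        self.store = {}  # key -> (value, expires_at)

    def setnx_ex(self, key, value, ttl):
        # Set key only if absent or expired; return True on success.
        now = time.time()
        cur = self.store.get(key)
        if cur is not None and cur[1] > now:
            return False
        self.store[key] = (value, now + ttl)
        return True

def run_once_per_cluster(r, task_name, node_id, task, ttl=60):
    """Execute task only if no other node holds the lock; otherwise skip."""
    if r.setnx_ex('cron-lock:' + task_name, node_id, ttl):
        return task()
    return None  # another node already ran it

r = FakeRedis()
first = run_once_per_cluster(r, 'mycmd1', 'node-a', lambda: 'ran')
second = run_once_per_cluster(r, 'mycmd1', 'node-b', lambda: 'ran')
print(first, second)  # → ran None
```

The TTL keeps a crashed node from holding the lock forever, at the cost of possible re-execution if a task outlives its TTL.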

@unbit
Owner

unbit commented Oct 22, 2012

i am really interested in that, and i have a customer constantly asking me for such a feature. The only thing i am not sure about is whether it is better to follow a master-locker approach (like the one you described) or a fully distributed one, like paxos http://en.wikipedia.org/wiki/Paxos_%28computer_science%29

@edevil
Contributor

edevil commented Oct 22, 2012

Why not use a Zookeeper ensemble for that? That's what I do.

I'm afraid someday uWSGI will become sentient with all this new functionality. :)

@prymitive
Contributor Author

I'm not very familiar with ZooKeeper, can you post more details on how you use it for cron tasks?

@prymitive
Contributor Author

Let me clarify what I would like uWSGI to handle:

  • provide runtime environment (env settings, namespace/chroot)
  • app-wide resource management - I want to enforce limits for the app as a whole, including cron tasks (cgroups)
  • execution tracking - was the cron executed successfully? Maybe it died due to memory limits and an OOM kill
  • execution scheduling - on which node should a given cron task run next? We could take load, cpu usage, and free memory (either system-wide or in the cgroup) into account for better node selection

What I would "outsource" to other tools:

  • cluster wide locking
  • advanced clustering with quorum (needed for locks)
  • shared data storage

Bonus features:

  • ability to pause cron tasks (selected or all); useful for maintenance
  • storing execution history (return code, any stdout/stderr messages generated during cron task execution)
  • alerts on errors (using uWSGI alert plugins)

uWSGI provides a single environment for both web apps and the cron tasks those apps require; what we need is a way to make those cron tasks execute in a coordinated way across a cluster of uWSGI nodes.
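The "better node selection" idea above could be as simple as scoring each candidate node by its metrics. A minimal sketch; the metric names, weights, and node addresses are all invented for illustration, not part of any uWSGI API.

```python
def pick_node(nodes):
    """Pick the best node for the next cron run.

    nodes maps a node address to its metrics (loadavg, free_mem_mb).
    Lower load and more free memory score better; the weight is arbitrary.
    """
    def score(metrics):
        return metrics['loadavg'] - 0.001 * metrics['free_mem_mb']
    return min(nodes, key=lambda addr: score(nodes[addr]))

nodes = {
    '192.168.1.10:3000': {'loadavg': 2.5, 'free_mem_mb': 512},
    '192.168.1.111:3555': {'loadavg': 0.3, 'free_mem_mb': 2048},
}
print(pick_node(nodes))  # → 192.168.1.111:3555
```

In the scheme sketched earlier the master would run something like this each round and write the winner into the `next_node` field of the cron metadata.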

@edevil
Contributor

edevil commented Oct 23, 2012

What you said you would "outsource" is what I recommend Zookeeper for. It's fault tolerant and provides CAS operations and shared storage. Perfect for locks, coordination or just shared configuration.

As for running cron jobs, I use cron. :) Of course, I have to embed functionality in these scripts to talk with Zookeeper and decide what to do.

I guess I just find it weird that what I consider an excellent web server/application server can also be an excellent tool for everything you describe. Seems like a whole different functionality. But I don't doubt that uWSGI does that job just as well. :)

@prymitive
Contributor Author

Zookeeper does look like the right tool for this job, I'll look into it.

@prymitive
Contributor Author

It will require a lot of work, so I will aim for 1.6 with this; hopefully it will keep me from falling into winter sleep.

@unbit
Owner

unbit commented Dec 23, 2012

Ok, i have started investigating better integration with various clustering infrastructures/products.

Currently the exposed API is the following (take it as a draft, even if i have implemented all of it in my company's private repository):

uwsgi.cluster_members() -> returns the list of cluster members
uwsgi.cluster_lord() -> returns a boolean, True if the calling instance is the master/lord or whatever term is used
uwsgi.cluster_quorum() -> returns a boolean indicating (for backends supporting it) whether the cluster has quorum
uwsgi.cluster_lock(resource) -> returns nothing, blocks on a clustered resource
uwsgi.cluster_unlock(resource) -> returns nothing, unlocks a previously locked clustered resource

The only backend developed so far is for the redhat cluster suite (cman/dlm).

I plan to add zookeeper and pacemaker soon after. Both the legion and cluster subsystems will be available as backends, even if they need some work as they could overlap.

Any other projects to look at in this area?

@prymitive
Contributor Author

IMHO most people (including me) do not need anything more sophisticated than a redis server (and with redis-sentinel it should be easy to have decent HA in place now), and I think the goal in the first place should be to create something that is both easy to use and "good enough" for non-critical tasks.
AFAIK cman is limited to 16 nodes, other tools might have other limits, and they all add complexity that might not pay off if all you need to do is run harmless cron tasks. In the case of my apps nothing will break if something goes wrong and a cron is re-executed too fast, so I would rather have a less reliable solution than have to debug cman from time to time.

I would rather use the legion and storage plugins for this job. Legion can talk over multicast, so there is no need to keep a full list of nodes and no configuration to recreate every time you add or remove a node. And with a storage plugin I can share some data between nodes (like the last run time). Legion just needs a few more touches, like a way of handling nodes with the same valor.

@unbit
Owner

unbit commented Dec 27, 2012

Let me show you an example of the proposed cluster api using redis:

uwsgi --cluster-engine redis --cluster 192.168.0.17:4000 --ini cluster://foobar.ini

that means:

join (see below) the cluster managed by the redis server at 192.168.0.17 port 4000 and get the instance configuration using the object foobar.ini (which in redis could be a simple key)

uWSGI configures its internals (redis-specific) and then gets the instance configuration and (if all goes well) starts accepting requests. An additional thread (in the master) starts waiting for redis events (read: redis-sentinel events) and checks for deadlocks (see below).

The app wants to lock a shared resource and calls uwsgi.cluster_lock("item001"); internally the request is mapped to a series of redis calls (as described here: http://redis.io/commands/setnx). When finished it calls uwsgi.cluster_unlock("item001").

The address of the redis instance is kept in a shared memory area managed by the master thread. If the sentinel notifies of a changed master, the value is updated accordingly, and pending locks (or deadlocks) are managed (whenever a core is cluster-locking something a global table is filled, so the master thread can check it).

This is for the locking part. Regarding routers (fastrouter, httprouter...), when an instance joins the cluster an internal list is filled with the instance address, and the router can use that list (asynchronously managed by the master thread) for load balancing.

Regarding cron locking, i believe the best approach would be improving the legion subsystem, as by the nature of the system it is better to ensure a single node "holds" a resource (the cron task) indefinitely.
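The series of redis calls referenced above (the lock-with-timeout recipe from the linked SETNX page: SETNX, then GET, then GETSET to reclaim an expired lock) can be modeled like this. This is a non-atomic, in-memory sketch using a plain dict in place of a live redis server, purely to illustrate the algorithm's logic; a real backend would issue the actual commands, whose atomicity makes the race-handling work.

```python
def acquire(store, key, now, timeout=60):
    """Model of the SETNX lock recipe; store is a dict of key -> expiry.

    Returns True if the caller obtained the lock.
    """
    new_expiry = now + timeout + 1
    if key not in store:       # SETNX succeeds: lock was free
        store[key] = new_expiry
        return True
    if store[key] > now:       # GET shows the lock is still valid
        return False
    # Lock expired: GETSET a new expiry and check that the value we
    # displaced was still the expired one (i.e. nobody beat us to it).
    old = store[key]
    store[key] = new_expiry
    return old <= now

store = {}
print(acquire(store, 'item001', now=100))  # → True  (first caller wins)
print(acquire(store, 'item001', now=110))  # → False (still held)
print(acquire(store, 'item001', now=200))  # → True  (expired, reclaimed)
```

The GETSET check is what prevents two nodes from both reclaiming the same expired lock: only the one whose GETSET returns the stale timestamp wins.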

@prymitive
Contributor Author

> uwsgi --cluster-engine redis --cluster 192.168.0.17:4000 --ini cluster://foobar.ini
>
> that means:
>
> join (see below) the cluster managed by the redis server 192.168.0.17 port 4000 and get instance configuration using the object foobar.ini (that in redis could be a simple key)
>
> uWSGI configure its internals (redis-specific) and then gets the instance configurations and (if all goes well) starts accepting requests. An additional thread (in the master) starts waiting for redis event (read: redis-sentinel events) and checks for deadlocks (see below).

Isn't this the emperor with a redis backend? What advantages does it offer over the emperor? The only difference would be reading the node list from redis rather than from the subscription table (?)

@unbit
Owner

unbit commented Dec 27, 2012

configuration download is only there to be backward-compatible with the old (current) clustering subsystem; i do not think there are cases where the emperor is not the best choice. The same goes for the subscription system (even if storing node data in a different storage could be interesting).

The main purpose (for me) of the new infrastructure will be node synchronization
(the linux kernel dlm was the reason for working on the redhat cluster suite before the others) and "revamping" the old cluster-messaging attempt (i think there is still documentation on the old site) that never worked as expected.

So, to sum up:

redhat (config, nodes, locking, messaging)
corosync (config, nodes, messaging)
zookeeper (config, locking)
redis (config, nodes, locking)
uwsgi-multicast/old clustering (config, nodes, messaging)

will be the backends exporting various features via the clustering api.

Currently i have only needed the redhat plugin for one customer and the redis setnx support for another (this second one is still in development).

I suppose you are suggesting a more modular approach where we simply have "cluster-locking" plugins and "emperor-storage", and simply drop messaging (of little use, as no one asked for it in the past :P). Node info is irrelevant to the choice, as i think the subscription system is versatile enough to make everyone happy.

@prymitive
Contributor Author

With the legion subsystem and this toy plugin one can have such crons, but it needs more work to be really usable. This will probably not happen in 1.5, since there are other things cooking there.
We would still need a persistent store; the simplest solution would be to use the uWSGI cache with the cache-store option, but to make it work with more than one node we would need a way to sync the cache from other nodes on startup. This is already possible with cache-sync, but you need to specify a working node. To make it a really usable feature we would need a way to fetch cache data from the oldest node connected to the cluster, so first we need to get the list of nodes (from the legion cluster or the fastrouter?) and then pick the oldest one (since other nodes might also be syncing at that moment, unless syncing is blocking so that only fully synced nodes are connected to the cluster).

@prymitive
Contributor Author

update: the --legion-cron option was added by Roberto yesterday; it will run all of its cron tasks only on the node which is the lord for the given legion (you can use multiple legions and spread cron tasks manually).

I need more features for crons, so I'm working on a plugin that will implement them (storing job history somewhere, spreading jobs across all nodes based on a few factors, and so on).
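A setup along these lines might look like the following ini fragment. This is a hypothetical sketch: the legion name, multicast address, valor, shared secret, and command are all made up, and the exact option syntax should be checked against the uWSGI legion documentation for the version in use.

```ini
[uwsgi]
; join a legion over multicast with valor 98 and a blowfish-cbc shared secret
legion = mycluster 225.1.1.1:4242 98 bf-cbc:hello
legion-node = mycluster 225.1.1.1:4242

; run this cron only on whichever node is currently the lord of the legion
; (cron-style fields: minute hour day month weekday; -1 means "any")
legion-cron = mycluster -1 -1 -1 -1 -1 /usr/local/bin/cleanup.sh
```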

@prymitive
Contributor Author

The legion subsystem was added in 1.9 and in the current master there is legion-cron, so we have a working solution for this issue; closing.
