Important: This library is currently in a pre-release stage, so it may change and break often. Feel free to use shepherdpy, but please understand the risks when you pull changes from the repo. I will release version 0.0.1 as soon as it is stable.
A companion library for mincemeatpy that will manage MapReduce clients.
You can get mincemeatpy from michaelfairley/mincemeatpy
shepherdpy makes it easy to manage mincemeatpy clients.
mincemeatpy is a small, lightweight MapReduce library for Python. It doesn't provide functionality for managing processes.
I wanted to give more flexibility to those using mincemeatpy for MapReduce by:
- making it easy to write and debug a server on a single machine.
- making it easy to control a pool of client processes on the same machine, or a different machine.
You only need to download mincemeat.py and shepherd.py to use the library.
This is the simplest example. It uses the same dataset as the mincemeat.py example.
First, start the server:
python example.py
Second, start the client:
python shepherd.py
And the server will print out:
{'a': 2, 'on': 1, 'great': 1, 'Humpty': 3, 'again': 1, 'wall': 1, 'Dumpty': 2, 'men': 1, 'had': 1, 'all': 1, 'together': 1, "King's": 2, 'horses': 1, 'All': 1, "Couldn't": 1, 'fall': 1, 'and': 1, 'the': 2, 'put': 1, 'sat': 1}
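To see where those counts come from, here is a minimal single-process sketch of the same word count. It does not use mincemeat.py or shepherd.py. The helper `run_local` is a hypothetical name, and the four-line dataset is assumed to be the rhyme from the mincemeat.py example, which is consistent with the output above.

```python
from collections import defaultdict

# Assumed dataset: the rhyme used in the mincemeat.py example.
data = [
    "Humpty Dumpty sat on a wall",
    "Humpty Dumpty had a great fall",
    "All the King's horses and all the King's men",
    "Couldn't put Humpty together again",
]
datasource = dict(enumerate(data))


def mapfn(key, value):
    # Emit (word, 1) for every whitespace-separated word in one line.
    for word in value.split():
        yield word, 1


def reducefn(key, values):
    # Sum the ones emitted for a single word.
    return sum(values)


def run_local(datasource, mapfn, reducefn):
    # Hypothetical helper: map, group by key, then reduce, all in-process.
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return {key: reducefn(key, values) for key, values in grouped.items()}


results = run_local(datasource, mapfn, reducefn)
print(results["Humpty"], results["King's"], results["wall"])  # 3 2 1
```

In the real setup the map and reduce calls are handed out to client processes instead of running in one loop, but the result should be the same dictionary.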
Notice that shepherd.py supplies reasonable default values to mincemeat, so it is easier to start the mincemeat clients.
If you don't run
python example.py
before you run
python shepherd.py
then shepherd.py will quietly do nothing and exit.
You can use the -s option to start the clients before the server. The -s option defines the number of seconds each client will sleep between connection attempts. This allows you to run
python shepherd.py -s 1
then to run
python example.py
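The behaviour described above (give up quietly without -s, keep retrying with -s) can be sketched as a small retry loop. This is an illustration only, not shepherd.py's actual code: `connect_with_retry`, its parameters, and the use of `ConnectionError` (Python 3) are all assumptions made for the sketch.

```python
import time


def connect_with_retry(connect, sleep_seconds=None, max_attempts=None,
                       sleep=time.sleep):
    """Call connect() until it succeeds and return its result.

    sleep_seconds=None models running without -s: one attempt, then give up.
    max_attempts is a hypothetical safety cap; None means retry indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect()
        except ConnectionError:
            if sleep_seconds is None:
                return None  # no -s: quietly give up
            if max_attempts is not None and attempts >= max_attempts:
                return None
            # Wait before the next connection attempt.
            sleep(sleep_seconds)
```

Passing the `sleep` function in as a parameter is a design choice for the sketch: it lets a test substitute a fake and avoid real delays.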
You can use the -8 flag to run the clients forever (or until Ctrl+C is pressed). First run
python shepherd.py -8
then run
python example.py
You can then run the following again
python example.py
Note that the example.py server returns a result both times and shepherd.py keeps running.
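A "run until Ctrl+C" loop of this kind might look roughly like the sketch below. `serve_forever` and `run_once` are hypothetical names; how shepherd.py implements -8 internally is not shown in this README.

```python
def serve_forever(run_once):
    """Call run_once() repeatedly until Ctrl+C is pressed.

    run_once stands in for "connect to a server and work until it finishes".
    Returns the number of runs that completed before the interrupt.
    """
    completed = 0
    try:
        while True:
            run_once()
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt in the main thread; stop cleanly.
        pass
    return completed
```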
The -n option controls how many client processes will be created.
For example, run the server:
python example.py
Next, start 2 clients:
python shepherd.py -n 2
Note that for very short tasks, such as example.py, the work may finish before every client process gets a share, so some processes may not be used.
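As a rough illustration of what "a pool of client processes" means, the sketch below launches several child processes and waits for them. It uses the standard `subprocess` module as a stand-in; whether shepherd.py uses `subprocess`, `multiprocessing`, or something else is not stated here, and `start_clients` and `wait_for_clients` are hypothetical names.

```python
import subprocess
import sys


def start_clients(count, command):
    """Launch `count` copies of `command` and return the Popen handles."""
    return [subprocess.Popen(command) for _ in range(count)]


def wait_for_clients(processes):
    """Block until every client exits; return their exit codes in order."""
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Stand-in command: a real pool would launch a mincemeat client here.
    stand_in = [sys.executable, "-c", "print('client started')"]
    print(wait_for_clients(start_clients(2, stand_in)))
```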
- The -v flag sets the logging level to INFO.
- The -V flag sets the logging level to DEBUG.
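One way flags like these can map onto Python's `logging` levels is sketched below. The parser layout, the `dest` names, the precedence of -V over -v, and the WARNING default are assumptions for illustration; shepherd.py's real option handling may differ.

```python
import argparse
import logging


def build_parser():
    # Only the two verbosity flags are modelled in this sketch.
    parser = argparse.ArgumentParser(description="sketch of -v / -V handling")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="use INFO logging level")
    parser.add_argument("-V", dest="debug", action="store_true",
                        help="use DEBUG logging level")
    return parser


def logging_level(args):
    # Assumption: -V takes precedence if both flags are given.
    if args.debug:
        return logging.DEBUG
    if args.verbose:
        return logging.INFO
    # Assumption: fall back to WARNING, Python's default root level.
    return logging.WARNING


if __name__ == "__main__":
    args = build_parser().parse_args()
    logging.basicConfig(level=logging_level(args))
    logging.info("shown with -v or -V")
    logging.debug("shown only with -V")
```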
You can run the unit tests for shepherd.py using the test_shepherd.py script.
python test_shepherd.py
In general, if you make a pull request for this repo, please add a unit test and make sure it passes before submitting.
shepherd.py uses Travis CI for continuous integration, so you can see the latest status of the trunk by looking at the status image at the top of this README.
mincemeat.py 0.1.3 is included in this repo for testing and demo purposes only. For real use, it is recommended that you get the official mincemeat.py from michaelfairley/mincemeatpy.
shepherdpy is free software, licensed under the MIT license.