This repository has been archived by the owner on May 7, 2024. It is now read-only.
Percolate is a simple-stupid application for combining command-line programs
into flexible, fault-tolerant data transformation workflows. Its goal is to
allow complex workflows to be expressed using only standard Ruby operators.

Percolate's features are:
- The ability to create complex, parallel workflows as plain Ruby code. The
workflow paths are implicit in the code, being defined by method arguments
and return values.
- Workflows may contain any combination of Ruby code, local system calls and
asynchronous batch queue jobs.
- Partially complete workflows may be paused and archived, to be continued
later.
- Workflows may be restarted after failure without repeating successful steps.
- Parallel workflows may be executed using fork/exec or by integration with
Platform LSF for large clusters.
- Small and lightweight. These things are relative, of course, but the entire
system is less than 2000 LOC, including the driver and auditor.
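
One way to picture "restart without repeating successful steps" is a cache of
step results keyed by step name and arguments. The sketch below is plain Ruby
and is not taken from Percolate's source; the StepCache class and the file
names are hypothetical, and a real system would also need to persist the cache
between runs, which is left out here.

```ruby
# Hypothetical result cache; illustrates the idea only, not Percolate's code.
class StepCache
  def initialize
    @results = {}
  end

  # Runs the block only if no non-nil result is stored for this step and
  # these arguments; a nil result is treated as "not finished yet".
  def fetch(name, *args)
    key = [name, args]
    cached = @results[key]
    return cached unless cached.nil?

    @results[key] = yield(*args)
  end
end

cache = StepCache.new
calls = 0

# The same step is requested twice, as it would be after a restart.
results = Array.new(2) do
  cache.fetch(:sort, 'reads.txt') do |file|
    calls += 1            # counts how often the real work is done
    "#{file}.sorted"
  end
end
```

Here the block body runs once, and the second request is served from the
stored result.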

Percolate's restrictions are:
- Methods on the workflow execution path must follow a simple convention:
they accept 'nil' arguments and return 'nil' for resources that are
unavailable at the time of invocation.
- Heavy compute should be done by the command-line programs called by the
workflow, not the workflow script itself.
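
The 'nil' convention in the first restriction can be sketched in plain Ruby.
This example does not use Percolate's helpers; the method names, the 'ready'
flag and the file names are all hypothetical stand-ins for real steps whose
outputs may not exist yet.

```ruby
# Plain-Ruby illustration of the nil convention; no Percolate API is used.

# Stand-in for a step whose output may not be available yet.
def sort_file(input, ready)
  return nil if input.nil?          # upstream resource unavailable
  ready ? "#{input}.sorted" : nil   # nil until the step has finished
end

# A downstream step: passes nil through instead of failing.
def compress_file(input)
  return nil if input.nil?
  "#{input}.gz"
end

# The workflow path is implicit in arguments and return values.
def workflow(input, ready)
  compress_file(sort_file(input, ready))
end

first_pass  = workflow('reads.txt', false)  # sort not finished yet
second_pass = workflow('reads.txt', true)   # sort has finished
```

Because every method tolerates 'nil', the whole workflow can be called again
and again; it simply returns 'nil' until all of its resources exist.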

To create and run a workflow, the steps are:
1. Use Percolate's helpers to wrap each command-line program in a Ruby method so
that the essential resources required for the run are represented by the
method arguments and the resources created by the run are represented by the
method return values. Choose whether to run synchronously (via system) or
asynchronously (via fork/exec or on a cluster via Platform LSF).
2. Write the body of the workflow using these methods and any Ruby flow control
operators. Within a single workflow, any combination of Ruby methods,
fork/exec jobs or Platform LSF jobs is permitted.
3. Create a Workflow class as an entry point, having a 'run' method that
invokes the workflow. Workflows may create and invoke more instances of any
Workflow class.
4. Start the Beanstalk message queue.
5. Launch workflows by placing a YAML file into the Percolate 'in' directory.
The file describes the Workflow class to instantiate and the arguments to
the 'run' method.
6. Run the Percolate driver repeatedly at intervals (e.g. via cron) until the
system moves your input YAML file to the Percolate 'pass' directory
or to the 'fail' directory (if one of the steps has failed).
7. If there was a failure, look at the logs, fix the problem and move the YAML
file back to the 'in' directory to resume the workflow.
8. Run the auditor on the log to see a breakdown of what happened during the run.
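
Steps 3 and 5 to 7 can be pictured with a small, self-contained sketch. It is
not Percolate's driver: the CountLines class, the driver_pass method and the
YAML keys 'workflow' and 'arguments' are assumptions made for illustration, and
the real definition file format may differ. Only the 'in', 'pass' and 'fail'
directory names come from the steps above.

```ruby
require 'yaml'
require 'fileutils'
require 'tmpdir'

# Hypothetical workflow class in plain Ruby (not Percolate's own base class).
class CountLines
  # Returns the line count, or nil while the input file does not exist yet.
  def run(path)
    return nil unless File.exist?(path)
    File.readlines(path).size
  end
end

# One driver pass over a root directory containing 'in', 'pass' and 'fail'.
# The YAML keys 'workflow' and 'arguments' are assumed, not Percolate's format.
def driver_pass(root)
  Dir.glob(File.join(root, 'in', '*.yml')).sort.each do |defn|
    spec = YAML.safe_load(File.read(defn))
    begin
      result = Object.const_get(spec['workflow']).new.run(*spec['arguments'])
      # nil means "not finished": leave the file in 'in' for the next pass.
      FileUtils.mv(defn, File.join(root, 'pass')) unless result.nil?
    rescue StandardError
      FileUtils.mv(defn, File.join(root, 'fail'))
    end
  end
end

# Set up a throwaway directory tree and one workflow definition.
root = Dir.mktmpdir('percolate_sketch')
%w[in pass fail].each { |dir| FileUtils.mkdir_p(File.join(root, dir)) }
data = File.join(root, 'data.txt')
defn = File.join(root, 'in', 'count.yml')
File.write(defn, { 'workflow' => 'CountLines', 'arguments' => [data] }.to_yaml)

driver_pass(root)                      # input missing: definition stays in 'in'
still_waiting = File.exist?(defn)

File.write(data, "a\nb\n")
driver_pass(root)                      # input present: definition moves to 'pass'
passed = File.exist?(File.join(root, 'pass', 'count.yml'))
```

Moving a file from 'fail' back to 'in', as in step 7, would make the next
driver pass pick it up again.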

Percolate's dependencies are:
- Beanstalk (http://kr.github.com/beanstalkd/)
- The beanstalk-client Ruby gem (http://beanstalk.rubyforge.org/)
- The gibbler Ruby gem (http://github.com/delano/gibbler)
and, optionally, for the auditor:
- The Ruport Ruby gem (http://www.rubyreports.org/)