Sporadic Applications
BOINC was originally designed as a batch processing system: you submit jobs, they run (independently of each other) and eventually finish. But some potential uses of volunteer computing don't fit this model. They may require that apps run concurrently on different computers, and perhaps that they communicate directly with each other. Examples include MPI-type parallel computing and distributed machine learning. BOINC's 'sporadic application' mechanism is designed to support these types of systems, and to allow them to coexist with batch processing.
In this scheme there is a distributed system - let's call it a 'guest system' - that exists outside of BOINC. The guest system typically has its own server that handles requests and dispatches them to 'worker nodes' running the BOINC client. The guest system's worker part runs as a sporadic app on these nodes. Instances may communicate directly with each other - peer-to-peer or via a relay - as well as with the server.
The guest system uses BOINC to
- Securely distribute and run its worker code (the sporadic app).
- Enforce volunteer computing preferences (when to compute, how many CPUs to use, etc.).
- Divide computing power among projects, for volunteers who already use BOINC.
The guest system doesn't use BOINC's batch processing features.
The jobs of a sporadic app run (i.e. are present in memory) all the time, like non-CPU-intensive jobs, but they compute only some of the time. Like regular apps, a sporadic app can have multiple app versions. Each of these has a plan class, which determines the processor usage (CPUs and GPUs) of its jobs. A project's BOINC scheduler can send multiple jobs for a given sporadic app, using the same or different app versions.
A BOINC project can provide any combination of regular, sporadic, and non-CPU-intensive apps. A client can be connected to multiple projects with sporadic apps.
Like regular jobs, a sporadic job can compute only when BOINC allows it to, i.e.:
- computing (and GPU computing if relevant) is enabled by user preferences
- there are sufficient processing resources and RAM
In addition, a sporadic job computes only when the guest system asks it to. Thus, a sporadic job talks to both the BOINC client and the guest system server: it computes only when the server asks it to and the client says it's OK to.
The protocol between the BOINC client and a sporadic app uses the following messages:
Client to app:
- DONT_COMPUTE: the app can't compute now (e.g. because resources are not available)
- COULD_COMPUTE: the app could potentially compute
- COMPUTING: the app is computing as far as the client is concerned
App to client:
- DONT_WANT_COMPUTE: the app doesn't want to compute now
- WANT_COMPUTE: the app wants to compute
The protocol between the app and the guest server isn't specified. It could be based on polling from the app, or bidirectional requests.
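One way to read the client-to-app messages is as a simple rule on the client side. The sketch below is only an illustration of that reading, not BOINC's client code: the function `client_reply`, the `can_compute` flag, and the enum types are hypothetical names introduced here.

```cpp
#include <cassert>

// Message names from the protocol above; these enum types are illustrative.
enum class ClientMsg { DONT_COMPUTE, COULD_COMPUTE, COMPUTING };
enum class AppMsg { DONT_WANT_COMPUTE, WANT_COMPUTE };

// Hypothetical helper: choose the client's message to the app.
// can_compute:   preferences and resources currently allow this job to compute
// last_from_app: the most recent message received from the app
ClientMsg client_reply(bool can_compute, AppMsg last_from_app) {
    if (!can_compute) return ClientMsg::DONT_COMPUTE;
    if (last_from_app == AppMsg::WANT_COMPUTE) return ClientMsg::COMPUTING;
    return ClientMsg::COULD_COMPUTE;
}

// Self-check of the rule against the message definitions.
void check_client_reply() {
    assert(client_reply(false, AppMsg::WANT_COMPUTE) == ClientMsg::DONT_COMPUTE);
    assert(client_reply(true, AppMsg::DONT_WANT_COMPUTE) == ClientMsg::COULD_COMPUTE);
    assert(client_reply(true, AppMsg::WANT_COMPUTE) == ClientMsg::COMPUTING);
}
```

In the real client, granting COMPUTING also involves reserving processing resources, which this sketch folds into the single `can_compute` flag.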
A typical scenario is as follows:
sequenceDiagram
participant C as BOINC client
participant A as sporadic job
participant S as guest system server
A->>C: DONT_WANT_COMPUTE
A->>S: I cannot compute
C->>A: DONT_COMPUTE
C->>A: COULD_COMPUTE
A->>S: I can compute
S->>A: here is a request
A->>C: WANT_COMPUTE
C->>A: COMPUTING
A->>S: request confirmed, computing
loop Compute
A->>A: check for DONT_COMPUTE from client
end
A->>S: I am done computing
A->>C: DONT_WANT_COMPUTE
C->>A: COULD_COMPUTE
The steps are:
- Initially the client tells the app that it can't compute, perhaps because the user has suspended computation.
- The app relays this to the server; this tells the server not to send any requests. The server can keep track of which worker nodes are available for computing at a given point.
- Eventually the user enables computing; the client relays this as a COULD_COMPUTE message to the app, and the app relays it to the server, indicating that it can now accept requests.
- The server sends a request to the app, asking it to do some computing (and possibly some network communication with other workers).
- The app sends WANT_COMPUTE to the client.
- The client reserves the needed computing resources and sends COMPUTING to the app.
- The app computes. When it's done, it sends DONT_WANT_COMPUTE to the client.
- The client (assuming computing is not suspended) sends COULD_COMPUTE.
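The steps above can be sketched as a small event handler on the app side. This is a minimal illustration under stated assumptions, not code from BOINC: `Worker`, its methods, and the strings sent to the guest server are hypothetical, since the app-to-server protocol is not specified. Outgoing messages are recorded in a vector rather than transmitted, so the sequence can be inspected.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Messages from the client, named as in the protocol above.
enum class ClientMsg { DONT_COMPUTE, COULD_COMPUTE, COMPUTING };

// Hypothetical app-side bookkeeping. Outgoing messages are recorded in `sent`
// as "client: ..." or "server: ..." instead of being transmitted.
struct Worker {
    enum class Avail { UNKNOWN, NO, YES };
    Avail avail = Avail::UNKNOWN;  // what the server was last told
    bool wants = false;            // WANT_COMPUTE sent, not yet withdrawn
    bool computing = false;        // client has answered COMPUTING
    std::vector<std::string> sent;

    // The client's message changed.
    void on_client(ClientMsg m) {
        switch (m) {
        case ClientMsg::DONT_COMPUTE:
            if (computing) sent.push_back("server: cannot finish request");
            else if (avail != Avail::NO) sent.push_back("server: I cannot compute");
            avail = Avail::NO;
            wants = computing = false;
            break;
        case ClientMsg::COULD_COMPUTE:
            if (avail != Avail::YES) {
                sent.push_back("server: I can compute");
                avail = Avail::YES;
            }
            break;
        case ClientMsg::COMPUTING:
            if (wants && !computing) {
                sent.push_back("server: request confirmed, computing");
                computing = true;
            }
            break;
        }
    }

    // The guest server sent a request.
    void on_request() {
        if (avail == Avail::YES && !wants) {
            sent.push_back("client: WANT_COMPUTE");
            wants = true;
        }
    }

    // The app finished the requested computation.
    void on_done() {
        assert(computing);
        sent.push_back("server: I am done computing");
        sent.push_back("client: DONT_WANT_COMPUTE");
        wants = computing = false;
    }
};
```

Feeding the handler DONT_COMPUTE, COULD_COMPUTE, a request, COMPUTING, completion, and a final COULD_COMPUTE yields the same app-originated messages as the scenario, apart from the initial DONT_WANT_COMPUTE, which the sketch omits.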
It's also possible that the app must stop computing before the request is finished - for example, because the user suspends computing. In this case:
- The client sends DONT_COMPUTE to the app.
- The app notifies the server that it can't finish the request (or it might wait before doing this, in case computing is re-enabled quickly).
Thus, the app must continuously check for messages from the client, even while it's computing.
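One common way to meet this requirement is to split a request into short units of work and poll the client between units. The following is a sketch of that pattern, not BOINC code: `compute_in_chunks`, `poll_client`, and `do_chunk` are hypothetical names, and the poll is passed in as a callback so the loop can be exercised without a running client.

```cpp
#include <cassert>
#include <functional>

// Messages from the client, named as in the protocol above.
enum class ClientMsg { DONT_COMPUTE, COULD_COMPUTE, COMPUTING };

// Run up to `total_chunks` short units of work, polling the client before
// each unit. Stops early if the client says DONT_COMPUTE.
// Returns the number of chunks completed.
int compute_in_chunks(int total_chunks,
                      const std::function<ClientMsg()>& poll_client,
                      const std::function<void(int)>& do_chunk) {
    assert(total_chunks >= 0);
    int done = 0;
    while (done < total_chunks) {
        if (poll_client() == ClientMsg::DONT_COMPUTE) break;  // preempted
        do_chunk(done);  // one short unit of work
        ++done;
    }
    return done;
}
```

If the returned count is less than `total_chunks`, the app was preempted and can tell the guest server that the request was not finished. Shorter chunks make the app react faster to DONT_COMPUTE at the cost of more frequent polling.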
The API for communicating with the client is:
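// State reported by the client to the app
enum SPORADIC_CA_STATE {
    CA_NONE = 0,
    CA_DONT_COMPUTE = 1,    // the app can't compute now
    CA_COULD_COMPUTE = 2,   // the app could potentially compute
    CA_COMPUTING = 3        // the app is computing, as far as the client is concerned
};

// State reported by the app to the client
enum SPORADIC_AC_STATE {
    AC_NONE = 0,
    AC_DONT_WANT_COMPUTE = 1,   // the app doesn't want to compute now
    AC_WANT_COMPUTE = 2         // the app wants to compute
};

// Called by the app to report its state to the client
extern void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE);
// Called by the app to read the state most recently reported by the client
extern SPORADIC_CA_STATE boinc_sporadic_get_ca_state();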
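A minimal sketch of how an app might use these two calls to ask for permission to compute is shown below. So that it compiles without BOINC, it repeats the enums and defines stand-in bodies for the two API functions; those bodies, the `fake_ac` and `fake_client_allows` variables, and the helpers `request_compute` and `finish_compute` are assumptions made for illustration, not part of BOINC.

```cpp
#include <cassert>

// Local copies of the declarations above, so the sketch builds without BOINC.
enum SPORADIC_CA_STATE { CA_NONE = 0, CA_DONT_COMPUTE = 1, CA_COULD_COMPUTE = 2, CA_COMPUTING = 3 };
enum SPORADIC_AC_STATE { AC_NONE = 0, AC_DONT_WANT_COMPUTE = 1, AC_WANT_COMPUTE = 2 };

// Stand-in "client" for simulation only; NOT the BOINC implementation.
static SPORADIC_AC_STATE fake_ac = AC_NONE;
static bool fake_client_allows = true;  // hypothetical knob: may the job compute?

void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE s) { fake_ac = s; }
SPORADIC_CA_STATE boinc_sporadic_get_ca_state() {
    if (!fake_client_allows) return CA_DONT_COMPUTE;
    return fake_ac == AC_WANT_COMPUTE ? CA_COMPUTING : CA_COULD_COMPUTE;
}

// Hypothetical helper: send WANT_COMPUTE and poll up to max_polls times.
// Returns true once the client reports CA_COMPUTING. On refusal or timeout
// it withdraws the request and returns false.
bool request_compute(int max_polls) {
    assert(max_polls > 0);
    boinc_sporadic_set_ac_state(AC_WANT_COMPUTE);
    for (int i = 0; i < max_polls; ++i) {
        SPORADIC_CA_STATE ca = boinc_sporadic_get_ca_state();
        if (ca == CA_COMPUTING) return true;
        if (ca == CA_DONT_COMPUTE) break;
        // A real app would sleep briefly here between polls.
    }
    boinc_sporadic_set_ac_state(AC_DONT_WANT_COMPUTE);
    return false;
}

// Hypothetical helper: tell the client the app is done computing.
void finish_compute() { boinc_sporadic_set_ac_state(AC_DONT_WANT_COMPUTE); }
```

An app would call `request_compute` after receiving a request from the guest server, do the work only if it returns true, and call `finish_compute` afterwards.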
Sporadic apps that do network communication should obey the rules for accessing the network. They should, for example, not communicate when user preferences forbid it.
You can run sporadic apps in VMs (using vboxwrapper) or using the BOINC wrapper. In both cases the app communicates sporadic state using disk files named ca (for client state) and ac (for app state). These files are located in the slot directory (wrapper) or shared directory (vboxwrapper). See the sample sporadic app for an example in C++; you can do the equivalent in Python or other languages.
To use this, pass the --sporadic command-line argument to wrapper or vboxwrapper.
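For a wrapped app, reading and writing the ca and ac files might look like the sketch below. The file format is not specified in this document, so the code ASSUMES each file holds a single decimal integer equal to the corresponding enum value; check the sample sporadic app for the actual format. The helper names `read_ca` and `write_ac` and the `dir` parameter are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

enum SPORADIC_CA_STATE { CA_NONE = 0, CA_DONT_COMPUTE = 1, CA_COULD_COMPUTE = 2, CA_COMPUTING = 3 };
enum SPORADIC_AC_STATE { AC_NONE = 0, AC_DONT_WANT_COMPUTE = 1, AC_WANT_COMPUTE = 2 };

// ASSUMPTION: each file contains one decimal integer (the enum value).

// Read the client's state from <dir>/ca.
// Returns CA_NONE if the file is missing or does not hold a valid value.
SPORADIC_CA_STATE read_ca(const std::string& dir) {
    assert(!dir.empty());
    std::ifstream f(dir + "/ca");
    int v = 0;
    if (!(f >> v) || v < CA_NONE || v > CA_COMPUTING) return CA_NONE;
    return static_cast<SPORADIC_CA_STATE>(v);
}

// Write the app's state to <dir>/ac, replacing any previous contents.
// Returns true if the write succeeded.
bool write_ac(const std::string& dir, SPORADIC_AC_STATE s) {
    assert(!dir.empty());
    std::ofstream f(dir + "/ac", std::ios::trunc);
    f << static_cast<int>(s) << '\n';
    f.flush();
    return f.good();
}
```

Here `dir` would be the slot directory when running under wrapper, or the shared directory when running under vboxwrapper.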
In the initial implementation of sporadic apps (BOINC client version 7.26), sporadic apps have strict priority over regular apps. Thus, if a sporadic app does lots of computing, it can starve regular apps. If multiple sporadic apps compete for a resource (say, a GPU), the prioritization among them is fixed; one can starve the others.
In a later version, sporadic apps will be scheduled using the same scheme that is used for regular apps, in which project resource share determines prioritization and starvation is eliminated.