
datadistribution

Thomas Herault edited this page Feb 7, 2017 · 5 revisions


How to Write a program that uses a PaRSEC-Enabled Operation

This section uses the very simple Round-Trip Time (RTT) benchmark, located in . Using the JDF Translator tool (parsecpp, see the end of How to write a PaRSEC enabled operation), you have built a parsec object generator function. This function comes in two files: will generate and .

Overview

If you look at , you will find the following:

 is the generator function, , the cleanup function. In the structure, you can find the data , under the name , that is defined in the code of , the globals and , under the names and that are also defined in , and fields and variables related to /arenas/ (the array of the structure, the counter, and the defines and ).

In order to move data between nodes during the execution, PaRSEC needs to know the shape of the data and how it is distributed among the nodes, and it needs to allocate temporary memory space to store such data elements while moving them.

Data shape (the fact that a data element is for example a NT x MT matrix of doubles) is defined using the powerful MPI Datatype representation. Each data type must thus be assigned an MPI data type. Arenas are the objects internally managed by PaRSEC to represent these types in a more abstract way. Notably, arenas define how to allocate temporary memory for storing such data elements.

User data fields () and corresponding parameters to the function are of type . These objects represent data distributions. Each data that is referenced in a JDF is thus a , or a pointer to a structure that begins with a . We will see later how the data distribution is expressed using these objects.

Wrapper

The first thing we advise you to do is to create a set of wrappers for these new / destroy functions, to fill in the missing elements and clean up the invocation. That is what the files , do:

If using /MPI/, must be included. defines a macro if and only if parsec has been compiled with /MPI/, so this macro can be tested to see whether it is necessary to include or not. We also define a static to hold the /MPI/ definition of the datatype PaRSEC is going to use to move the data along.

The wrapper begins by doing some safety checks, then it calls the generator function (whose definition is in the generated file ), to create a ping pong DAG that will iterate on times.

At this point, the preparation should be completed by constructing the arenas that will potentially allocate temporary data. There is one arena per data type that was expressed in the JDF, and a /default/ arena. If all flows are typed, the default arena does not need to be constructed. An arena is defined by the number of bytes needed to store elements ( characters in this case), the alignment of the data chunks (this can be necessary when using GPUs, for example to receive data in memory that is pinnable), and the /MPI/ datatype used to transfer such data. In this case, we create a vector type that holds one block of bytes aligned together.
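To illustrate the first two parameters of an arena (element size and alignment), here is a minimal, self-contained sketch. It does not use the PaRSEC arena API: `toy_arena_t`, `toy_arena_construct` and `toy_arena_alloc` are hypothetical names invented for this example, and the /MPI/ datatype that a real arena also carries is deliberately left out so that the sketch compiles without /MPI/.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an arena: NOT the PaRSEC type.
 * It only records how large and how aligned each element must be. */
typedef struct {
    size_t elem_size;   /* number of bytes needed to store one element */
    size_t alignment;   /* required alignment, a power of two */
} toy_arena_t;

static void toy_arena_construct(toy_arena_t *arena,
                                size_t elem_size, size_t alignment)
{
    assert(arena != NULL && elem_size > 0);
    /* A power of two has exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    arena->elem_size = elem_size;
    arena->alignment = alignment;
}

/* Allocates temporary storage for one element; release it with free(). */
static void *toy_arena_alloc(const toy_arena_t *arena)
{
    /* C11 aligned_alloc expects a size that is a multiple of the alignment,
     * so round the element size up. */
    size_t rounded = (arena->elem_size + arena->alignment - 1)
                     / arena->alignment * arena->alignment;
    return aligned_alloc(arena->alignment, rounded);
}
```

With such a helper, a buffer for a 100-byte element aligned on 64 bytes would be obtained by constructing the arena once with `(100, 64)` and calling `toy_arena_alloc` each time a temporary copy is needed.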

On a shared memory system, where /MPI/ is not needed, it is not necessary to construct the arenas.

The destructor simply releases the MPI datatype and destroys the PaRSEC handle.

Data Distribution

The next step is to write a set of functions to access the data and describe its distribution to PaRSEC. This is done in the file , which is explained below:

The internal representation of the data in the JDF is encapsulated in the structure. It must derive from the initial datatype . This data distribution descriptor will hold a single data block per /MPI/ process in , of size bytes. However, PaRSEC does not directly manipulate the user's memory (which is represented here by ), but meta-data, in the form of , and . A represents a user data at the abstract level, and the class instantiates at most one per /MPI/ process per user data element. A represents a copy, in a specific revision, of the user data, and multiple instances can coexist. Distributed Data Descriptors usually manipulate objects, and provide objects for some interfaces, using PaRSEC functions to instantiate them.

PaRSEC will use all fields of the object, and it is the user's responsibility to fill them. and are (respectively) the rank of the local /MPI/ process and the number of nodes in the /MPI/ Communicator. Then, the other obligatory fields are function pointers to locate which rank holds which data, on what Virtual Process (domain) this data must be bound, where the data is for the local MPI process, how to compute a unique id for the data from its multidimensional index, and, if tracing capabilities are enabled, a function to uniquely identify the data element. Those are given by (respectively) the functions /, /, /, and . The profiling system, if enabled, uses and to provide a user-readable representation of the data. is expected to provide the range of the multidimensional index of the data, while provides how to prefix this data (those are usually set in the user's program).

These fields are necessary to define the parent object. The structure only adds to it a placeholder to store the base pointer of the local part of the data. This could be optional, since the RTT benchmark uses only one data. But it is good practice to keep it relative to the descriptor object, to facilitate reuse and multiple instantiations of the same PaRSEC object.

The RTT benchmark will initially take the data from the rank 0, and move it to the next placeholder until it reaches the end. Thus, each rank needs only one data holder of size , and that is why we allocate only one. A more typical data descriptor would use the index to define each local element.

The code for the access functions is given below:

/ returns the rank of a given data element, based on its multidimensional index or its flat key. These functions must return the same value on any /MPI/ process that calls them, even if the data location is remote.

/ returns the identifier of the Virtual Process that "owns" a data identified by its multidimensional index or its flat key. These functions need to be defined only on the rank that hosts the data. The VPID is a number between 0 and .

/ returns the associated with the user data at this multidimensional index position, or for this flat key. These functions need to be defined only on the rank that hosts the data. The best way is to use on a persistent holder: this function will create the data and assign it to for that , , and unless the holder already stores a . This function is atomic and can be called by multiple threads.
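The "create once, unless the holder already stores one" behavior described above can be sketched with a C11 compare-and-swap. This is an illustration of the idea only, not the PaRSEC implementation: `toy_data_t` and `toy_data_create` are hypothetical names, and the fields kept in the structure are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical meta-data object describing one user data element. */
typedef struct {
    unsigned long key;   /* flat key of the element */
    void         *ptr;   /* user memory holding the element */
    size_t        size;  /* size of the element in bytes */
} toy_data_t;

/* Returns the object stored in *holder, creating it first if the holder is
 * empty. Safe to call from several threads: exactly one creation wins. */
static toy_data_t *toy_data_create(_Atomic(toy_data_t *) *holder,
                                   unsigned long key, void *ptr, size_t size)
{
    toy_data_t *current = atomic_load(holder);
    if (current != NULL)
        return current;              /* already created: reuse it */

    toy_data_t *fresh = malloc(sizeof *fresh);
    assert(fresh != NULL);
    fresh->key  = key;
    fresh->ptr  = ptr;
    fresh->size = size;

    toy_data_t *expected = NULL;
    if (atomic_compare_exchange_strong(holder, &expected, fresh))
        return fresh;                /* this thread installed the object */

    free(fresh);                     /* another thread won the race */
    return expected;                 /* holds the winner's object */
}
```

Calling the function twice on the same holder returns the same pointer, which is why the holder has to be persistent, for example a field of the descriptor rather than a local variable.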

 is used to convert a multidimensional index identifier into a flat key.
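As an illustration of such a conversion, the sketch below flattens a two-dimensional block index (m, n) into a row-major key and back. The RTT benchmark itself uses a single index, so this is a generic example rather than the benchmark's code; `toy_grid_t`, `toy_data_key` and `toy_key_to_index` are hypothetical names, and the 64-bit key type is an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a grid of mt x nt blocks. */
typedef struct {
    int mt;   /* number of block rows */
    int nt;   /* number of block columns */
} toy_grid_t;

/* Converts the multidimensional index (m, n), passed as variadic
 * arguments, into a unique flat key using row-major numbering. */
static uint64_t toy_data_key(const toy_grid_t *grid, ...)
{
    va_list ap;
    int m, n;

    assert(grid != NULL);
    va_start(ap, grid);
    m = va_arg(ap, int);   /* first dimension */
    n = va_arg(ap, int);   /* second dimension */
    va_end(ap);

    assert(m >= 0 && m < grid->mt && n >= 0 && n < grid->nt);
    return (uint64_t)m * (uint64_t)grid->nt + (uint64_t)n;
}

/* Inverse mapping: recovers (m, n) from a flat key. */
static void toy_key_to_index(const toy_grid_t *grid, uint64_t key,
                             int *m, int *n)
{
    assert(grid != NULL && grid->nt > 0);
    *m = (int)(key / (uint64_t)grid->nt);
    *n = (int)(key % (uint64_t)grid->nt);
}
```

Any numbering works as long as two different indices never produce the same key; row-major order is used here only because its inverse is easy to write.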

When, in the JDF, the user wrote to say that this task was going to run at the same place as the data , PaRSEC will call (which is the multidimensional index identifier) to compute the rank that holds the data. Here, the code places this data on rank , which is defined between 0 and , implementing a simple distribution of the data. The parameters of these functions must be variadic arguments, so that the user can define data on as many dimensions as wanted, and the user must be careful to read the right number of arguments using the va_arg macros.
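The variadic calling convention can be sketched as follows for a one-dimensional collection. The descriptor type `toy_desc_t` and the function `toy_rank_of` are hypothetical names, and the round-robin placement (index modulo number of nodes) is an assumption chosen for the example, not necessarily the distribution used by the benchmark.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: only the two fields this sketch needs. */
typedef struct {
    uint32_t myrank;   /* rank of the local process */
    uint32_t nodes;    /* number of processes in the communicator */
} toy_desc_t;

/* Returns the rank that owns element k. The index arrives as a variadic
 * argument: a one-dimensional collection reads exactly one int. */
static uint32_t toy_rank_of(const toy_desc_t *desc, ...)
{
    va_list ap;
    int k;

    assert(desc != NULL && desc->nodes > 0);
    va_start(ap, desc);
    k = va_arg(ap, int);   /* reading more arguments than passed is undefined */
    va_end(ap);
    assert(k >= 0);

    /* Round-robin placement: depends only on k and nodes, so every
     * process computes the same owner for a given index. */
    return (uint32_t)k % desc->nodes;
}
```

Because the result never depends on `myrank`, the function satisfies the requirement that all processes agree on the owner of an element, including processes that do not hold it.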

When the user says that in a given flow, data comes from or goes to , PaRSEC will call the , or the function to obtain the corresponding .

The last part of this file consists of releasing the allocated resources.

The user is encouraged to re-use the data distributions that have already been coded in the data_dist/ subdirectory of the parsec source code, instead of creating her own for every occasion. However, writing a new one might be necessary for specific purposes.

Main Program

Once this is completed, we can easily code the main RTT benchmark:

First, the MPI level must be initialized if necessary. Then, parsec must also be initialized to create the parsec_context_t, and specify the number of cores that one wants to use for parsec-related operations.

Then, the data must be created and distributed. The user is encouraged to re-use the data distributions that have already been coded in parsec/data_distribution, or create her own ad-hoc data distributions, as is the case for the rtt benchmark.

Data keys can be set to help profile the execution of the PaRSEC operation.





Then, the DAG generator is instantiated, with the corresponding data description and additional parameters. At this time, PaRSEC computes the initial tasks for this DAG with these parameters, and the number of tasks to run locally.

It is then enqueued in the PaRSEC context.

The PaRSEC context is started (computing threads exit their passive waiting loop, and start computing the rtt), and the main thread joins them in . This blocks until the PaRSEC enabled operations that have been enqueued in the PaRSEC context complete.

When this is done, the parsec engine can be released, the data descriptors freed (if the PaRSEC enabled operation had a side effect on the data, which is usually the case, results should be read from the data before freeing the data descriptors), and finally the MPI level can be finalized.