One of our major clients noticed a significant performance degradation during data exchanges after changing their application to use MPI derived datatypes.
One of our students did his internship on this topic a year ago.
He noticed that the degradation came in part from indexed DTT packing/unpacking.
Using an Intel compiler feature, he could substantially speed up some indexed DTT exchanges: passing a fixed size to memcpy() lets the Intel compiler inline the call as an assignment operation, provided that size matches the size of an int (4 bytes) or of a double (8 bytes). This way, when an indexed DTT consists of a large number of MPI_DOUBLEs, for example, a series of assignments is performed instead of a series of memcpy() calls, which yields a large performance improvement.
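To make the mechanism concrete, here is a minimal sketch of the idea (not the actual patch; the function name and signature are illustrative only, the real change lives in Open MPI's pack/unpack paths): when the copy size is a compile-time constant of 4 or 8 bytes, the compiler can inline memcpy() as a single load/store, so the per-element copies become plain assignments.

```c
/* Minimal sketch (not the actual patch): when memcpy() is called with a
 * compile-time constant size of 4 or 8 bytes, the compiler can inline it
 * as a single load/store, so packing an indexed DTT of MPI_DOUBLEs turns
 * into a series of plain assignments instead of library calls. */
#include <stddef.h>
#include <string.h>

static void pack_elems(unsigned char *dst, const unsigned char *src,
                       size_t count, size_t elem_size, ptrdiff_t stride)
{
    if (elem_size == 8) {          /* e.g. MPI_DOUBLE */
        for (size_t i = 0; i < count; i++)
            memcpy(dst + i * 8, src + i * stride, 8);   /* constant size: inlined */
    } else if (elem_size == 4) {   /* e.g. MPI_INT */
        for (size_t i = 0; i < count; i++)
            memcpy(dst + i * 4, src + i * stride, 4);   /* constant size: inlined */
    } else {
        for (size_t i = 0; i < count; i++)
            memcpy(dst + i * elem_size, src + i * stride, elem_size);  /* generic path */
    }
}
```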
The proposed code is a bit hacky, but with this change we can divide the latency by up to 2 during DDT exchanges.
The patch is proposed against master, as usual, but it has only been tested on v3.x.
The following graph shows the latency improvement we could get for several kinds of transfers (a sketch of how such a diagonal can be described as an indexed datatype is given after the legend):
0: latency exchanging column 0 of a 4K x 4K matrix of MPI_DOUBLEs
1: latency exchanging column 2047 of a 4K x 4K matrix of MPI_DOUBLEs
2: latency exchanging column 4095 of a 4K x 4K matrix of MPI_DOUBLEs
3: latency exchanging diagonal 0 of a 4K x 4K matrix of MPI_DOUBLEs
4: latency exchanging diagonal 2047 of a 4K x 4K matrix of MPI_DOUBLEs
5: latency exchanging diagonal 4095 of a 4K x 4K matrix of MPI_DOUBLEs
6: latency exchanging diagonal -2047 of a 4K x 4K matrix of MPI_DOUBLEs
7: latency exchanging diagonal -4095 of a 4K x 4K matrix of MPI_DOUBLEs
diag 0: from the top-left corner to the bottom-right corner (matrix[4095][4095])
diag number > 0 = towards the top-right
diag number < 0 = towards the bottom-left
In this graph, you will also notice a third series of measurements (ompi new_assign): it corresponds to @ggouaillardet's proposal in #6617 to change some memcpy() operations into assignments.
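For reference, here is a hypothetical sketch of how one of the diagonal selections above could be described with MPI_Type_indexed, assuming a 4096 x 4096 row-major matrix of doubles; the benchmark's actual datatype construction is not shown in this issue.

```c
/* Hypothetical illustration only: one way to describe the diagonals
 * measured above as an indexed datatype, assuming a 4096 x 4096
 * row-major matrix of doubles. */
#include <mpi.h>
#include <stdlib.h>

#define N 4096

/* diag 0 is the main diagonal (matrix[0][0] .. matrix[4095][4095]);
 * diag > 0 shifts towards the top-right, diag < 0 towards the bottom-left. */
static MPI_Datatype make_diag_type(int diag)
{
    int len   = N - abs(diag);                     /* elements on that diagonal    */
    int start = (diag >= 0) ? diag : -diag * N;    /* flat offset of first element */
    int *blocklens = malloc(len * sizeof(int));
    int *displs    = malloc(len * sizeof(int));

    for (int i = 0; i < len; i++) {
        blocklens[i] = 1;
        displs[i]    = start + i * (N + 1);        /* one row down, one column right */
    }

    MPI_Datatype dtype;
    MPI_Type_indexed(len, blocklens, displs, MPI_DOUBLE, &dtype);
    MPI_Type_commit(&dtype);
    free(blocklens);
    free(displs);
    return dtype;
}
```

A column would be the analogous case with single-element blocks separated by a stride of N elements instead of N + 1.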