Intro to Asynchronous Computing

Previous: Matrix Multiplication

So far, all of the GPU kernels we have run have been synchronous with one another: if we launch three kernels in a row, they complete sequentially, one after the other, until every task sent to the GPU has finished.
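
As a minimal sketch of that behavior, consider the program below; the kernel, buffer size, and launch configuration are placeholders chosen just for illustration. All three launches go into CUDA's default stream, so the GPU executes them strictly in launch order.

```cuda
#include <cstdio>

// Placeholder kernel that just gives the GPU some work to do
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // All three kernels are launched into the default stream,
    // so they run back to back, one after the other
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    cudaDeviceSynchronize(); // block the host until all three finish
    cudaFree(d_data);
    return 0;
}
```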

Asynchronous programs do not have that restriction. Kernels launched asynchronously do not have to wait on one another to complete and can run in parallel. Another very effective use of GPU concurrency is asynchronous memory transfer, which is covered in more detail in a later article.

You can see this very clearly in the image above: in the synchronous diagram (left) the GPU completes both kernels sequentially, while in the asynchronous diagram (right) the GPU work is split between two separate streams of execution, and the program can complete much sooner as a result.
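
Here is a minimal sketch of the two-stream case; the kernel and sizes are again placeholders, and streams themselves are covered properly in the next article. Because each kernel is launched into its own stream and works on its own buffer, the GPU is free to overlap them (whether it actually does depends on available resources).

```cuda
#include <cstdio>

// Placeholder kernel representing one independent chunk of work
__global__ void chunkOfWork(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    // Two independent streams of execution
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Each launch goes into its own stream; since the kernels touch
    // separate buffers, nothing forces them to run sequentially
    chunkOfWork<<<(n + 255) / 256, 256, 0, s1>>>(d_a, n);
    chunkOfWork<<<(n + 255) / 256, 256, 0, s2>>>(d_b, n);

    cudaDeviceSynchronize(); // wait for both streams to drain

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```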

Risks of Asynchronous Execution

One significant disadvantage of asynchronous methods is that they introduce the possibility of race conditions. A race condition occurs when two or more processes operate in parallel on a shared resource and the order in which they affect that resource can vary from run to run, meaning there is no way to know in advance the order in which the resource will be modified.

Specifically, race conditions on shared memory give rise to data hazards: risks that data will be accessed out of its intended order, producing incorrect values as a result. The three data hazards, with the read-after-write case sketched in code after this list, are:

  • Write after write, meaning that two writes to the same value can land in the wrong order (i.e. task 1 updates memory after task 2 when it was supposed to update it first), leaving the wrong final value behind
  • Write after read, meaning that a write was supposed to happen after a read, but happens before it instead, so the read picks up the new value rather than the old one it needed
  • Read after write, meaning that a read was supposed to happen after a write, but happens before it instead, so the read returns a stale value
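
The sketch below shows how the read-after-write hazard can arise in practice; the kernel, names, and sizes are placeholders. A kernel in one stream writes a buffer while an asynchronous copy in a second stream reads it back, and because nothing orders the two streams with respect to each other, the copy may observe values from before, during, or after the kernel's writes.

```cuda
#include <cstdio>

// Placeholder kernel that writes a constant into the buffer
__global__ void fillWith(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

int main() {
    const int n = 1 << 20;
    float *d_data, *h_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMallocHost(&h_data, n * sizeof(float)); // pinned host memory for async copy

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Stream 1 writes the buffer...
    fillWith<<<(n + 255) / 256, 256, 0, s1>>>(d_data, n, 42.0f);

    // ...while stream 2 reads it back with no ordering between the two:
    // a read-after-write hazard
    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, s2);

    cudaDeviceSynchronize();
    printf("h_data[0] = %f (not guaranteed to be 42)\n", h_data[0]);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```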

To avoid race conditions and data hazards, asynchronous processes can be synchronized with one another at specific points in a program, ensuring that accesses to shared resources happen in the intended order.
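
Continuing the hazard sketch above, one way to impose that ordering in CUDA is to record an event in the writing stream and have the reading stream wait on it; this and other synchronization mechanisms are covered in later articles, and the kernel and names here remain placeholders.

```cuda
#include <cstdio>

// Same placeholder kernel as the hazard sketch above
__global__ void fillWith(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}

int main() {
    const int n = 1 << 20;
    float *d_data, *h_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMallocHost(&h_data, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Stream 1 writes the buffer
    fillWith<<<(n + 255) / 256, 256, 0, s1>>>(d_data, n, 42.0f);

    // Record an event in stream 1 after the kernel, then make stream 2
    // wait on that event before starting the copy: the read is now
    // guaranteed to happen after the write
    cudaEvent_t writeDone;
    cudaEventCreate(&writeDone);
    cudaEventRecord(writeDone, s1);
    cudaStreamWaitEvent(s2, writeDone, 0);

    cudaMemcpyAsync(h_data, d_data, n * sizeof(float),
                    cudaMemcpyDeviceToHost, s2);

    cudaDeviceSynchronize();
    printf("h_data[0] = %f (now guaranteed to be 42)\n", h_data[0]);

    cudaEventDestroy(writeDone);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```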

Next: CUDA Streams