This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Fix minor formatting issues #2

Merged
2 changes: 1 addition & 1 deletion python/01_hello_world/README.md
@@ -1,6 +1,6 @@
## Setting up our Python environment

The first part of this workshop is focussed on distributed memory parallelism with MPI, making use of the Python programming language. There are many different interfaces to MPI for many different languages, so we've chosen Python for the benefits it provides to write examples in an easy-to-understand format. Whilst the specific syntax of the commands learned in this part of the course wont be applicable across different languages, the overall code structures and concepts are highly transferrable, so once you have a solid grasp of the fundamentals of MPI you should be able to take thoses concepts to any language with an MPI interface and write parallel code!
The first part of this workshop is focussed on distributed memory parallelism with MPI, making use of the Python programming language. There are many different interfaces to MPI for many different languages; we've chosen Python because it lets us write examples in an easy-to-understand format. Whilst the specific syntax of the commands learned in this part of the course won't be applicable across different languages, the overall code structures and concepts are highly transferable, so once you have a solid grasp of the fundamentals of MPI you should be able to take those concepts to any language with an MPI interface and write parallel code!

First, let's clone this repository
9 changes: 5 additions & 4 deletions python/02_simple_comms/README.md
@@ -41,25 +41,26 @@ print(f"{msg} from rank {comm.Get_rank()}")
```

Now, if we run this script in parallel we no longer get the error, because the variable now exists on the second rank thanks to the `send`/`recv` methods.
In order to add an additional layer of safety to this process, we can add a tag to the message. This is an integer ID which ensures that the message is being recieved is being correctly used by the recieving process. This can be simply achieved by modifying the code to match the following:
In order to add an additional layer of safety to this process, we can add a tag to the message. This is an integer ID which ensures that the message being received is correctly matched by the receiving process. This can be achieved simply by modifying the code to match the following:

```python
comm.send(var, dest=1, tag=23)
...
var = comm.recv(source=0, tag=23)
```
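
For reference, a minimal runnable sketch of this tagged `send`/`recv` pattern might look like the following (the payload and the tag value 23 are illustrative; it assumes mpi4py is installed and the script is launched with at least two ranks, e.g. `mpirun -n 2 python script.py`):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

if comm.Get_rank() == 0:
    var = {"payload": [1, 2, 3]}       # any picklable Python object will do
    comm.send(var, dest=1, tag=23)     # blocking send to rank 1, tagged 23
elif comm.Get_rank() == 1:
    var = comm.recv(source=0, tag=23)  # only matches a tag-23 message from rank 0
    print(f"rank 1 received {var}")
```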

The types of communications provided by the `send```/```recv` methods are known as blocking communications, as there is a chance that the send process won't return until it gets a signal that the data has been recieved successfully. This means that sending large amounts of data between processes can result in significant stoppages to the program. In practice, the standard for this is not implemented uniformly, so the blocking/non-blocking nature of the communication can be dynamic or depend on the size of the message being passed.
The types of communication provided by the `send`/`recv` methods are known as blocking communications, as the sending process may not return until it gets a signal that the data has been received successfully. This means that sending large amounts of data between processes can result in significant stoppages to the program. In practice, the standard for this is not implemented uniformly, so the blocking/non-blocking nature of the communication can be dynamic or depend on the size of the message being passed.
Before we start the next example, we can add the line `comm.barrier()` to our Python script to make sure that each process only proceeds once all other processes have reached this point, which stops us getting confused about the ordering of our program's output.
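
As a minimal illustration of what `comm.barrier()` does (assuming mpi4py and a launch with more than one rank), the sketch below guarantees that every rank has executed its first print before rank 0 continues past the barrier, although the terminal may still interleave the output lines:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

print(f"rank {comm.Get_rank()} reached the barrier")
comm.barrier()   # no rank continues until every rank has reached this line
if comm.Get_rank() == 0:
    print("all ranks have passed the barrier")
```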

## Non-blocking communications

In some instances, it might make sense for communications to only be non-blocking, which will enable the sending rank to continue with its process without needing to wait for confirmation of a potentially large message to be recieved. In this case, we can use the explicitly non-blocking methods, `isend` and `irecv`.
In some instances, it makes sense for communications to be non-blocking, allowing the sending rank to continue with its work without needing to wait for confirmation that a potentially large message has been received. In this case, we can use the explicitly non-blocking methods, `isend` and `irecv`.
The syntax is very similar for the sending process:
```python
req = comm.isend(var, dest=1, tag=23)
```
but the recieving process has more to unpack. The `comm.irecv` method returns a request object, which can be unpacked with the `wait` method which then returns the data:

but the receiving process has more to unpack. The `comm.irecv` method returns a request object, which can be unpacked with its `wait` method to get the received data back:

```python
if comm.Get_rank() == 0:
```
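
Putting both halves together, a minimal runnable sketch of the non-blocking pattern might look like this (the payload and tag are illustrative; it assumes at least two ranks):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

if comm.Get_rank() == 0:
    var = list(range(10))
    req = comm.isend(var, dest=1, tag=23)  # returns a request immediately
    # ... rank 0 is free to do other work here ...
    req.wait()                             # complete the send before reusing var
elif comm.Get_rank() == 1:
    req = comm.irecv(source=0, tag=23)     # also returns a request object
    var = req.wait()                       # wait() hands back the received data
    print(f"rank 1 received {var}")
```
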
2 changes: 1 addition & 1 deletion python/03_collective_comms/README.md
@@ -93,7 +93,7 @@ if not data is None:

## Global MPI operations

For distributed memory problems, its difficult to get a holistic view of your entire data set as it doesnt exist in any one place. This means that performing global operations such as calculating the sum or product of a distributed data set also requires MPI. Fortunately, MPI has several functions that make this easier. Lets create a large set of data and scatter it across our processes, as before:
For distributed memory problems, it's difficult to get a holistic view of your entire data set as it doesn't exist in any one place. This means that performing global operations such as calculating the sum or product of a distributed data set also requires MPI. Fortunately, MPI has several functions that make this easier. Let's create a large set of data and scatter it across our processes, as before:

```python
if comm.Get_rank() == 0:
```
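
As a sketch of the scatter-then-reduce pattern this section builds towards (the data set, chunking, and use of NumPy here are illustrative rather than the workshop's exact code), calculating a global sum might look like:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

if comm.Get_rank() == 0:
    data = np.arange(1_000_000, dtype="d")          # full data set lives on rank 0 only
    chunks = np.array_split(data, comm.Get_size())  # one chunk per rank
else:
    chunks = None

local = comm.scatter(chunks, root=0)                 # each rank receives its own chunk
local_sum = local.sum()                              # purely local work, no communication
total = comm.reduce(local_sum, op=MPI.SUM, root=0)   # combine the partial sums on rank 0

if comm.Get_rank() == 0:
    print(f"global sum = {total}")
```

Swapping `MPI.SUM` for `MPI.PROD`, `MPI.MAX` or `MPI.MIN` gives the other common global operations.
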
14 changes: 8 additions & 6 deletions python/04_parallel_fractal/README.md
@@ -1,6 +1,6 @@
## Solving a problem in parallel

In the previous three sections we have built up a foundation enough to be able to tackle a simple problem in parallel. In this case, the problem we will attempt to solve is constructing a fractal. This kind of problem is often known as "embarassingly parallel" meaning that each element of the result has no dependency on any of the other elements, meaning that we can solve this problem in parallel without too much difficulty. Let's get started by creating a new script - `parallel_fractal.py```:
In the previous three sections we have built up enough of a foundation to tackle a simple problem in parallel. In this case, the problem we will attempt to solve is constructing a fractal. This kind of problem is often known as "embarrassingly parallel", meaning that each element of the result has no dependency on any other element, so we can solve this problem in parallel without too much difficulty. Let's get started by creating a new script - `parallel_fractal.py`:

### Setting up our problem

@@ -41,7 +41,7 @@ def julia_set(grid):
    return fractal
```

This function calculates how many iterations it takes for each element in the complex grid to reach infinity (if ever) when operated on with the equation `x = x**2 + c```. The function itself is not the focus of this exercise as much as it is a way to make the computer perform some work! Let's use these functions to set up our problem in serial, without any parallelism:
This function calculates how many iterations of the map `x = x**2 + c` it takes for each element in the complex grid to diverge towards infinity (if it ever does). The function itself is not the focus of this exercise so much as a way to make the computer perform some work! Let's use these functions to set up our problem in serial, without any parallelism:

```python

@@ -55,22 +55,24 @@ grid = complex_grid(extent, cells)
fractal = julia_set(grid, 80, c)
```
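
The full `complex_grid` and `julia_set` definitions are not shown above, so purely for reference, here is a rough sketch of what such functions typically look like (the signatures, the `max_iter` argument, and the `|x| > 2` escape threshold are assumptions, not necessarily the exact code used in the workshop):

```python
import numpy as np

def complex_grid(extent, cells):
    # A square grid of complex numbers covering [-extent, extent] in both directions.
    re = np.linspace(-extent, extent, cells)
    im = np.linspace(-extent, extent, cells)
    x, y = np.meshgrid(re, im)
    return x + 1j * y

def julia_set(grid, max_iter, c):
    # For each grid point, count how many iterations of x = x**2 + c it survives
    # before escaping (|x| > 2 is the usual divergence threshold).
    z = grid.copy()
    counts = np.zeros(grid.shape, dtype=int)
    for _ in range(max_iter):
        mask = np.abs(z) <= 2        # points that have not escaped yet
        z[mask] = z[mask] ** 2 + c
        counts[mask] += 1
    return counts
```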

If we run the python script (```python fractal.py```) it takes a few seconds to complete (this will vary depending on your machine), so we can already see that we are making our computer work reasonably hard with just a few lines of code. If we use the `time` command we can get a simple overview of how much time and resource are being used:
If we run the Python script (`python fractal.py`) it takes a few seconds to complete (this will vary depending on your machine), so we can already see that we are making the computer work reasonably hard with just a few lines of code. If we use the `time` command we can get a simple overview of how much time and resources are being used:

```
$ time python parallel_fractal_complete.py
python parallel_fractal_complete.py 5.96s user 3.37s system 123% cpu 7.558 total
```

> **Note:** We can also visualise the Julia set with the code snippet:
> `
>
> ```
> import matplotlib.pyplot as plt
>
> ...
>
> plt.imshow(fractal, extent=[-extent, extent, -extent, extent], aspect='equal')
> plt.show()
> `
> ```
>
> but doing so will impact the numbers returned when we time our function, so it's important to remember this before trying to measure how long the function takes.

### Parallelising our problem
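
One common way to parallelise an embarrassingly parallel grid calculation like this is to scatter row blocks of the grid, let each rank work on its own block independently, and gather the results back on one rank. The sketch below follows that pattern, reusing the `complex_grid` and `julia_set` functions from the script above; the constants are illustrative and this is not necessarily the exact decomposition used in the full `parallel_fractal.py` script:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

extent, cells, c = 1.5, 1000, -0.8 + 0.156j          # illustrative parameters

if rank == 0:
    grid = complex_grid(extent, cells)               # build the full grid on rank 0
    chunks = np.array_split(grid, comm.Get_size())   # split into one row block per rank
else:
    chunks = None

local_grid = comm.scatter(chunks, root=0)            # each rank gets its own block
local_fractal = julia_set(local_grid, 80, c)         # independent local work
pieces = comm.gather(local_fractal, root=0)          # collect the blocks on rank 0

if rank == 0:
    fractal = np.concatenate(pieces)                 # reassemble the full image
```
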
@@ -110,5 +112,5 @@ $ time mpirun -n 4 python parallel_fractal.py
mpirun -n 4 python parallel_fractal.py 37.23s user 21.70s system 370% cpu 15.895 total
```

We can see that running the problem in parallel has greatly increased the speed of the function, but that the speed increase is directly proportional to the resource we are using (i.e. using 4 cores doesnt make the process 4 times faster). This is due to the increased overhead induced by MPI communication procedures, which can be quite expensive (as metioned in previous chapters).
We can see that running the problem in parallel has greatly increased the speed of the function, but that the speed increase is not directly proportional to the resources we are using (i.e. using 4 cores doesn't make the process 4 times faster). This is due to the increased overhead induced by MPI communication procedures, which can be quite expensive (as mentioned in previous chapters).
The way that a program's performance changes based on the number of processes it runs on is often referred to as its "scaling behaviour". Determining how your problem scales across multiple processes is a useful exercise and is helpful when it comes to porting your code to a larger-scale HPC machine.
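
If you want to quantify that scaling behaviour yourself, one lightweight approach is to time the parallel region with `MPI.Wtime()` and compare it against a serial baseline measured separately; in the sketch below the serial time is a hypothetical placeholder:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

comm.barrier()                     # line all ranks up before starting the clock
start = MPI.Wtime()
# ... the parallel fractal computation goes here ...
comm.barrier()                     # make sure every rank has finished
elapsed = MPI.Wtime() - start

if comm.Get_rank() == 0:
    t_serial = 7.6                 # hypothetical serial runtime in seconds
    nprocs = comm.Get_size()
    speedup = t_serial / elapsed
    efficiency = speedup / nprocs
    print(f"{nprocs} ranks: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```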