Name | Student Number | Email | Website |
---|---|---|---|
Jamie Counsell | 1054209 | counsej@mcmaster.ca | jamiecounsell.me |
James Priebe | 1135001 | priebejp@mcmaster.ca | |
This C program renders a mandelbulb or mandelbox 3D fractal with optimizations in OpenACC.
Requires `pgcc` or another compiler with support for OpenACC.
- Install the dependencies
- Clone the repository
To run the program, first run make:
$ make clean
$ make [type]
Where `type` is one of:
- `mandelbulb` - Compute a Mandelbulb fractal
- `mandelbox` - Compute a Mandelbox fractal
- `boxserial` - Compute a Mandelbox fractal using a serial implementation
- `bulbserial` - Compute a Mandelbulb fractal using a serial implementation
Then run the command with the optional runtime flags:
$ ./mandel[box, bulb, bulb_serial, box_serial] params.dat [-f n] [-v]
where:
- params.dat is a file containing the Mandelbulb or Mandelbox parameters.
- `-f n` - Instruct the program to generate `n` frames. Default is 1 frame.
- `-v` - Instruct the program to generate a video when it is complete (calling `genvideo.sh`).
For example, to generate a 7200-frame video at 30 FPS (4 minutes) of the Mandelbulb:
$ ./mandelbulb params.dat -f 7200 -v
To generate the mandelbulb given in the assignment, one can use the command:
$ make clean; make mandelbulb
$ ./mandelbulb paramsBulb.dat
OR the serial version:
$ make clean; make bulbserial
$ ./bulbserial paramsBulb.dat
The resulting image will be in the `frames` directory as `00000.bmp`. The file name given in the parameter file is not used here, so that the output follows the frame naming convention and the frame can be used in the video.
For the first frame of the submitted video, the following times were recorded:
Server | OpenACC | time |
---|---|---|
tesla | NO | 108.16456s |
tesla | YES | 1.236812s |
The server was under heavy load during testing, so results may vary between runs, but these measurements show a significant speedup (about 87.5x faster with OpenACC than without). The same CPU was used for both runs, so the difference is due purely to OpenACC acceleration.
The only region that was parallelized was the nested loop in `renderer.cc`. This loop is the program's largest bottleneck and also lends itself naturally to parallelization. OpenACC pragmas were used to mark the region as an OpenACC compute region and to transfer the data from the host to the device. The outer loop was explicitly marked as parallel, and other optimizations were left up to PGCC.
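The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from `renderer.cc`: the names `render_frame` and `shade_pixel` are hypothetical, and the per-pixel work is a placeholder for the real ray marcher. A compiler without OpenACC support ignores the pragmas and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real per-pixel ray marcher. */
#pragma acc routine seq
static unsigned char shade_pixel(int i, int j)
{
    return (unsigned char)((i ^ j) & 0xFF);
}

/* Fills a width*height image, one byte per pixel (an assumption made
   to keep the sketch short; the real image format may differ). */
void render_frame(unsigned char *image, int width, int height)
{
    assert(image != NULL && width > 0 && height > 0);
    int n = width * height;

    /* Compute region: the outer loop is explicitly marked parallel and
       the result is copied back to the host when the region ends.
       Scheduling of the inner loop is left to the compiler. */
    #pragma acc parallel loop copyout(image[0:n])
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            image[j * width + i] = shade_pixel(i, j);
        }
    }
}
```

Calling `render_frame(buffer, width, height)` once per frame fills `buffer` in row-major order.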
Functions called inside the compute region were marked as ACC routines, and any functions called from within those routines were inlined. This works around the issue mentioned in class, where variables appear to take on a NULL or otherwise undefined value when they are passed to a function called by a routine, even if that function is also marked as a routine.
The data structures were flattened so they could be passed between functions more easily. To accommodate this, additional parameters were added to the ACC routines called inside the compute region. All data (including the flattened parameter structures) was explicitly copied to the device using data pragmas before the beginning of the compute region.
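To illustrate the flattening, here is a small sketch under stated assumptions. The struct `RenderParamsSketch` and the functions `squared_distance` and `sum_squared_distances` are hypothetical and are not taken from the project. The idea shown is that struct fields are unpacked into plain scalars on the host, and the device routine receives them as extra parameters rather than as a struct pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side parameter structure. */
typedef struct {
    double cam_pos[3];
    int    max_iter;
} RenderParamsSketch;

/* Device-callable routine: takes flattened scalars, not a struct. */
#pragma acc routine seq
static double squared_distance(double px, double py, double pz,
                               double cx, double cy, double cz)
{
    double dx = px - cx, dy = py - cy, dz = pz - cz;
    return dx * dx + dy * dy + dz * dz;
}

/* Sums squared distances from n points (x,y,z triples) to the camera. */
double sum_squared_distances(const RenderParamsSketch *p,
                             const double *points, int n)
{
    assert(p != NULL && points != NULL && n >= 0);

    /* Flatten the struct into scalars before entering the region. */
    const double cx = p->cam_pos[0];
    const double cy = p->cam_pos[1];
    const double cz = p->cam_pos[2];
    double total = 0.0;

    /* Input data is copied to the device explicitly. */
    #pragma acc parallel loop copyin(points[0:3 * n]) reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += squared_distance(points[3 * i], points[3 * i + 1],
                                  points[3 * i + 2], cx, cy, cz);
    }
    return total;
}
```

The cost of this design is longer parameter lists, but the device code never has to dereference a host-side structure.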
Early on, we faced an interesting problem, and a good example of the proper use of the `present_or_copy[in, out]` OpenACC clauses. When the image data (`image`) was marked as `copy`, `copyin`, or `copyout`, the compiler would generate code to reallocate `image` on the device each time the compute region began. This is because it has no way to maintain state across compute regions, and is therefore unable to keep the pointer to `image` without explicitly checking whether it is already present (and then reusing it). By adding the `present_or_` prefix, we were able to instruct the compiler not to reallocate the memory, and instead to overwrite the existing memory allocated for `image`. Since every pixel in `image` is written before it is copied out, there is no risk of using stale data.
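One way the reuse of device memory can look in code is sketched below. This is an assumption about structure, not the project's actual code: it wraps the frame loop in an enclosing `data` region so that `image` is already present when each compute region starts, and it uses `update host` to fetch each finished frame. The function name `render_frames` and the `checksums` output are hypothetical and exist only to make the result observable.

```c
#include <assert.h>
#include <stddef.h>

/* Renders n_frames placeholder frames into the same buffer and records
   a checksum of each frame in checksums[f]. */
void render_frames(unsigned char *image, int n_pixels, int n_frames,
                   long *checksums)
{
    assert(image != NULL && checksums != NULL);
    assert(n_pixels > 0 && n_frames >= 0);

    /* Assumed structure: allocate image on the device once. */
    #pragma acc data create(image[0:n_pixels])
    {
        for (int f = 0; f < n_frames; f++) {
            /* image is already present, so it is reused, not reallocated. */
            #pragma acc parallel loop present_or_copyout(image[0:n_pixels])
            for (int i = 0; i < n_pixels; i++) {
                image[i] = (unsigned char)((f + i) & 0xFF);
            }

            /* The data was present on entry, so the frame is fetched
               explicitly before the host reads it. */
            #pragma acc update host(image[0:n_pixels])

            long sum = 0;
            for (int i = 0; i < n_pixels; i++) {
                sum += image[i];
            }
            checksums[f] = sum;
        }
    }
}
```

Every element of `image` is overwritten in each frame, which is what makes reusing the old allocation safe.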
Since frame parameters were not generated asynchronously, no parallelization was done to compute more than one frame at a time.
Frames are generated sequentially from an array of `CameraParams` structures. The first image generated always matches the input parameters. This ensures that the assignment requirements can be met with the given `paramsBulb.dat` file. After the first image, the camera rotates around the fractal, slowly decreasing its position along the `z` axis from `1` to `-1` across 7200 frames. The position is computed as follows:
- the `x` coordinate is `cos(frame_number/500)`
- the `y` coordinate is `sin(frame_number/500)`
- the `z` coordinate is `1 - (frame_number/3600)`
This guides the camera around the object in a circular motion in the `x, y` plane, completing one full rotation every `1000*pi` frames (about 3142 frames), since the angle `frame_number/500` must advance by `2*pi`. Independently, the `z` value decreases linearly from `1` to `-1` between frames `0` and `7200`. Together these create a sort of spring path, showcasing all sides of the fractal.
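As a quick check of the path, the rotation period and the range of `z` follow directly from the three formulas. Here $f$ is shorthand for `frame_number`, introduced only for this derivation:

$$
\begin{gathered}
\frac{f}{500} = 2\pi \;\Longrightarrow\; f = 1000\pi \approx 3142 \text{ frames per rotation} \\
\frac{7200}{1000\pi} \approx 2.29 \text{ rotations over 7200 frames} \\
z(0) = 1 - \frac{0}{3600} = 1, \qquad z(7200) = 1 - \frac{7200}{3600} = -1
\end{gathered}
$$

So the camera circles the fractal a little more than twice while descending from `z = 1` to `z = -1`.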
On each iteration, `init3D` is called again to ensure the camera is still facing the center point at `(0,0,0)`.
With the exception of the first frame, each frame's parameters are generated during the previous frame's iteration of the loop. That is, the parameters for frame `i` are computed before frame `i-1` is rendered. An array of `CameraParams` structures is kept in order to track current and previous configurations. Because the configurations are all available in memory, one could add support for rendering multiple frames at once. This would be a reasonable next step, and a good use for something like OpenMP.
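The ordering described above can be sketched as a simple loop. Everything here is hypothetical: `CameraParamsSketch` is a reduced stand-in for the real `CameraParams`, it stores only the rotation angle and `z` rather than a full camera setup, and `render_stub` replaces the actual renderer.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for CameraParams (hypothetical fields). */
typedef struct {
    int    frame;
    double angle;  /* rotation angle in radians: frame / 500 */
    double z;      /* camera height: 1 - frame / 3600 */
} CameraParamsSketch;

static CameraParamsSketch make_params(int frame)
{
    CameraParamsSketch p;
    p.frame = frame;
    p.angle = frame / 500.0;
    p.z = 1.0 - frame / 3600.0;
    return p;
}

/* Placeholder renderer: reports which frame it was asked to draw. */
static int render_stub(const CameraParamsSketch *p)
{
    return p->frame;
}

/* params and rendered must each hold n_frames entries. `first` plays
   the role of the parameters read from the input file. */
void render_sequence(CameraParamsSketch first, CameraParamsSketch *params,
                     int *rendered, int n_frames)
{
    assert(params != NULL && rendered != NULL && n_frames > 0);

    params[0] = first;
    for (int i = 0; i < n_frames; i++) {
        /* Prepare the next frame's parameters before rendering this one. */
        if (i + 1 < n_frames) {
            params[i + 1] = make_params(i + 1);
        }
        rendered[i] = render_stub(&params[i]);
    }
}
```

Because all entries of `params` stay in memory after the loop, a later version could hand several of them to separate workers.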
To compute the final result, the following configuration file (`bulb_params.dat`) was used:
# CAMERA
# location x,y,z (7,7,7)
1 0 1
# look at x,y,z
0 0 0
# up vector x,y,z; (0, 1, 0)
0 0 1
# field of view (1)
2.0
# IMAGE
# width height
3840 2160
# detail level, the smaller the more detailed (-3.5)
-3.45
# MANDELBULB
# ignore the first number, 0.
# the second and third numbers are escape (or bailout) time and power
0 4.0 9.0
# ignore the second number; the first number is the max number of iterations
100 0
# COLORING
# type 0 or 1
0
# brightness
1.2
# IMAGE FILE NAME
imageBulb.bmp
The command used to generate the result is:
$ ./mandelbulb bulb_params.dat -f 7200 -v