Project 4: Youssef Victor #29

Open: wants to merge 9 commits into master
CMakeLists.txt (6 changes: 4 additions & 2 deletions)
@@ -2,6 +2,10 @@ cmake_minimum_required(VERSION 3.0)
 
 project(cis565_rasterizer)
 
+# Crucial magic for CUDA linking
+find_package(Threads REQUIRED)
+find_package(CUDA 8.0 REQUIRED)
+
 set(CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake" ${CMAKE_MODULE_PATH})
 
 # Set up include and lib paths
@@ -76,8 +80,6 @@ if (WIN32)
 endif()
 
 # CUDA linker options
-find_package(Threads REQUIRED)
-find_package(CUDA 8.0 REQUIRED)
 set(CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE ON)
 set(CUDA_SEPARABLE_COMPILATION ON)
 
README.md (169 changes: 149 additions & 20 deletions)
@@ -1,20 +1,149 @@
-CUDA Rasterizer
-===============
-
-[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md)
-
-**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
-
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
-
-### (TODO: Your README)
-
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
-
-
-### Credits
-
-* [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo)
-* [glTF Sample Models](https://github.com/KhronosGroup/glTF/blob/master/sampleModels/README.md)
# **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4:**

# **CUDA Rasterizer**





Tested on: Windows 10, Intel Core i7-7700HQ CPU @ 2.80 GHz, 8GB RAM, NVIDIA GeForce GTX 1050

![Built](https://img.shields.io/appveyor/ci/gruntjs/grunt.svg) ![Issues](https://img.shields.io/github/issues-raw/badges/shields/website.svg) ![CUDA 8.0](https://img.shields.io/badge/CUDA-8.0-green.svg?style=flat) ![Platform](https://img.shields.io/badge/platform-Desktop-bcbcbc.svg) ![Developer](https://img.shields.io/badge/Developer-Youssef%20Victor-0f97ff.svg?style=flat)




- [Features](#things-done)

- [In-Depth](#in-depth)

- [Time Analysis](#time-analysis)

- [Bloopers](#bloopers)




____________________________________________________



The goal of this project was to implement a rasterized graphics pipeline on the GPU using CUDA: vertices are transformed and assembled into primitives, the primitives are rasterized into a fragment buffer with a depth test, and the fragments are then shaded to produce the final image.



### Things Done

#### Core Features

- [x] Everything

#### What Makes Me Special

- [x] Perspective Correct Color Interpolation
- [x] Instancing
- [x] Super-Sampled Anti-Aliasing

![SSAA Cow Instanced](img/ssaa_instacow.gif)


### In-Depth:

#### Perspective Correct Color Interpolation

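It is important to interpolate the vertex colors across each triangle in a perspective-correct way, so that is what I did: interpolating linearly with screen-space barycentric weights distorts attributes on any triangle that is not parallel to the image plane.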

Here is a triangle leaning back from the camera, rendered without (left) and with (right) perspective-correct interpolation:

![Without Perspective-Correct Interpolation](img/not-persp-correct.PNG) ![With Perspective-Correct Interpolation](img/persp-correct.PNG)

Here they are back to back in a GIF. (As with most color-dense GIFs, the palette was reduced for recording; the actual colors are exactly as they are in the pictures above.)

![Back-To-Back](img/persp-correction.gif)

As you can see even in this heavily quantized GIF, the triangle without perspective correction shows a lot more blue, because its colors are interpolated linearly in screen space without accounting for each vertex's depth (Z).
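A minimal sketch of the interpolation step, written as plain C++ so it runs on the host (the same arithmetic would sit inside the CUDA rasterization kernel). It assumes each fragment has screen-space barycentric weights and each vertex has a positive eye-space depth; `Color` and `perspectiveCorrectColor` are hypothetical names, not this project's code. Dividing each attribute by its vertex depth, interpolating linearly, and then dividing by the interpolated `1/z` is what undoes the screen-space distortion.

```cpp
#include <array>
#include <cassert>

// Hypothetical RGB color type for this sketch.
struct Color {
    float r, g, b;
};

// Perspective-correct interpolation of per-vertex colors.
// bary: screen-space barycentric weights of the fragment (summing to 1).
// eyeZ: positive eye-space depth of each vertex.
// Only attr/z and 1/z vary linearly in screen space, so we interpolate
// those and divide at the end.
Color perspectiveCorrectColor(const std::array<float, 3>& bary,
                              const std::array<float, 3>& eyeZ,
                              const std::array<Color, 3>& col) {
    float invZ = 0.0f;            // interpolated 1/z
    Color acc{0.0f, 0.0f, 0.0f};  // interpolated color/z
    for (int i = 0; i < 3; ++i) {
        assert(eyeZ[i] > 0.0f && "eye-space depths must be positive");
        const float w = bary[i] / eyeZ[i];
        invZ += w;
        acc.r += w * col[i].r;
        acc.g += w * col[i].g;
        acc.b += w * col[i].b;
    }
    // Undo the 1/z weighting to recover the true attribute value.
    return Color{acc.r / invZ, acc.g / invZ, acc.b / invZ};
}
```

For example, a fragment halfway in screen space between a black vertex at depth 1 and a white vertex at depth 3 comes out at 0.25 rather than 0.5, because the nearer half of that edge occupies more of the screen.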

#### Instancing

For instancing I added a preprocessor macro, `num_instances`, that defines the number of times that you want a mesh to be instanced. I then have a hard-coded array of transformation matrices that represent the transformations of each instance.

In the vertex shader, I loop over the instances and transform each vertex `num_instances` times, once with each instance's transformation matrix.

I then go through the rest of the primitive rasterization as normal, with `num_instances * numPrimitives` primitives instead of the usual `numPrimitives`.
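The indexing can be sketched on the host as follows, assuming a row-major 4x4 matrix type and an indexed triangle list. `Vec4`, `Mat4`, `transformInstances`, and `instancedTriangle` are hypothetical names; instance `i`'s copy of vertex `v` is stored at `i * numVerts + v`, and primitive `p` maps back to local triangle `p % numPrimitives` of instance `p / numPrimitives`. In CUDA, the outer vertex loop and the primitive index would come from the thread index instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch: homogeneous point and row-major 4x4 matrix.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

Vec4 mul(const Mat4& m, const Vec4& p) {
    Vec4 out{0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[r][c] * p[c];
    return out;
}

// Transform every vertex once per instance; each instance's copies form
// a contiguous block of numVerts transformed vertices.
std::vector<Vec4> transformInstances(const std::vector<Vec4>& verts,
                                     const std::vector<Mat4>& instanceXforms) {
    const std::size_t numVerts = verts.size();
    std::vector<Vec4> out(instanceXforms.size() * numVerts);
    for (std::size_t v = 0; v < numVerts; ++v)                   // one "thread" per vertex
        for (std::size_t i = 0; i < instanceXforms.size(); ++i)  // loop over instances
            out[i * numVerts + v] = mul(instanceXforms[i], verts[v]);
    return out;
}

// Primitive p in [0, numInstances * numPrimitives) -> vertex indices into
// that instance's block of transformed vertices.
std::array<std::size_t, 3> instancedTriangle(std::size_t p,
                                             const std::vector<std::size_t>& indices,
                                             std::size_t numVerts) {
    const std::size_t numPrims = indices.size() / 3;
    assert(numPrims > 0);
    const std::size_t instance = p / numPrims;
    const std::size_t local = p % numPrims;
    return {indices[3 * local] + instance * numVerts,
            indices[3 * local + 1] + instance * numVerts,
            indices[3 * local + 2] + instance * numVerts};
}
```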

Here is what it looks like with the cow instanced 9 times:

![instanced_cow_9](img/instanced_cow.gif)

and here it is instanced 27 times!!

![instanced_cow_27](img/instanced_cow_27.gif)

#### Super-Sampled Anti-Aliasing:

For super-sampled anti-aliasing (SSAA), I multiply the fragment buffer's dimensions by the preprocessor macro `SSAA_RES`, which sets the scale factor for each axis (`width`, `height`), so the buffer holds `SSAA_RES * SSAA_RES` samples for every output pixel.
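The text above covers how the fragment buffer is enlarged but not how it is reduced back to the window size. A common choice, assumed here rather than taken from the project, is a box filter that averages each block of `SSAA_RES` by `SSAA_RES` samples into one pixel. `Color` and `downsample` are hypothetical names, and both buffers are assumed row-major; a CUDA version would typically launch one thread per output pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical RGB color type for this sketch.
struct Color {
    float r, g, b;
};

// Box-filter downsample: output pixel (x, y) is the mean of the
// ssaa x ssaa block of samples starting at (x * ssaa, y * ssaa)
// in the high-resolution buffer (width * ssaa columns, height * ssaa rows).
std::vector<Color> downsample(const std::vector<Color>& hiRes,
                              int width, int height, int ssaa) {
    const int hiWidth = width * ssaa;
    assert(ssaa > 0 &&
           hiRes.size() == static_cast<std::size_t>(hiWidth) *
                               static_cast<std::size_t>(height * ssaa));
    std::vector<Color> out(static_cast<std::size_t>(width) * height);
    const float inv = 1.0f / static_cast<float>(ssaa * ssaa);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            Color sum{0.0f, 0.0f, 0.0f};
            for (int sy = 0; sy < ssaa; ++sy) {
                for (int sx = 0; sx < ssaa; ++sx) {
                    const Color& s =
                        hiRes[static_cast<std::size_t>(y * ssaa + sy) * hiWidth +
                              (x * ssaa + sx)];
                    sum.r += s.r;
                    sum.g += s.g;
                    sum.b += s.b;
                }
            }
            out[static_cast<std::size_t>(y) * width + x] =
                Color{sum.r * inv, sum.g * inv, sum.b * inv};
        }
    }
    return out;
}
```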

Here are the results, with FPS listed in the window title.

##### AA 1x1 (No anti-aliasing)

![anti-aliasing](img/aa_1.PNG)

##### AA 2x2

![anti-aliasing](img/aa_2.PNG)

##### AA 4x4

![anti-aliasing](img/aa_4.PNG)

##### AA 8x8

![anti-aliasing](img/aa_8.PNG)


### Time Analysis

I have time analyses for the two major features I implemented: AA and instancing. Instancing scaled very nicely. I tripled the number of instances at each step, yet the time only roughly doubled each time, so the time grows sublinearly in the number of instances; in other words, my instancing is actually better than O(n). That is cool.

Here is a stacked graph also showing the absolute time (in ms):

![timed_insta](img/timed_inst.PNG)

As you can see, the time doubles even though I'm tripling the number of cows in each time step.
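One way to put a number on this is a log-log slope: if multiplying the instance count by k multiplies the frame time by m, then the time grows roughly like n^alpha with alpha = log(m) / log(k). The helper below is a hypothetical sketch of that calculation, not part of the project.

```cpp
#include <cassert>
#include <cmath>

// Empirical scaling exponent between two measurements (n1, t1) and (n2, t2),
// assuming the time follows t ~ c * n^alpha. alpha < 1 means the time grows
// sublinearly in n; alpha == 2 means quadratic growth.
double scalingExponent(double n1, double t1, double n2, double t2) {
    assert(n1 > 0.0 && n2 > 0.0 && t1 > 0.0 && t2 > 0.0 && n1 != n2);
    return std::log(t2 / t1) / std::log(n2 / n1);
}
```

For example, tripling the instance count while the time doubles gives alpha = log 2 / log 3 ≈ 0.63, which is below 1, so the growth is sublinear in the number of instances.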

Here is that same graph showing just percentages:

![timed_insta](img/timed_inst_100.PNG)

As you can see, because all the instancing work is done in the vertex shader (vertex transform and assembly) stage, transforming the vertices gradually becomes the bottleneck.

With AA, the time scaled up very evenly, and the bottleneck here was, of course, rasterization. Here is an absolute-time (in ms) stacked graph of that:

![timed_aa](img/timed_aa.PNG)

The time increases steadily as I double n at each step, which is consistent with my AA being O(n^2), exactly what you'd expect since I am literally taking n^2 samples per pixel. The 100% stacked bar graph makes it very clear how much rasterization dominates as we scale (also note that the vertex shader becomes more and more irrelevant):

![timed_aa_100](img/timed_aa_100.PNG)
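As a rough expectation, assuming rasterization cost is proportional to the number of fragments: with `SSAA_RES` $= n$, the fragment buffer is $(n \cdot \text{width}) \times (n \cdot \text{height})$, so

```math
\text{samples per pixel} = n^2, \qquad n = 1, 2, 4, 8 \;\Rightarrow\; n^2 = 1, 4, 16, 64,
\qquad \frac{t_{\text{raster}}(2n)}{t_{\text{raster}}(n)} \approx \frac{(2n)^2}{n^2} = 4 .
```

Each doubling of `SSAA_RES` should therefore roughly quadruple the rasterization time, while the vertex stage, which does not depend on `SSAA_RES`, stays fixed.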

### Bloopers

My best bloopers came mainly while creating the base render. I tried rendering many debug views, with varying levels of success. Here are some of my favorites.

##### "Neon Cow"

While trying to rasterize the cow with its normals showing, I hit a depth-buffer bug that produced this beauty:

![neon_cow](img/neon_cow2.gif)

##### "The Cow Sees All"

While modifying the depth buffer again, I reversed the depth check and got this creepy result, where the cow seems to follow you around when you move the camera at certain angles:

![follow-cow](img/follow_cow.gif)
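For reference, a minimal sequential sketch of a per-fragment depth test, assuming smaller depth values are closer to the camera; `depthTest` and its `reversed` flag are hypothetical names. Flipping the comparison, as in this blooper, keeps the farthest surface at each pixel instead of the nearest. In a CUDA rasterizer many threads can write the same pixel concurrently, so the compare-and-write has to be made atomic (for example with `atomicMin` on an integer-encoded depth, or a per-pixel lock); this sketch leaves that out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true (and records the depth) if the fragment should be kept.
// Normal test: keep the nearest fragment; clear the buffer to a large value.
// Reversed test: keep the farthest fragment; clear the buffer to a small value.
bool depthTest(std::vector<float>& depthBuffer, std::size_t pixel,
               float fragDepth, bool reversed = false) {
    assert(pixel < depthBuffer.size());
    const float stored = depthBuffer[pixel];
    const bool pass = reversed ? (fragDepth > stored) : (fragDepth < stored);
    if (pass) depthBuffer[pixel] = fragDepth;
    return pass;
}
```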

##### "Debug View: Eye Space Normal"

Here is a nice debug view of the cow showing camera-space normals (the color banding is due to the GIF recording software reducing the color palette):

![iridescent](img/iridescent_cow.gif)
gltfs/triangle/triangle.obj (2 changes: 1 addition & 1 deletion)
@@ -4,6 +4,6 @@ mtllib triangle.mtl
 o Cube
 v 0.000000 0.000000 0.000000
 v 0.500000 0.000000 0.000000
-v 0.000000 1.000000 0.000000
+v 0.000000 1.000000 0.00000
 vn 0.0000 0.000000 1.000000
 f 1//1 2//1 3//1
Binary file added img/aa_1.PNG
Binary file added img/aa_2.PNG
Binary file added img/aa_4.PNG
Binary file added img/aa_8.PNG
Binary file added img/follow_cow.gif
Binary file added img/instanced_cow.gif
Binary file added img/instanced_cow_27.gif
Binary file added img/iridescent_cow.gif
Binary file added img/neon_cow.gif
Binary file added img/neon_cow2.gif
Binary file added img/no-persp-correct-titled.png
Binary file added img/not-persp-correct.PNG
Binary file added img/persp-correct-titled.png
Binary file added img/persp-correct.PNG
Binary file added img/persp-correction.gif
Binary file added img/perspcorretion.psd
Binary file added img/single_instanced_cow.gif
Binary file added img/ssaa_instacow.gif
Binary file added img/timed_aa.PNG
Binary file added img/timed_aa_100.PNG
Binary file added img/timed_inst.PNG
Binary file added img/timed_inst_100.PNG
src/CMakeLists.txt (2 changes: 1 addition & 1 deletion)
@@ -6,5 +6,5 @@ set(SOURCE_FILES
 
 cuda_add_library(src
 ${SOURCE_FILES}
-OPTIONS -arch=sm_20
+OPTIONS -arch=sm_61
 )