Skip to content

sshekhar1996/hpca-assignment-2023-master

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

72 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HPCA Assignment

In this assignment, your task is to optimize the problem of “Dilated Convolution (DC)” which involves an input matrix and a kernel matrix. The sample algorithm is depicted below using an animation.

Input:

A. Input Matrix of dimensions: $\text{Input-Row} \times \text{Input-Column}$.

B. Kernel Matrix of dimensions: $\text{Kernel-Row} \times \text{Kernel-Column}$.

Output:

An Output Matrix of dimensions: $(\text{Input-Row}-\text{Kernel-Row}+1) \times (\text{Input-Column} - \text{Kernel-Column} +1)$

Explanation and algorithm of DC:

Dilated Convolution (DL) is a variant of the convolution operation. Following is the explanation of the algorithm:

Let us assume for the explanation of the algorithm that the dimensions of the Input Matrix and Kernel Matrix are as follows:

A. Input Matrix $(I)$: $4\times4$

B. Kernel Matrix $(K)$: $2\times2$

So, dimensions of the Output Matrix $(O)$ shall be (based on calculation): $3\times3$.

[ $\text{Output-Row} = \text{Input-Row} - \text{Kernel-Row} + 1 = 4 - 2 + 1 = 3$. Calculation is similar for $\text{Output-Column}$.]

The Kernel Matrix $(K)$ slides over the Input Matrix $(I)$ and produces each cell of the Output Matrix $(O)$ as shown below. In the animation below, The Kernel Matrix $(K)$ is the left-most, the Input-Matrix $(I)$ is in the middle, and the Output Matrix $(O)$ is the right-most.

Algorithm Animation


Optimize dilated convolution using hardware counters.

Contained are three folders:

  • PartA: Contains setup for single-threaded and multi-threaded program.
  • PartB: Contains setup for GPU program.
  • test: Contains a test program to aid in understanding the algorithm.

Each of the folders PartA and PartB contain two sub-folders, a Makefile, and a main program.

  • Makefile: Contains commands necessary to compile, generate inputs, and run the program.
  • data folder: Contains program that generates input, and will contain input once generated.
  • header folder: Files containing the function that performs the operation. Modify the files in this folder.
  • main.cpp: Program that takes inputs and executes the functions. DO NOT MODIFY THIS.

Navigate to each folder to begin setting up the system. Inside each folder do the following:

Compiling and generating input

Use the following command to compile the programs and generate required input:

make

PART A

Running program

You can use make to run the executable with the following command:

make run

Alternatively, you can manually run the program for the different input sets using the following commands:

./dilated_conv -i data/4096.in -k data/13.in
./dilated_conv -i data/8192.in -k data/64.in
./dilated_conv -i data/16384.in -k data/3.in

Note: the -i flag is for the input matrix and -k flag is for the kernel matrix.

PART B

To compile the code for use on native GPU use the following command:

make server

For use with GPGPU-Sim, additional flags are required during compilation, which can be done with the following command:

make sim

You can use make to run the executable with the following command for native execution:

make run_server

When running on GPGPU-Sim, use the following command instead:

make run_sim

test

This directory contains a simple program to help you understand how the algorithm works. You can play around with the sizes of the Input-Matrix $(I)$ and Kernel-Matrix $(K)$ to get a flavor of how each cell of Output-Matrix $(O)$ is generated. A stripped output of test executable mentioned above for the example discussed is as follows:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 56.8%
  • C 19.9%
  • Cuda 13.4%
  • Makefile 9.9%