Commit a8478a7

Authored by Max Andriychuk (ovdiiuv) and a co-author
Add Max's introductory blogpost (#319)
* Add Max's introductory blogpost
* Fix spelling

Co-authored-by: Max Andriychuk <valerii@mac.speedport.ip>
1 parent 0024ae4 commit a8478a7

File tree: 2 files changed, +40 −0 lines


.github/actions/spelling/allow/terms.txt

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ ICHEP
 IIT
 JIT'd
 Jacobians
+JMU
 Jurgaityt
 LHC
 LLMs
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
---
title: "Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels"
layout: post
excerpt: "A GSoC 2025 contributor project aiming to implement Activity Analysis for (CUDA) GPU kernels"
sitemap: false
author: Maksym Andriichuk
permalink: blogs/2025_maksym_andriichuk_introduction_blog/
banner_image: /images/blog/gsoc-banner.png
date: 2025-07-14
tags: gsoc c++ clang root auto-differentiation
---

### Introduction

Hi! I’m Maksym Andriichuk, a third-year Mathematics student at JMU Wuerzburg. I am excited to be part of the Clad team for this year's Google Summer of Code.

### Project description

My project focuses on removing atomic operations when differentiating CUDA kernels. Because of how reverse-mode differentiation works in Clad, accesses to GPU global memory inside the gradient of a kernel can produce data races, so atomic operations are used to keep the accumulation correct. However, in some cases we can guarantee that no data race occurs, which lets us drop the atomic operations and drastically speed up the execution of the gradient.
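
To make the race concrete, here is a minimal hand-written sketch (illustrative only, not code generated by Clad; the kernel and variable names are hypothetical) of a forward kernel and its reverse pass. The adjoint of each per-thread element `x[i]` lives at a distinct index, while the adjoint of the shared parameter `w[0]` is accumulated by every thread and therefore needs `atomicAdd` unless an analysis can prove otherwise:

```cpp
#include <cuda_runtime.h>

// Forward kernel: every thread reads the shared scalar w[0].
__global__ void scale(const double *x, const double *w, double *y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    y[i] = w[0] * x[i];
}

// Hand-written reverse pass, mimicking what a reverse-mode tool must emit.
// d_x[i] is touched by exactly one thread, so a plain update is safe.
// d_w[0] is accumulated by all threads, so the update must be atomic
// unless an analysis proves the race cannot happen.
// (atomicAdd on double requires compute capability 6.0 or newer.)
__global__ void scale_grad(const double *x, const double *w, const double *d_y,
                           double *d_x, double *d_w, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    d_x[i] += w[0] * d_y[i];           // disjoint indices: no atomic needed
    atomicAdd(&d_w[0], x[i] * d_y[i]); // shared adjoint: atomic required
  }
}
```

The analysis this project aims for would recognize updates like the first one as race-free and keep atomics only where they are genuinely needed.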

### Project goals

The main goals of this project are:

- Implement a mechanism to check whether data races occur in various scenarios.
- Compare Clad with other tools on benchmarks, including RSBench and LULESH.

### Implementation strategy

- Solve minor CUDA-related issues to get familiar with the codebase.
- Implement a series of visitors to distinguish between the different scenarios in which atomic operations can be dropped (see the sketch after this list).
- Use the existing benchmarks to measure the speedup from the implemented analysis.
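
As a rough illustration of the visitor-based approach, the sketch below uses a generic Clang `RecursiveASTVisitor` (this is not Clad's actual implementation; the class and helper names are hypothetical) to flag array accesses whose index is derived from `threadIdx`/`blockIdx`, a first hint that each thread touches its own slot and the corresponding adjoint update may not need an atomic operation:

```cpp
#include "clang/AST/Expr.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include <vector>

using namespace clang;

// Hypothetical visitor: collects array subscripts whose index mentions a CUDA
// thread-index builtin (threadIdx/blockIdx). Such accesses are candidates for
// being thread-private, i.e. for dropping the atomic update in the gradient.
class ThreadLocalAccessVisitor
    : public RecursiveASTVisitor<ThreadLocalAccessVisitor> {
public:
  std::vector<const ArraySubscriptExpr *> Candidates;

  bool VisitArraySubscriptExpr(ArraySubscriptExpr *ASE) {
    if (indexUsesThreadId(ASE->getIdx()))
      Candidates.push_back(ASE);
    return true; // continue traversal
  }

private:
  // True if any sub-expression of the index refers to threadIdx or blockIdx.
  bool indexUsesThreadId(Expr *Idx) {
    struct Finder : RecursiveASTVisitor<Finder> {
      bool Found = false;
      bool VisitDeclRefExpr(DeclRefExpr *DRE) {
        StringRef Name = DRE->getDecl()->getName();
        if (Name == "threadIdx" || Name == "blockIdx")
          Found = true;
        return true;
      }
    } F;
    F.TraverseStmt(Idx->IgnoreParenImpCasts());
    return F.Found;
  }
};
```

A real analysis has to do considerably more than this (track aliasing, loops inside the kernel, and indices that mix thread-dependent and thread-independent parts), but a chain of small visitors of this shape is the kind of building block the project plans to add.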

### Conclusion

By integrating such an analysis for (CUDA) GPU kernels we aim to speed up the execution of the gradient by removing atomic operations where possible. To declare success, we will compare Clad to other AD tools on different benchmarks. I am excited to be part of the Clad team this summer and cannot wait to share my progress.

### Related Links

- [My GitHub profile](https://github.com/ovdiiuv)
