<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Yac's Log</title>
<link>https://yuang-chen.github.io/</link>
<description>Recent content on Yac's Log</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Thu, 12 Sep 2024 18:49:29 +0800</lastBuildDate><atom:link href="https://yuang-chen.github.io/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Performance Comparisons: Half, Half2 and Float</title>
<link>https://yuang-chen.github.io/posts/2024-09-12-half-half2-float/</link>
<pubDate>Thu, 12 Sep 2024 18:49:29 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-09-12-half-half2-float/</guid>
<description>A performance evaluation was conducted on an Nvidia L40, comparing the 100-iteration access times of device vectors with half, half2, and float types. Each vector was initialized with 1024*1024 elements, but for the half2 type, two elements were packed into a single vector entry. Hence, two random-access patterns are tested for the half2 type: random access per half2 and random access per half.
Access Type Data Type Vector Size Allocated Memory Time (ms) Random half 1M 2MB 4.</description>
</item>
<item>
<title>Constrained Non Type Template Parameter</title>
<link>https://yuang-chen.github.io/posts/2024-06-17-constrained-non-type-template-parameter/</link>
<pubDate>Mon, 17 Jun 2024 09:30:14 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-06-17-constrained-non-type-template-parameter/</guid>
<description> NTTP (C++98): Allows templates to accept non-type parameters like integers or pointers, enhancing flexibility and efficiency. CNTTP (C++20): Extends NTTP by using concepts to constrain non-type parameters, improving type safety and expressiveness. Code Example #include &lt;concepts&gt; #include &lt;cstddef&gt; // Function using NTTP template&lt;size_t i&gt; // size_t is unsigned, so negative values will cause an error auto get_value_nttp() { return i; } // Function using CNTTP template&lt;std::integral auto I&gt; // constrained to integral types auto get_value_cnttp() { return I; } int main() { // NTTP example auto x = get_value_nttp&lt;10&gt;(); // correct, 10 is a valid size_t // auto y = get_value_nttp&lt;-10&gt;(); // error, -10 is not a valid size_t (uncomment to see the error) // CNTTP example auto w = get_value_cnttp&lt;10&gt;(); // correct, 10 is an integral type auto z = get_value_cnttp&lt;-10&gt;(); // correct, -10 is an integral type return 0; } </description>
</item>
<item>
<title>Class Template Argument Deduction</title>
<link>https://yuang-chen.github.io/posts/2024-05-07-class-template-argument-deduction/</link>
<pubDate>Tue, 07 May 2024 09:05:16 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-05-07-class-template-argument-deduction/</guid>
<description>Class Template Argument Deduction (CTAD) is a feature introduced in C++17 that allows the compiler to deduce the template arguments for class templates from the constructor arguments. This makes code more concise and avoids the need for explicit template arguments.
Example without CTAD: #include &lt;vector&gt; #include &lt;iostream&gt; int main() { std::vector&lt;int&gt; vec = {1, 2, 3, 4, 5}; // Explicit template argument for (const auto&amp; elem : vec) { std::cout &lt;&lt; elem &lt;&lt; &#34; &#34;; } return 0; } Example with CTAD: #include &lt;vector&gt; #include &lt;iostream&gt; int main() { std::vector vec1 = {1, 2, 3, 4, 5}; // CTAD deduces std::vector&lt;int&gt; std::vector vec2 = {1.</description>
</item>
<item>
<title>Approximate Densest Subgraph</title>
<link>https://yuang-chen.github.io/posts/2024-01-26-approximate-densest-subgraph/</link>
<pubDate>Fri, 26 Jan 2024 11:36:24 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-01-26-approximate-densest-subgraph/</guid>
<description>Note The Approximate Densest Subgraph problem involves finding a subgraph of a given graph that has the highest density, where density is typically defined as the number of edges divided by the number of vertices in the subgraph. Finding the exact densest subgraph is computationally expensive, so approximate solutions are often sought.
Here’s a high-level outline of how an approximate algorithm for this problem might be implemented:
Initialization: Start with all vertices of the graph and no edges.</description>
</item>
<item>
<title>Non-Virtual Polymorphism</title>
<link>https://yuang-chen.github.io/posts/2024-01-24-non-virtual-polymorphism/</link>
<pubDate>Wed, 24 Jan 2024 09:33:57 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-01-24-non-virtual-polymorphism/</guid>
<description>Modern Features in C++17 Non-virtual runtime polymorphism can be achieved with modern C++ (e.g., C++17) features std::any and std::variant as described in the table below.
Note that std::tuple is not used for polymorphism; it offers a structured way to manage multiple values of different types simultaneously, such as in function return types or parameter packs. It is included here because its usage is somewhat similar to that of std::any and std::variant.</description>
</item>
<item>
<title>Tensor Core Register Layout</title>
<link>https://yuang-chen.github.io/posts/2024-01-21-tensor-core-register-layout/</link>
<pubDate>Sun, 21 Jan 2024 16:54:09 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-01-21-tensor-core-register-layout/</guid>
<description>Layout 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 8 9 10 11 12 13 14 15 0 0 0 0 0 0 0 0 16 17 18 19 20 21 22 23 0 0 0 0 0 0 0 0 24 25 26 27 28 29 30 31 0 0 0 0 0 0 0 0 32 33 34 35 36 37 38 39 0 0 0 0 0 0 0 0 40 41 42 43 44 45 46 47 0 0 0 0 0 0 0 0 48 49 50 51 52 53 54 55 0 0 0 0 0 0 0 0 56 57 58 59 60 61 62 63 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 16 24 32 40 48 56 0 0 0 0 0 0 0 0 1 9 17 25 33 41 49 57 0 0 0 0 0 0 0 0 2 10 18 26 34 42 50 58 0 0 0 0 0 0 0 0 3 11 19 27 35 43 51 59 0 0 0 0 0 0 0 0 4 12 20 28 36 44 52 60 0 0 0 0 0 0 0 0 5 13 21 29 37 45 53 61 0 0 0 0 0 0 0 0 6 14 22 30 38 46 54 62 0 0 0 0 0 0 0 0 7 15 23 31 39 47 55 63 Code on V100 int half_elements = a_frag.</description>
</item>
<item>
<title>Maximal Independent Set</title>
<link>https://yuang-chen.github.io/posts/2023-12-13-maximal-independent-set/</link>
<pubDate>Wed, 13 Dec 2023 11:00:10 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-12-13-maximal-independent-set/</guid>
<description>Note An independent set in a graph is a set of vertices, no two of which are adjacent. A maximal independent set is an independent set that is not a proper subset of any other independent set in the graph. Here’s a basic approach to find a Maximal Independent Set:
Start with an empty set S. Iterate over all vertices of the graph. For each vertex: If the vertex and its neighbors are not in S, add the vertex to S.</description>
</item>
<item>
<title>Maximal Matching</title>
<link>https://yuang-chen.github.io/posts/2023-12-05-maximal-matching/</link>
<pubDate>Tue, 05 Dec 2023 23:21:43 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-12-05-maximal-matching/</guid>
<description>Note The Matching algorithm is a graph algorithm that finds a matching in a graph, where a matching is a set of edges without common vertices. In other words, a subset of the edges is a matching if each vertex appears in at most one edge of that matching.
A maximal matching is a matching that cannot have any more edges added to it without violating the matching property.
A maximum matching is a matching that contains the largest possible number of edges.</description>
</item>
<item>
<title>Observable Behaviors</title>
<link>https://yuang-chen.github.io/posts/2023-12-02-observable-behaviors/</link>
<pubDate>Sat, 02 Dec 2023 18:12:37 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-12-02-observable-behaviors/</guid>
<description>What is Observable Behavior &amp; Related Issues The term observable behavior, according to the standard, means the following:
— Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread.
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.</description>
</item>
<item>
<title>Graph Coloring</title>
<link>https://yuang-chen.github.io/posts/2023-11-29-graph-coloring/</link>
<pubDate>Wed, 29 Nov 2023 10:17:23 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-29-graph-coloring/</guid>
<description>Note Graph coloring is a way of assigning colors to the vertices of a graph so that no two adjacent vertices share the same color. This is a classical problem in the field of graph theory and has applications in various domains like scheduling, map coloring, and solving Sudoku puzzles.
The simplest form of graph coloring is vertex coloring, where the aim is to minimize the number of colors used. This problem is NP-hard, meaning there is no known algorithm that can solve all instances of the problem efficiently (in polynomial time).</description>
</item>
<item>
<title>Biconnected Components</title>
<link>https://yuang-chen.github.io/posts/2023-11-20-biconnected-components/</link>
<pubDate>Mon, 20 Nov 2023 10:43:56 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-20-biconnected-components/</guid>
<description>Note Biconnectivity in graphs is an important concept used to identify biconnected components (BCCs). A graph is biconnected if it is connected and does not have any articulation points, meaning removing any single vertex will not disconnect the graph. The biconnected components of a graph are maximal biconnected subgraphs.
Strict Definition: A BCC should contain at least three vertices in a cycle, ensuring that the removal of any single vertex does not disconnect the component.</description>
</item>
<item>
<title>Low Diameter Decomposition</title>
<link>https://yuang-chen.github.io/posts/2023-11-02-low-diameter-decomposition/</link>
<pubDate>Thu, 02 Nov 2023 19:04:28 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-02-low-diameter-decomposition/</guid>
<description>Note The Low-Diameter Decomposition (LDD) algorithm is a graph partitioning algorithm that decomposes a graph into several connected subgraphs (or components) such that each subgraph has a low diameter. The diameter of a subgraph is defined as the maximum shortest path distance between any two nodes within the subgraph.
The LDD algorithm works as follows:
Start with an empty decomposition and an empty queue. Pick an unvisited node u and create a new set containing only u.</description>
</item>
<item>
<title>Trivial Class vs Aggregate Structure</title>
<link>https://yuang-chen.github.io/posts/2023-11-01-trivial-class-vs-aggregate-structure/</link>
<pubDate>Wed, 01 Nov 2023 15:10:00 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-01-trivial-class-vs-aggregate-structure/</guid>
<description>Trivial Class vs Aggregate Structure Trivial Class A trivial class is a class that:
Has a trivial default constructor. Has a trivial copy constructor. Has a trivial move constructor (since C++11). Has a trivial copy assignment operator. Has a trivial move assignment operator (since C++11). Has a trivial destructor. Has no virtual functions or virtual base classes. Trivial constructors, assignment operators, and destructors are ones that are not user-provided (i.e., they are implicitly defined or defaulted on their first declaration).</description>
</item>
<item>
<title>Initialization With Brackets</title>
<link>https://yuang-chen.github.io/posts/2023-10-29-initialization-with-brackets/</link>
<pubDate>Sun, 29 Oct 2023 15:03:28 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-29-initialization-with-brackets/</guid>
<description>The table summarizes how brackets {} and () are related to list-initialization in various contexts. The column Allows Narrowing Conversion indicates whether implicit type conversions that lose information are allowed. The column Allows Explicit Constructors indicates whether the syntax can call constructors marked as explicit. The columns Use for Aggregates and Use for User-Defined Types show the applicability of each initialization type for aggregates like arrays (e.g., int x[3][4]) and structs, and user-defined types like classes, respectively.</description>
</item>
<item>
<title>SCAN Clustering</title>
<link>https://yuang-chen.github.io/posts/2023-10-22-scan-clustering/</link>
<pubDate>Sun, 22 Oct 2023 16:10:09 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-22-scan-clustering/</guid>
<description>Note The SCAN (Structural Clustering Algorithm for Networks) algorithm is used for detecting clusters in graphs. It also looks at the structural similarity between nodes:
$$ s(A, B) = \frac{|N(A) \cap N(B)|}{\sqrt{|N(A)| \times |N(B)|}} $$
Compute Structural Similarity: For each edge (A, B), compute its structural similarity score. Identify Strong Relations: Mark edges as ‘strong’ if their structural similarity is above eps. Identify Core Nodes: For each node, count its strong relationships.</description>
</item>
<item>
<title>Priority Queue</title>
<link>https://yuang-chen.github.io/posts/2023-10-14-priority-queue/</link>
<pubDate>Sat, 14 Oct 2023 12:17:12 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-14-priority-queue/</guid>
<description>The core reason for my re-implementing the standard containers is the Priority Queue (namely, the Max Heap). It combines algorithms and fundamental data structures to create a sophisticated yet highly efficient data structure. My work on reinventing these containers is paused here for now. Similar containers, like flat_set, are slated for release in C++23. When they become available, I plan to continue this series by attempting to re-implement them.
Description A priority queue is a container adapter offering constant time access to the largest (by default) element, albeit at the cost of logarithmic time insertion and extraction.</description>
</item>
<item>
<title>Strongly Connected Components</title>
<link>https://yuang-chen.github.io/posts/2023-10-12-strongly-connected-components/</link>
<pubDate>Thu, 12 Oct 2023 11:54:29 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-12-strongly-connected-components/</guid>
<description>Description A Strongly Connected Component (SCC) is a maximal subgraph of a directed graph in which there is a directed path from each vertex to every other vertex.
A Weakly Connected Component (WCC, the one we discussed before) ignores the direction of the edges. WCC is commonly considered the “default” CC algorithm when Strongly or Weakly is not specified.
Kosaraju’s Algorithm: Run 1st DFS to get finishing times of each vertex (i.e., postordering of DFS). [Backtracking] Run 2nd DFS on the transposed graph, starting with the visited vertices in Reverse Post-Order. Each DFS tree in step 2 is an SCC.</description>
</item>
<item>
<title>Queue &amp; Stack</title>
<link>https://yuang-chen.github.io/posts/2023-10-05-queue-stack/</link>
<pubDate>Thu, 05 Oct 2023 10:30:44 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-05-queue-stack/</guid>
<description>Description Both std::queue and std::stack are container adaptors that rely on an underlying container to provide specific functionality. For example:
std::queue implements a First-In-First-Out (FIFO) flow, making it efficient to remove the front element. It can use std::deque (by default) or std::list as the underlying container. std::stack follows a Last-In-First-Out (LIFO) flow, where the back element needs efficient modification. By default, it uses std::deque but can also be based on std::list or std::vector.</description>
</item>
<item>
<title>Minimum Spanning Tree</title>
<link>https://yuang-chen.github.io/posts/2023-09-29-minimum-spanning-tree/</link>
<pubDate>Fri, 29 Sep 2023 10:34:39 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-29-minimum-spanning-tree/</guid>
<description>Description A Minimum Spanning Tree (MST) of a weighted, connected, undirected graph is a tree that spans all the vertices in the graph and has the minimum possible total edge weight among all the trees that can be created from the graph. In simpler terms, it’s a subgraph that includes all the vertices, is a tree (meaning it has no cycles), and the sum of its edge weights is as small as possible.</description>
</item>
<item>
<title>Unordered {Set|Map|Multiset|Multimap}</title>
<link>https://yuang-chen.github.io/posts/2023-09-27-unordered-set/</link>
<pubDate>Wed, 27 Sep 2023 18:42:56 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-27-unordered-set/</guid>
<description>Description The implementation of unordered containers relies on hashing techniques and uses buckets for storing elements. The bucket array is essentially a vector in which each entry holds a (singly) linked list. The following steps outline how elements are located, whether for finding, inserting, or erasing:
Compute the hash value of the key. Determine the bucket index by taking the remainder of the hash value divided by the bucket size, e.g., index = {hash value} % {bucket size}.</description>
</item>
<item>
<title>Set &amp; Map</title>
<link>https://yuang-chen.github.io/posts/2023-09-26-set-map/</link>
<pubDate>Tue, 26 Sep 2023 00:09:48 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-26-set-map/</guid>
<description>Description Both std::set and std::map are underpinned by red-black trees (RBTs). RBTs are self-balancing binary trees, albeit not perfectly balanced. This structure ensures that the values (for std::set) or keys (for std::map) adhere to the following condition: node→left &lt; node &lt; node→right. Consequently, RBTs are considered ordered, so std::set and std::map are called ordered containers.
RBTs are characterized as follows:
Property
A node is either red or black.</description>
</item>
<item>
<title>Triangle Counting</title>
<link>https://yuang-chen.github.io/posts/2023-09-23-triangle-counting/</link>
<pubDate>Sat, 23 Sep 2023 17:09:51 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-23-triangle-counting/</guid>
<description>Count how many triangles can be formed inside an undirected graph. Each triangle is counted three times, once per node. $O(n^3)$ #include &lt;iostream&gt; #include &lt;vector&gt; // Reference: https://github.com/georgegito/vertexwise-triangle-counting/blob/master/src/v3/v3_seq.cpp // allow for parallelism auto bfs_tc(const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx) { int numTriangles = 0; const auto numVertices = rowPtr.size() - 1; // check if two nodes have an edge between them with binary search (require sorted colIdx) auto intersect = [&amp;](int first, int second) -&gt; bool { // std::find is O(N), assuming the iterator is a forward iterator // auto first_begin = colIdx.</description>
</item>
<item>
<title>Betweenness Centrality</title>
<link>https://yuang-chen.github.io/posts/2023-09-18-betweenness-centrality/</link>
<pubDate>Mon, 18 Sep 2023 17:12:24 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-18-betweenness-centrality/</guid>
<description>The betweenness centrality of a vertex is the number of shortest paths between other vertices that pass through it.
Perform BFS (or SSSP for weighted graphs) from each vertex. Keep a stack of paths for backtracking, i.e., traversing the graph in reverse BFS order. #include &lt;iostream&gt; #include &lt;queue&gt; #include &lt;stack&gt; #include &lt;vector&gt; auto brandes(const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx) { const auto numVertices = rowPtr.size() - 1; std::vector&lt;float&gt; betweenness(numVertices, 0.0f); //For each vertex s, perform a BFS to establish levels and predecessors //!</description>
</item>
<item>
<title>Connected Components</title>
<link>https://yuang-chen.github.io/posts/2023-09-12-connected-components/</link>
<pubDate>Tue, 12 Sep 2023 12:32:15 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-12-connected-components/</guid>
<description>Description Three different variants of Connected Component (CC) algorithms are implemented, and the comparisons are provided as follows:
Algorithm Time Complexity Parallelism Techniques DFS $(O(V + E))$ Poor Recursive Traversal Union-Find $(O(V + E \alpha(V)))$ Poor Path Compression, Union by Rank Shiloach-Vishkin $(O(\log^* V))$ Highly Parallel Pointer Jumping Here, $( \log^* )$ is the iterated logarithm, which is extremely slow-growing, making the algorithm very fast. $( \alpha(V) )$ is the inverse Ackermann function, practically a constant for all feasible input sizes.</description>
</item>
<item>
<title>List</title>
<link>https://yuang-chen.github.io/posts/2023-09-11-list/</link>
<pubDate>Mon, 11 Sep 2023 16:33:34 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-11-list/</guid>
<description>Description The STL offers std::list and std::forward_list, which are essentially a doubly linked list and a singly linked list, respectively. std::list provides operations like push_back/front and pop_back/front with a time complexity of O(1), and supports bidirectional iterators. On the other hand, std::forward_list only supports front operations in O(1); operations at the back rely on insert_after/erase_after and take O(n), because the list must be traversed first. It also supports only forward iterators.
A valuable feature of lists is that inserting or removing elements does not invalidate iterators to the other elements, unlike in some other data structures.</description>
</item>
<item>
<title>SSSP</title>
<link>https://yuang-chen.github.io/posts/2023-09-09-sssp/</link>
<pubDate>Sat, 09 Sep 2023 13:36:19 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-09-sssp/</guid>
<description>Two variants of Single-Source Shortest Path (SSSP) have been implemented as follows. Bellman-Ford is the one that is widely implemented in parallel graph frameworks. This is because the use of a heap in Dijkstra’s algorithm can limit the parallelism of the code.
Criteria Dijkstra’s Algorithm Bellman-Ford Algorithm Type Greedy Dynamic Programming Usage Positive weights Negative weights OK Time Complexity O((V + E) * log(V)) O(V * E) Negative Cycles No Yes (Detectable) Data Structures Priority Queue None (Arrays) Initialization Start node: 0, rest ∞ Start node: 0, rest ∞ Relaxation Decrease Key Relaxation BellmanFord BellmanFord: Perform numVertices - 1 iterations of graph traversal to find the shortest path an additional iteration checks if negative cycles exist $O(|V| * |E|)$ time complexity Code #include &lt;iostream&gt; #include &lt;queue&gt; #include &lt;vector&gt; std::vector&lt;int&gt; bellmanFord(const int root, const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx, const std::vector&lt;float&gt;&amp; weight) { const auto numVertices = rowPtr.</description>
</item>
<item>
<title>Deque</title>
<link>https://yuang-chen.github.io/posts/2023-09-04-deque/</link>
<pubDate>Mon, 04 Sep 2023 21:53:31 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-04-deque/</guid>
<description>Description std::deque extends the interface of std::vector with push_front, pop_front, etc., such that elements can be inserted or removed at the end or the beginning in constant time.
I’ve hardly ever used std::deque in my own coding projects, and it’s a rarity in other people’s work as well.
Code std::deque is essentially a sequence of individually allocated fixed-size arrays. The real challenge lies in the bookkeeping. Four variables are relied on to keep track of data:</description>
</item>
<item>
<title>Vector &amp; Array</title>
<link>https://yuang-chen.github.io/posts/2023-09-02-vector-array/</link>
<pubDate>Sat, 02 Sep 2023 10:59:59 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-02-vector-array/</guid>
<description>Array is allocated in stack memory. Vector is allocated in heap memory. Its capacity is “pre-allocated”. #include &lt;iostream&gt; template&lt;typename T&gt; class Vector { private: T* data_; size_t size_; size_t capacity_; public: Vector(): data_(nullptr), size_(0), capacity_(0) {} Vector(size_t n_): size_(n_), capacity_(n_) { data_ = new T[n_]; } ~Vector() { delete [] data_; }; T&amp; operator[] (size_t index) { return data_[index]; } const T&amp; operator[] (size_t index) const { return data_[index]; } size_t size() const { return size_; } void push_back(const T&amp; value) { if(size_ == capacity_) { capacity_ = size_ == 0?</description>
</item>
<item>
<title>BFS &amp; DFS</title>
<link>https://yuang-chen.github.io/posts/2023-09-01-bfs/</link>
<pubDate>Fri, 01 Sep 2023 11:17:51 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-01-bfs/</guid>
<description>Iterative BFS Despite its apparent simplicity, this approach relies heavily on various STL containers. std::unordered_map records the parent of each node. std::unordered_set checks if a node has been visited. std::queue allows the nodes to be accessed in breadth-first order; use std::stack instead for depth-first order. std::stack reverses the parents, so the path can be printed in root-to-target order. #include &lt;iostream&gt; #include &lt;vector&gt; #include &lt;unordered_map&gt; #include &lt;unordered_set&gt; #include &lt;queue&gt; #include &lt;stack&gt; std::stack&lt;int&gt; BFS(const int root, const int target, const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx) { std::unordered_map&lt;int, int&gt; parent; std::unordered_set&lt;int&gt; visited; std::queue&lt;int&gt; nodeQue; // std::stack&lt;int&gt; nodeStk for DFS std::stack&lt;int&gt; path; bool hasFound = false; nodeQue.</description>
</item>
<item>
<title>Graph Algorithms</title>
<link>https://yuang-chen.github.io/posts/2023-08-31-graph-algorithms/</link>
<pubDate>Thu, 31 Aug 2023 18:12:09 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-31-graph-algorithms/</guid>
<description>Considering myself a researcher in graph algorithms, I’ve come to the surprising realization that my grasp of these algorithms is not as solid as I thought. Hence, this blog series aims to document my exploration of various graph algorithms I’ve encountered thus far, regardless of their complexity.
The algorithms are selected from the parallel graph frameworks GAP and GBBS, focusing on their single-threaded versions to assess their complexity.
Breadth-First Search (BFS) Single-Source Shortest Paths (SSSP) Connected Components (CC) Betweenness Centrality (BC) Triangle Counting (TC) Minimum Spanning Tree (MST) Strongly Connected Components (SCC) SCAN Clustering (SCAN) Low Diameter Decomposition (LDD) Biconnected Components (BCC) Graph Coloring (COLOR) Maximal Matching (MM) Maximal Independent Set (MIS) </description>
</item>
<item>
<title>STL Containers</title>
<link>https://yuang-chen.github.io/posts/2023-08-30-stl-containers/</link>
<pubDate>Wed, 30 Aug 2023 14:13:22 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-30-stl-containers/</guid>
<description>In my HPC-oriented programming, my go-to choices are typically limited to arrays and vectors because of their memory efficiency. Linked lists and hash maps, being non-contiguous in memory space, rarely find their way into my toolkit. These containers draw upon many classic algorithmic designs. Lately, as I’ve been revisiting fundamental graph algorithms, I’ve also decided to take on the task of re-implementing these containers as simplified illustrations.
They are:</description>
</item>
<item>
<title>Scope Guard</title>
<link>https://yuang-chen.github.io/posts/2023-08-29-scope-guard/</link>
<pubDate>Tue, 29 Aug 2023 10:27:54 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-29-scope-guard/</guid>
<description>Background Scope Guard is a concept reminiscent of the RAII (Resource Acquisition Is Initialization) principle in C++. The idea is to manage resources (like memory, files, network sockets, etc.) using object lifetime. When the object goes out of scope, its destructor ensures that the resource is cleaned up properly. The scope guard is intended to run a given callable (like a function or lambda) when it is destroyed.
RAII (Resource Acquisition Is Initialization) is a programming idiom used in C++ where the lifetime of an object is bound to the lifetime of its scope (typically represented by a block of code wrapped in curly braces {}).</description>
</item>
<item>
<title>Static Local Member</title>
<link>https://yuang-chen.github.io/posts/2023-08-27-static-local-member/</link>
<pubDate>Sun, 27 Aug 2023 11:45:15 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-27-static-local-member/</guid>
<description>C++ templates are blueprints and don’t represent specific types until they are instantiated with actual types. Once instantiated, the compiler creates a specific version of that template for the provided type. For template classes, each instantiation has its own unique version of the static members, making them distinct for each type the template is instantiated with.
///////////////////// // Code Block 1 ///////////////////// #include&lt;iostream&gt; class ComponentBase{ protected: // component_type_count is a static variable shared by derived classes static inline size_t component_type_count = 0; }; template&lt;typename T&gt; class Component : public ComponentBase{ public: static size_t component_type_id(){ // ID is the static local variable for a particular type T static size_t ID = component_type_count++; return ID; } }; class A : public Component&lt;A&gt; {}; class B : public Component&lt;B&gt; {}; class C : public Component&lt;C&gt; {}; int main() { std::cout &lt;&lt; A::component_type_id() &lt;&lt; std::endl; // 0 std::cout &lt;&lt; B::component_type_id() &lt;&lt; std::endl; // 1 std::cout &lt;&lt; B::component_type_id() &lt;&lt; std::endl; // 1 std::cout &lt;&lt; A::component_type_id() &lt;&lt; std::endl; // 0 std::cout &lt;&lt; A::component_type_id() &lt;&lt; std::endl; // 0 std::cout &lt;&lt; C::component_type_id() &lt;&lt; std::endl; // 2 } Key Points:</description>
</item>
<item>
<title>Formatter Specialization</title>
<link>https://yuang-chen.github.io/posts/2023-08-25-formatter-specialization/</link>
<pubDate>Fri, 25 Aug 2023 19:56:16 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-25-formatter-specialization/</guid>
<description>We can customize the (printing) format of a given class by specializing std::formatter for it.
#include &lt;format&gt; #include &lt;iostream&gt; struct Frac { int a, b; }; template &lt;&gt; struct std::formatter&lt;Frac&gt; : std::formatter&lt;string_view&gt; { // parse() is inherited from the base class std::formatter&lt;string_view&gt; // * an efficient solution: auto format(const Frac&amp; frac, std::format_context&amp; ctx) const { return std::format_to(ctx.out(), &#34;{}/{}&#34;, frac.a, frac.b); } // the same functionality as above, but inefficient due to the temporary string // auto format(const Frac&amp; frac, std::format_context&amp; ctx) const { // std::string temp; // std::format_to(std::back_inserter(temp), &#34;{}/{}&#34;, // frac.</description>
</item>
<item>
<title>User Defined Literals</title>
<link>https://yuang-chen.github.io/posts/2023-08-22-user-defined-literals/</link>
<pubDate>Tue, 22 Aug 2023 23:18:37 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-22-user-defined-literals/</guid>
<description>A User Defined Literal (UDL) can produce an object in an interesting way:
constexpr auto operator&#34;&#34;_f(const char* fmt, size_t) { return [=]&lt;typename... T&gt;(T&amp;&amp;... Args) { return std::vformat(fmt, std::make_format_args(Args...)); }; } auto s = &#34;example {} see {}&#34;_f(&#34;yep&#34;, 1.1); // s = &#34;example yep see 1.1&#34; The UDL _f has the same effect as std::format(&quot;example {} see {}&quot;, &quot;yep&quot;, 1.1). Pretty familiar (as in libfmt), right?
Now, let&rsquo;s break the definition of _f down:
int x = 10; double y = 3.</description>
</item>
<item>
<title>Operator Overload</title>
<link>https://yuang-chen.github.io/posts/2023-08-17-operator-overload/</link>
<pubDate>Thu, 17 Aug 2023 10:36:19 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-17-operator-overload/</guid>
<description>Reference: here.
The overloaded operator should return a reference; otherwise, return-by-value creates a temporary (an rvalue) that cannot be passed to the next operation f2 by non-const reference, i.e., an rvalue cannot bind to a non-const lvalue reference.
#include &lt;concepts&gt; #include &lt;vector&gt; #include &lt;iostream&gt; #include &lt;functional&gt; template&lt;typename T, typename FN&gt; requires std::invocable&lt;FN, T&amp;&gt; // diff std::invocable? std::vector&lt;T&gt;&amp; operator| (std::vector&lt;T&gt;&amp; vec, FN fn) noexcept { for(auto&amp; e: vec) { fn(e); } return vec; } int main(){ std::vector v{1, 2, 3}; auto f1 = [](int&amp; i) {i *= i; }; std::function f2 {[](const int&amp; i) {std::cout &lt;&lt; i &lt;&lt; &#39; &#39;; } }; v | f1 | f2; }</description>
</item>
<item>
<title>Multidimensional Subscript Operator []</title>
<link>https://yuang-chen.github.io/posts/2023-05-13-multidim-subscript-operator/</link>
<pubDate>Sat, 13 May 2023 22:11:07 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-13-multidim-subscript-operator/</guid>
<description>Finally, C++23 allows the subscript operator [] to be overloaded with multiple arguments, i.e., to be multi-dimensional.
Before that, we normally used either:
a vector of vectors to form a matrix, accessed as mat[i][j]; or
a class containing one big 1-d vector that behaves as 2-d by overloading operator(), e.g., mat(i,j).
Now, with C++23, we can improve the second option (which offers efficient memory access) with a better indexing syntax, as follows:
template &lt;typename T, size_t R, size_t C&gt; struct matrix { T&amp; operator[](size_t const r, size_t const c) noexcept { return data_[r * C + c]; } T const&amp; operator[](size_t const r, size_t const c) const noexcept { return data_[r * C + c]; } static constexpr size_t Rows = R; static constexpr size_t Columns = C; private: std::array&lt;T, R * C&gt; data_; }; int main() { matrix&lt;int, 3, 2&gt; m; for(size_t i = 0; i &lt; m.</description>
</item>
<item>
<title>Bitwise Op</title>
<link>https://yuang-chen.github.io/posts/2023-05-07-bitwise-op/</link>
<pubDate>Sun, 07 May 2023 23:33:24 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-07-bitwise-op/</guid>
<description>🦥 An old note.
Bitwise vs Arithmetic Running on a vector of size 2^31, bitwise operations are significantly faster than their arithmetic counterparts:
seg = 64; volume = (vec_size - 1) / seg + 1; unsigned bs = log2(seg); unsigned bv = log2(volume); unsigned bbv = volume - 1; Arithmetic: out[i] = i % volume * seg + i / volume
Bitwise: out[i] = ((i &amp; bbv) &lt;&lt; bs) + (i &gt;&gt; bv). The two forms are equivalent here because seg and volume are both powers of two.</description>
</item>
<item>
<title>Omp Parallel Region</title>
<link>https://yuang-chen.github.io/posts/2023-05-02-omp-parallel-region/</link>
<pubDate>Tue, 02 May 2023 10:34:19 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-02-omp-parallel-region/</guid>
<description>The results look suspicious to me&hellip; But I wrote down this note many days ago 🦥. Maybe I need to evaluate it again.
Multiple Parallel Regions Constructing a parallel region is expensive in OpenMP. Let&rsquo;s use two examples for illustration:
Three loops operating on a vector of size 2^31, e.g.,
for(size_t i = 0; i &lt; vec.size(); i++) vec[i] += 1, vec[i] *= 0.9, vec[i] /= 7, Case 1: a large parallel region including the three loops by omp parallel { omp for }</description>
</item>
<item>
<title>Omp Collapse</title>
<link>https://yuang-chen.github.io/posts/2023-05-02-omp-collapse/</link>
<pubDate>Tue, 02 May 2023 10:28:18 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-02-omp-collapse/</guid>
<description>One of my old-day notes 🦥.
Collapse of Nested Loops The collapse clause converts a perfectly nested loop into a single loop and then parallelizes it. A loop nest is perfectly nested when the inner loop is tightly enclosed by the outer loop, with no other code lying between them:
for(int i = 0 ... ) { for(int j = 0 ...) { task[i][j]; } } Such a condition is hard to meet.</description>
</item>
<item>
<title>Vector vs Array</title>
<link>https://yuang-chen.github.io/posts/2023-05-01-vector-vs-array/</link>
<pubDate>Mon, 01 May 2023 12:53:14 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-01-vector-vs-array/</guid>
<description>Another post recycled from my earlier notes. I really don&rsquo;t have motivation to improve it further 🦥.
Vector vs Array Initialization The Vector is the preferred choice for data storage in modern C++. It is internally implemented on top of the Array. However, the performance gap between the two is indeed obvious.
The Vector can be initialized via std::vector&lt;T&gt; vec(size). Meanwhile, an Array is initialized by T* arr = new T[size]</description>
</item>
<item>
<title>Gather with SIMD</title>
<link>https://yuang-chen.github.io/posts/2023-04-27-gather-simd/</link>
<pubDate>Thu, 27 Apr 2023 13:27:50 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-04-27-gather-simd/</guid>
<description>Writing SIMD code that works across different platforms can be a challenging task. The following log illustrates how a seemingly simple operation in C++ can quickly escalate into a significant problem.
Let&rsquo;s look into the code below, where the elements of x are accessed through the indices specified by idx.
Normal code std::vector&lt;float&gt; x = /*some data*/; std::vector&lt;int&gt; idx = /* index */; for(auto i: idx) { auto data = x[i]; } Gather with Intel In AVX512, Gather is a specific intrinsic function that transfers data from a data array to a target vec, according to an index vec.</description>
</item>
<item>
<title>SIMD is Pain</title>
<link>https://yuang-chen.github.io/posts/2023-04-25-simd-pain-intro/</link>
<pubDate>Tue, 25 Apr 2023 20:59:39 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-04-25-simd-pain-intro/</guid>
<description>Writing code with SIMD for vectorization is painful. It deserves a blog series to record all sorts of pains I have encountered and (partially) overcome.
Indeed, once the pain of coding and debugging is over, the program is lightning-fast. Nonetheless, I am here to complain rather than to praise. Let me state why writing SIMD code is causing me emotional damage:
a single line of normal C++ code can easily be inflated to a dozen lines of code.</description>
</item>
<item>
<title>Parallel Algorithms from Libraries</title>
<link>https://yuang-chen.github.io/posts/2023-04-25-par-algo/</link>
<pubDate>Tue, 25 Apr 2023 10:16:34 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-04-25-par-algo/</guid>
<description>The content of this post is extracted from my previous random notes. I am too lazy to update and organize it 🦥.
C++17 new feature &ndash; parallel algorithms The parallel algorithms and execution policies were introduced in C++17. Unfortunately, according to CppReference, only GCC and Intel support these features. Clang still leaves them unimplemented.
A blog about it.
The parallel library brought by C++17 requires the usage of Intel&rsquo;s oneTBB for multithreading.</description>
</item>
<item>
<title>About Me</title>
<link>https://yuang-chen.github.io/about/aboutme/</link>
<pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
<guid>https://yuang-chen.github.io/about/aboutme/</guid>
<description>I am now a Postdoc at CUHK, having received my PhD from the Shenzhen campus of CUHK. My CV is da.
My research focuses on optimizing sparse workloads for modern computing hardware. This involves addressing the challenge of efficiently processing sparse data structures (containing mostly empty or zero values) on hardware designed for dense, regular computations.
Parallel graph algorithms, e.g., PageRank, BFS, Triangle Counting, etc.
Sparse matrix multiplication (SpMV, SpMM, SDDMM, SpGEMM) on CPUs and GPUs.
Graph Neural Networks &amp; Sparse Large Language Models.</description>
</item>
</channel>
</rss>