<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Yac's Log</title>
<link>https://yuang-chen.github.io/</link>
<description>Recent content on Yac's Log</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Tue, 31 Dec 2024 08:29:11 +0800</lastBuildDate><atom:link href="https://yuang-chen.github.io/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Small String Optimization</title>
<link>https://yuang-chen.github.io/posts/2024-12-31-small-string-optimization/</link>
<pubDate>Tue, 31 Dec 2024 08:29:11 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-31-small-string-optimization/</guid>
<description>The basic struct of a string consists of three members:
struct string { char* mPtr; // dynamically allocated memory size_t mSize; // the length of the string size_t mCapacity; // the size of allocated memory }; Allocating memory for small strings (e.g., empty string with a null \0 character) is wasteful. Hence, to avoid this waste, most implementations of string structs apply Small String Optimization (SSO), which stores small strings directly within the string object on the stack, rather than allocating memory dynamically on the heap.</description>
</item>
<item>
<title>2024 12 31 Constexpr Static</title>
<link>https://yuang-chen.github.io/posts/2024-12-31-constexpr-static/</link>
<pubDate>Tue, 31 Dec 2024 08:28:44 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-31-constexpr-static/</guid>
<description></description>
</item>
<item>
<title>Initializer List</title>
<link>https://yuang-chen.github.io/posts/2024-12-28-initializer-list/</link>
<pubDate>Sat, 28 Dec 2024 23:03:50 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-28-initializer-list/</guid>
<description>In a prior post, I talked about list initialization, which differs from the initializer_list discussed here. Personally, though, I don&rsquo;t find initializer_list very useful, as I have never used it in my projects.
list initialization is a general syntax using {} for initializing a variety of variables and objects. initializer_list is a template class representing a lightweight, read-only array of elements, typically used in constructors or functions. initializer_list promotes safety, flexibility and a more modern style compared to raw arrays.</description>
</item>
<item>
<title>`constexpr` from the perspective of assembly code</title>
<link>https://yuang-chen.github.io/posts/2024-12-24-constexpr-assembly/</link>
<pubDate>Tue, 24 Dec 2024 10:58:30 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-24-constexpr-assembly/</guid>
<description>constexpr is a keyword in C++ that allows the compiler to evaluate expressions at compile time. This is a powerful feature that can significantly optimize performance by reducing runtime overhead.
However, I mainly use it for type-related operations. I seldom apply it to data-related tasks, since defining data with constexpr requires constant values, which is rarely feasible in my projects.
Code with constexpr #include &lt;stddef.h&gt; #include &lt;string_view&gt; #include &lt;algorithm&gt; #include &lt;cstdio&gt; template&lt;size_t N&gt; class FixedString { size_t mSize{}; char mData[N]{}; public: FixedString() = default; // Constructor that computes string length at compile time constexpr FixedString(const char* str) : mSize{std::char_traits&lt;char&gt;::length(str)} { std::copy_n(str, size(), mData); } constexpr size_t size() const { return mSize; } constexpr std::string_view string_view() const { return {mData, mSize}; } }; template&lt;size_t N&gt; constexpr auto make_fixed_string(const char (&amp;str)[N]) { return FixedString&lt;N&gt;{str}; } constexpr const static FixedString&lt;50&gt; x{&#34;Hello, embedded World!</description>
</item>
<item>
<title>Stateless Type</title>
<link>https://yuang-chen.github.io/posts/2024-12-20-stateless-type/</link>
<pubDate>Fri, 20 Dec 2024 09:18:21 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-20-stateless-type/</guid>
<description>In C++, the term &ldquo;stateless&rdquo; typically refers to a type (class or struct) that:
Has no non-static data members, meaning it does not store any instance-specific information. Does not maintain any internal state or data that varies between objects of that type. Stateless types are often empty classes used for utility purposes, such as:
Custom deleters for smart pointers. #include &lt;memory&gt; #include &lt;iostream&gt; struct EmptyDeleter { void operator()(int* ptr) const { delete ptr; std::cout &lt;&lt; &#34;Deleted\n&#34;; } }; int main() { std::unique_ptr&lt;int, EmptyDeleter&gt; ptr(new int(42)); std::cout &lt;&lt; &#34;Size of unique_ptr: &#34; &lt;&lt; sizeof(ptr) &lt;&lt; &#34; bytes\n&#34;; // 8 bytes return 0; } Tags for template metaprogramming.</description>
</item>
<item>
<title>Empty Data Members</title>
<link>https://yuang-chen.github.io/posts/2024-12-19-empty-data-members/</link>
<pubDate>Thu, 19 Dec 2024 09:27:28 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-19-empty-data-members/</guid>
<description>[[no_unique_address]] since C++20 [[no_unique_address]] applies to user-defined types (e.g., empty or stateless classes or structs). It does not apply to fundamental types (int, float, etc.), as they always require memory for storage. The attribute optimizes memory layout by allowing empty or stateless user-defined types to overlap memory locations, improving efficiency without violating the C++ object model. Motivation Prior to C++20, Empty Base Optimization (EBO) allowed an empty base class to take zero space when it was inherited by another class.</description>
</item>
<item>
<title>Struct Alignment and Padding</title>
<link>https://yuang-chen.github.io/posts/2024-12-18-alignment-and-padding/</link>
<pubDate>Wed, 18 Dec 2024 09:25:40 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-18-alignment-and-padding/</guid>
<description>In a struct, the padded bytes depend on the alignment requirement of the next member following the current member, because the compiler must ensure proper and efficient access to memory.
Alignment Requirement: Each data type has a required alignment, which is typically a power of two. For example:
char: 1-byte alignment int: 4-byte alignment long (on a 64-bit system): 8-byte alignment double: 8-byte alignment Padding: When laying out struct members, if the next member needs stricter (i.</description>
</item>
<item>
<title>Empty Struct</title>
<link>https://yuang-chen.github.io/posts/2024-12-14-empty-struct/</link>
<pubDate>Sat, 14 Dec 2024 10:08:30 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-14-empty-struct/</guid>
<description>Definition of an Empty Class An empty class is a class that:
Contains no non-static data members. May include: Member functions (including operator() or constructors), but these do not contribute to the class size. Static data members, because these are shared across all instances and are not part of the object layout. Does not use virtual functions or polymorphism, which would require the inclusion of a vtable pointer. Inherits from another empty class, as the derived class can still remain empty due to Empty Base Optimization (EBO).</description>
</item>
<item>
<title>[CppCon] Fast and Small C++</title>
<link>https://yuang-chen.github.io/posts/2024-12-13-cppcon-fast-and-small/</link>
<pubDate>Fri, 13 Dec 2024 10:49:22 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-12-13-cppcon-fast-and-small/</guid>
<description>Recently, I watched this talk by Andreas Fertig at CppCon'24. He discussed some very interesting topics, including new C++ features and union-based optimizations for efficient C++ programming.
To fully understand this talk byte by byte, I tried to re-implement the examples and experiment on my own, figuring out the details with the help of ChatGPT. But I quickly found myself going down a rabbit hole of unfamiliar concepts that I’m not quite ready to grasp yet.</description>
</item>
<item>
<title>Performance Comparisons: Half, Half2 and Float</title>
<link>https://yuang-chen.github.io/posts/2024-09-12-half-half2-float/</link>
<pubDate>Thu, 12 Sep 2024 18:49:29 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-09-12-half-half2-float/</guid>
<description>A performance evaluation is conducted on an Nvidia L40, comparing the 100-iteration access times of device vectors with half, half2, and float types. Each vector was initialized with 1024*1024 elements, but for the half2 type, two elements were packed into a single vector entry. Hence, two randomization granularities are tested for the half2 type: random access per half2 and random access per half.
Access Type Data Type Vector Size Allocated Memory Time (ms) Random half 1M 2MB 4.</description>
</item>
<item>
<title>Constrained Non Type Template Parameter</title>
<link>https://yuang-chen.github.io/posts/2024-06-17-constrained-non-type-template-parameter/</link>
<pubDate>Mon, 17 Jun 2024 09:30:14 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-06-17-constrained-non-type-template-parameter/</guid>
<description> NTTP (C++98): Allows templates to accept non-type parameters like integers or pointers, enhancing flexibility and efficiency. CNTTP (C++20): Extends NTTP by using concepts to constrain non-type parameters, improving type safety and expressiveness. Code Example #include &lt;concepts&gt; #include &lt;cstddef&gt; // Function using NTTP template&lt;size_t i&gt; // size_t is unsigned, so negative values will cause an error auto get_value_nttp() { return i; } // Function using CNTTP template&lt;std::integral auto I&gt; // constrained to integral types auto get_value_cnttp() { return I; } int main() { // NTTP example auto x = get_value_nttp&lt;10&gt;(); // correct, 10 is a valid size_t // auto y = get_value_nttp&lt;-10&gt;(); // error, -10 is not a valid size_t (uncomment to see the error) // CNTTP example auto w = get_value_cnttp&lt;10&gt;(); // correct, 10 is an integral type auto z = get_value_cnttp&lt;-10&gt;(); // correct, -10 is an integral type return 0; } </description>
</item>
<item>
<title>Class Template Argument Deduction</title>
<link>https://yuang-chen.github.io/posts/2024-05-07-class-template-argument-deduction/</link>
<pubDate>Tue, 07 May 2024 09:05:16 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-05-07-class-template-argument-deduction/</guid>
<description>Class Template Argument Deduction (CTAD) is a feature introduced in C++17 that allows the compiler to deduce the template arguments for class templates from the constructor arguments. This makes code more concise and avoids the need for explicit template arguments.
Example without CTAD: #include &lt;vector&gt; #include &lt;iostream&gt; int main() { std::vector&lt;int&gt; vec = {1, 2, 3, 4, 5}; // Explicit template argument for (const auto&amp; elem : vec) { std::cout &lt;&lt; elem &lt;&lt; &#34; &#34;; } return 0; } Example with CTAD: #include &lt;vector&gt; #include &lt;iostream&gt; int main() { std::vector vec1 = {1, 2, 3, 4, 5}; // CTAD deduces std::vector&lt;int&gt; std::vector vec2 = {1.</description>
</item>
<item>
<title>Approximate Densest Subgraph</title>
<link>https://yuang-chen.github.io/posts/2024-01-26-approximate-densest-subgraph/</link>
<pubDate>Fri, 26 Jan 2024 11:36:24 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-01-26-approximate-densest-subgraph/</guid>
<description>Note The Approximate Densest Subgraph problem involves finding a subgraph of a given graph that has the highest density, where density is typically defined as the number of edges divided by the number of vertices in the subgraph. Finding the exact densest subgraph is computationally expensive, so approximate solutions are often sought.
Here&rsquo;s a high-level outline of how an approximate algorithm for this problem might be implemented:
Initialization: Start with all vertices of the graph and no edges.</description>
</item>
<item>
<title>Non-Virtual Polymorphism</title>
<link>https://yuang-chen.github.io/posts/2024-01-24-non-virtual-polymorphism/</link>
<pubDate>Wed, 24 Jan 2024 09:33:57 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-01-24-non-virtual-polymorphism/</guid>
<description>Modern Features in C++17 Non-virtual runtime polymorphism can be achieved with modern C++ (e.g., C++17) features std::any and std::variant as described in the table below.
Notice std::tuple is not used for polymorphism; it offers a structured way to manage multiple values of different types simultaneously, such as in function return types or parameter packs. It is included here because its usage is a bit similar to std::any and std::variant.</description>
</item>
<item>
<title>Tensor Core Register Layout</title>
<link>https://yuang-chen.github.io/posts/2024-01-21-tensor-core-register-layout/</link>
<pubDate>Sun, 21 Jan 2024 16:54:09 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2024-01-21-tensor-core-register-layout/</guid>
<description>Layout: per-thread register layout tables for Tensor Core fragments (the layout matrices are omitted from this excerpt). Code on V100 int half_elements = a_frag.</description>
</item>
<item>
<title>Maximal Independent Set</title>
<link>https://yuang-chen.github.io/posts/2023-12-13-maximal-independent-set/</link>
<pubDate>Wed, 13 Dec 2023 11:00:10 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-12-13-maximal-independent-set/</guid>
<description>Note An independent set in a graph is a set of vertices, no two of which are adjacent. A maximal independent set is an independent set that is not a subset of any other independent set in the graph. Here&rsquo;s a basic approach to find a Maximal Independent Set:
Start with an empty set S. Iterate over all vertices of the graph. For each vertex: If the vertex and its neighbors are not in S, add the vertex to S.</description>
</item>
<item>
<title>Maximal Matching</title>
<link>https://yuang-chen.github.io/posts/2023-12-05-maximal-matching/</link>
<pubDate>Tue, 05 Dec 2023 23:21:43 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-12-05-maximal-matching/</guid>
<description>Note The Matching algorithm is a graph algorithm that finds a matching in a graph, where a matching is a set of edges without common vertices. In other words, a subset of the edges is a matching if each vertex appears in at most one edge of that matching.
A Maximal matching is a matching that cannot have any more edges added to it without violating the matching property.
A maximum matching is a matching that contains the largest possible number of edges.</description>
</item>
<item>
<title>Observable Behaviors</title>
<link>https://yuang-chen.github.io/posts/2023-12-02-observable-behaviors/</link>
<pubDate>Sat, 02 Dec 2023 18:12:37 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-12-02-observable-behaviors/</guid>
<description>What is Observable Behavior &amp; Related Issues The term observable behavior, according to the standard, means the following:
— Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread.
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.</description>
</item>
<item>
<title>Graph Coloring</title>
<link>https://yuang-chen.github.io/posts/2023-11-29-graph-coloring/</link>
<pubDate>Wed, 29 Nov 2023 10:17:23 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-29-graph-coloring/</guid>
<description>Note Graph coloring is a way of assigning colors to the vertices of a graph so that no two adjacent vertices share the same color. This is a classical problem in the field of graph theory and has applications in various domains like scheduling, map coloring, and solving Sudoku puzzles.
The simplest form of graph coloring is vertex coloring, where the aim is to minimize the number of colors used. This problem is NP-hard, meaning there is no known algorithm that can solve all instances of the problem efficiently (in polynomial time).</description>
</item>
<item>
<title>Biconnected Components</title>
<link>https://yuang-chen.github.io/posts/2023-11-20-biconnected-components/</link>
<pubDate>Mon, 20 Nov 2023 10:43:56 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-20-biconnected-components/</guid>
<description>Note Biconnectivity in graphs is an important concept used to identify biconnected components (BCCs). A graph is biconnected if it is connected and does not have any articulation points, meaning removing any single vertex will not disconnect the graph. The biconnected components of a graph are maximal biconnected subgraphs.
Strict Definition: A BCC should contain at least three vertices in a cycle, ensuring that the removal of any single vertex does not disconnect the component.</description>
</item>
<item>
<title>Low Diameter Decomposition</title>
<link>https://yuang-chen.github.io/posts/2023-11-02-low-diameter-decomposition/</link>
<pubDate>Thu, 02 Nov 2023 19:04:28 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-02-low-diameter-decomposition/</guid>
<description>Note The Low-Diameter Decomposition (LDD) algorithm is a graph partitioning algorithm that decomposes a graph into several connected subgraphs (or components) such that each subgraph has a low diameter. The diameter of a subgraph is defined as the maximum shortest path distance between any two nodes within the subgraph.
The LDD algorithm works as follows:
Start with an empty decomposition and an empty queue. Pick an unvisited node u and create a new set containing only u.</description>
</item>
<item>
<title>Trivial Class vs Aggregate Structure</title>
<link>https://yuang-chen.github.io/posts/2023-11-01-trivial-class-vs-aggregate-structure/</link>
<pubDate>Wed, 01 Nov 2023 15:10:00 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-11-01-trivial-class-vs-aggregate-structure/</guid>
<description>Trivial Class vs Aggregate Structure Trivial Class A trivial class is a class that:
Has a trivial default constructor. Has a trivial copy constructor. Has a trivial move constructor (since C++11). Has a trivial copy assignment operator. Has a trivial move assignment operator (since C++11). Has a trivial destructor. Has no virtual functions or virtual base classes. The trivial constructors/operations/destructor means they are not user-provided (i.e., is implicitly-defined or defaulted on its first declaration).</description>
</item>
<item>
<title>Initialization With Brackets</title>
<link>https://yuang-chen.github.io/posts/2023-10-29-initialization-with-brackets/</link>
<pubDate>Sun, 29 Oct 2023 15:03:28 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-29-initialization-with-brackets/</guid>
<description>The table summarizes how brackets {} and () are related to list-initialization in various contexts. The column Allows Narrowing Conversion indicates whether implicit type conversions that lose information are allowed. The column Allows Explicit Constructors indicates whether the syntax can call constructors marked as explicit. The columns Use for Aggregates and Use for User-Defined Types show the applicability of each initialization type for aggregates like arrays (e.g., int x[3][4]) and structs, and user-defined types like classes, respectively.</description>
</item>
<item>
<title>SCAN Clustering</title>
<link>https://yuang-chen.github.io/posts/2023-10-22-scan-clustering/</link>
<pubDate>Sun, 22 Oct 2023 16:10:09 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-22-scan-clustering/</guid>
<description>Note The SCAN (Structural Clustering Algorithm for Networks) algorithm is used for detecting clusters in graphs. It also looks at the structural similarity between nodes:
$$ s(A, B) = \frac{|N(A) \cap N(B)|}{\sqrt{|N(A)| \times |N(B)|}} $$
Compute Structural Similarity: For each edge (A,B), compute its structural similarity score. Identify Strong Relations: Mark edges as &lsquo;strong&rsquo; if their structural similarity is above eps. Identify Core Nodes: For each node, count its strong relationships.</description>
</item>
<item>
<title>Priority Queue</title>
<link>https://yuang-chen.github.io/posts/2023-10-14-priority-queue/</link>
<pubDate>Sat, 14 Oct 2023 12:17:12 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-14-priority-queue/</guid>
<description>The core reason for my re-implementing the standard containers is the Priority Queue (namely, the Max Heap). It combines algorithms and fundamental data structures to create a sophisticated yet highly efficient data structure. My effort at reinventing these containers is paused here for now. Similar containers, like flat_set, are slated for release in C++23. When they become available, I plan to continue this series by attempting to re-implement them.
Description A priority queue is a container adapter offering constant time access to the largest (by default) element, albeit at the cost of logarithmic time insertion and extraction.</description>
</item>
<item>
<title>Strongly Connected Components</title>
<link>https://yuang-chen.github.io/posts/2023-10-12-strongly-connected-components/</link>
<pubDate>Thu, 12 Oct 2023 11:54:29 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-12-strongly-connected-components/</guid>
<description>Description Strongly Connected Components operates on a directed graph; within a component, there is a directed path from each vertex to every other vertex.
Weakly Connected Component (the one we discussed before) ignores the direction of the edges. WCC is commonly considered the &ldquo;default&rdquo; CC algorithm when neither Strongly nor Weakly is specified.
Kosaraju&rsquo;s Algorithm: Run 1st DFS to get finishing times of each vertex (i.e., postordering of DFS). [Backtracking] Run 2nd DFS on the transposed graph, starting with the visited vertices in Reverse Post-Order Each DFS tree in step 2 is an SCC.</description>
</item>
<item>
<title>Queue & Stack</title>
<link>https://yuang-chen.github.io/posts/2023-10-05-queue-stack/</link>
<pubDate>Thu, 05 Oct 2023 10:30:44 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-10-05-queue-stack/</guid>
<description>Description Both std::queue and std::stack are container adaptors that rely on an underlying container to provide specific functionality. For example:
std::queue implements a First-In-First-Out (FIFO) flow, making it efficient to remove the front element. It can use std::deque (by default) or std::list as the underlying container. std::stack follows a Last-In-First-Out (LIFO) flow, where the back element needs efficient modification. By default, it uses std::deque but can also be based on std::list or std::vector.</description>
</item>
<item>
<title>Minimum Spanning Tree</title>
<link>https://yuang-chen.github.io/posts/2023-09-29-minimum-spanning-tree/</link>
<pubDate>Fri, 29 Sep 2023 10:34:39 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-29-minimum-spanning-tree/</guid>
<description>Description A Minimum Spanning Tree (MST) of a weighted, connected, undirected graph is a tree that spans all the vertices in the graph and has the minimum possible total edge weight among all the trees that can be created from the graph. In simpler terms, it&rsquo;s a subgraph that includes all the vertices, is a tree (meaning it has no cycles), and the sum of its edge weights is as small as possible.</description>
</item>
<item>
<title>Unordered {Set|Map|Multiset|Multimap}</title>
<link>https://yuang-chen.github.io/posts/2023-09-27-unordered-set/</link>
<pubDate>Wed, 27 Sep 2023 18:42:56 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-27-unordered-set/</guid>
<description>Description The implementation of unordered containers relies on hashing techniques and utilizes buckets for storing elements. Each bucket is essentially a vector containing a (singly) linked list. The following steps outline how elements are located, whether for finding, inserting, or erasing:
Compute the hash value of the key. Determine the bucket index by taking the remainder of the hash value divided by the bucket size, e.g., index = {hash value} % {bucket size}.</description>
</item>
<item>
<title>Set & Map</title>
<link>https://yuang-chen.github.io/posts/2023-09-26-set-map/</link>
<pubDate>Tue, 26 Sep 2023 00:09:48 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-26-set-map/</guid>
<description>Description Both std::set and std::map are underpinned by red-black trees (RBT). RBTs are self-balancing binary trees, albeit not perfectly balanced. In this structure, it&rsquo;s ensured that the values (for std::set) or keys (for std::map) adhere to the following condition: node→left &lt; node &lt; node→right. Consequently, the RBT is considered ordered, so std::set and std::map are called ordered containers.
RBTs are characterized as follows:
Property
A node is either red or black.</description>
</item>
<item>
<title>Triangle Counting</title>
<link>https://yuang-chen.github.io/posts/2023-09-23-triangle-counting/</link>
<pubDate>Sat, 23 Sep 2023 17:09:51 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-23-triangle-counting/</guid>
<description>Count how many triangles can be formed inside an undirected graph; each triangle is counted three times, once per node. $O(n^3)$ #include &lt;iostream&gt; #include &lt;vector&gt; // Reference: https://github.com/georgegito/vertexwise-triangle-counting/blob/master/src/v3/v3_seq.cpp // allow for parallelism auto bfs_tc(const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx) { int numTriangles = 0; const auto numVertices = rowPtr.size() - 1; // check if two nodes have an edge between them with binary search (require sorted colIdx) auto intersect = [&amp;](int first, int second) -&gt; bool { // std::find is O(N), assuming the iterator is a forward iterator // auto first_begin = colIdx.</description>
</item>
<item>
<title>Betweenness Centrality</title>
<link>https://yuang-chen.github.io/posts/2023-09-18-betweenness-centrality/</link>
<pubDate>Mon, 18 Sep 2023 17:12:24 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-18-betweenness-centrality/</guid>
<description>The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex.
perform BFS (or SSSP for weighted graphs) for each vertex keep a stack of paths for backtracking, i.e., traversing the graph in reverse BFS order #include &lt;iostream&gt; #include &lt;queue&gt; #include &lt;stack&gt; #include &lt;vector&gt; auto brandes(const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx) { const auto numVertices = rowPtr.size() - 1; std::vector&lt;float&gt; betweenness(numVertices, 0.0f); //For each vertex s, perform a BFS to establish levels and predecessors //!</description>
</item>
<item>
<title>Connected Components</title>
<link>https://yuang-chen.github.io/posts/2023-09-12-connected-components/</link>
<pubDate>Tue, 12 Sep 2023 12:32:15 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-12-connected-components/</guid>
<description>Description Three different variants of Connected Component (CC) algorithms are implemented, and the comparisons are provided as follows:
Algorithm Time Complexity Parallelism Techniques DFS $(O(V + E))$ Poor Recursive Traversal Union-Find $(O(V + E \alpha(V)))$ Poor Path Compression, Union by Rank Shiloach-Vishkin $(O(\log^* V))$ Highly Parallel Pointer Jumping Here, $( \log^* )$ is the iterated logarithm, which is extremely slow-growing, making the algorithm very fast. $( \alpha(V) )$ is the inverse Ackermann function, practically a constant for all feasible input sizes.</description>
</item>
<item>
<title>List</title>
<link>https://yuang-chen.github.io/posts/2023-09-11-list/</link>
<pubDate>Mon, 11 Sep 2023 16:33:34 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-11-list/</guid>
<description>Description STL indeed offers std::list and std::forward_list, which are essentially a doubly linked list and a singly linked list, respectively. std::list provides operations like push_back/front and pop_back/front with a time complexity of O(1), and supports bidirectional iterators. On the other hand, std::forward_list only allows front operations in O(1) and insert/erase_after for operations further back, which have a time complexity of O(n); also, it only supports forward iterators.
A valuable feature of lists is that they avoid iterator invalidation, unlike some other data structures.</description>
</item>
<item>
<title>SSSP</title>
<link>https://yuang-chen.github.io/posts/2023-09-09-sssp/</link>
<pubDate>Sat, 09 Sep 2023 13:36:19 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-09-sssp/</guid>
<description>Two variants of Single-Source Shortest Path (SSSP) have been implemented as follows. Bellman-Ford is the one that is widely implemented in parallel graph frameworks. This is because the use of a heap in Dijkstra&rsquo;s algorithm can limit the parallelism of the code.
Criteria Dijkstra&rsquo;s Algorithm Bellman-Ford Algorithm Type Greedy Dynamic Programming Usage Positive weights Negative weights OK Time Complexity O((V + E) * log(V)) O(V * E) Negative Cycles No Yes (Detectable) Data Structures Priority Queue None (Arrays) Initialization Start node: 0, rest ∞ Start node: 0, rest ∞ Relaxation Decrease Key Relaxation BellmanFord BellmanFord: Perform numVertices - 1 iterations of graph traversal to find the shortest path an additional iteration checks if negative cycles exist $O(|V| * |E|)$ time complexity Code #include &lt;iostream&gt; #include &lt;queue&gt; #include &lt;vector&gt; std::vector&lt;int&gt; bellmanFord(const int root, const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx, const std::vector&lt;float&gt;&amp; weight) { const auto numVertices = rowPtr.</description>
</item>
<item>
<title>Deque</title>
<link>https://yuang-chen.github.io/posts/2023-09-04-deque/</link>
<pubDate>Mon, 04 Sep 2023 21:53:31 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-04-deque/</guid>
<description>Description std::deque extends the interfaces of std::vector with push_front, pop_front, etc., such that elements can be inserted or removed at the end or beginning at constant time.
I&rsquo;ve hardly ever incorporated std::deque in my own coding projects, and it&rsquo;s a rarity in other people&rsquo;s work as well.
Code std::deque is essentially a sequence of individually allocated fixed-size arrays. The real challenge lies in the bookkeeping. Four variables are relied on to keep track of data:</description>
</item>
<item>
<title>Vector & Array</title>
<link>https://yuang-chen.github.io/posts/2023-09-02-vector-array/</link>
<pubDate>Sat, 02 Sep 2023 10:59:59 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-02-vector-array/</guid>
<description>An Array is allocated in stack memory; a Vector is allocated in heap memory. Its capacity is “pre-allocated”. #include &lt;iostream&gt; template&lt;typename T&gt; class Vector { private: T* data_; size_t size_; size_t capacity_; public: Vector(): data_(nullptr), size_(0), capacity_(0) {} Vector(size_t n_): size_(n_), capacity_(n_) { data_ = new T[n_]; } ~Vector() { delete [] data_; }; T&amp; operator[] (size_t index) { return data_[index]; } const T&amp; operator[] (size_t index) const { return data_[index]; } size_t size() const { return size_; } void push_back(const T&amp; value) { if(size_ == capacity_) { capacity_ = size_ == 0?</description>
</item>
<item>
<title>BFS & DFS</title>
<link>https://yuang-chen.github.io/posts/2023-09-01-bfs/</link>
<pubDate>Fri, 01 Sep 2023 11:17:51 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-09-01-bfs/</guid>
<description>Iterative BFS Despite its apparent simplicity, this approach relies heavily on the utilization of various STL containers. std::unordered_map records the parent of each node std::unordered_set checks if a node has been visited std::queue allows the nodes to be accessed in breadth-first order; use std::stack instead for depth-first order std::stack reverses the parents, so the path can be printed in root-to-target order. #include &lt;iostream&gt; #include &lt;vector&gt; #include &lt;unordered_map&gt; #include &lt;unordered_set&gt; #include &lt;queue&gt; #include &lt;stack&gt; std::stack&lt;int&gt; BFS(const int root, const int target, const std::vector&lt;int&gt;&amp; rowPtr, const std::vector&lt;int&gt;&amp; colIdx) { std::unordered_map&lt;int, int&gt; parent; std::unordered_set&lt;int&gt; visited; std::queue&lt;int&gt; nodeQue; // std::stack&lt;int&gt; nodeStk for DFS std::stack&lt;int&gt; path; bool hasFound = false; nodeQue.</description>
</item>
<item>
<title>Graph Algorithms</title>
<link>https://yuang-chen.github.io/posts/2023-08-31-graph-algorithms/</link>
<pubDate>Thu, 31 Aug 2023 18:12:09 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-31-graph-algorithms/</guid>
<description>Considering myself a researcher in graph algorithms, I&rsquo;ve come to the surprising realization that my grasp of these algorithms is not as solid as I thought. Hence, this blog series aims to document my exploration of various graph algorithms I&rsquo;ve encountered thus far, regardless of their complexity.
The algorithms are selected from the parallel graph frameworks GAP and GBBS, focusing on their single-threaded versions to assess their complexity.
Breadth-First Search (BFS) Single-Source Shortest Paths (SSSP) Connected Components (CC) Betweenness Centrality (BC) Triangle Counting (TC) Minimum Spanning Tree (MST) Strongly Connected Components (SCC) SCAN Clustering (SCAN) Low Diameter Decomposition (LDD) Biconnected-Components (BC) Graph Coloring (COLOR) Maximal Matching (MM) Maximal Independent Set (MIS) </description>
</item>
<item>
<title>STL Containers</title>
<link>https://yuang-chen.github.io/posts/2023-08-30-stl-containers/</link>
<pubDate>Wed, 30 Aug 2023 14:13:22 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-30-stl-containers/</guid>
<description>In my HPC-oriented programming, my go-to choices are typically limited to arrays and vectors because of their memory efficiency. Linked lists and hash maps, being non-contiguous in memory space, rarely find their way into my toolkit. These containers draw upon many classic algorithmic designs. Lately, as I&rsquo;ve been revisiting fundamental graph algorithms, I&rsquo;ve also decided to take on the task of re-implementing these containers as simplified illustrations.
They are:</description>
</item>
<item>
<title>Scope Guard</title>
<link>https://yuang-chen.github.io/posts/2023-08-29-scope-guard/</link>
<pubDate>Tue, 29 Aug 2023 10:27:54 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-29-scope-guard/</guid>
<description>Background Scope Guard is a concept reminiscent of the RAII (Resource Acquisition Is Initialization) principle in C++. The idea is to manage resources (like memory, files, network sockets, etc.) using object lifetime. When the object goes out of scope, its destructor ensures that the resource is cleaned up properly. The scope guard is intended to run a given callable (like a function or lambda) when it is destroyed.
RAII (Resource Acquisition Is Initialization) is a programming idiom used in C++ where the lifetime of an object is bound to the lifetime of its scope (typically represented by a block of code wrapped in curly braces {}).</description>
</item>
<item>
<title>Static Local Member</title>
<link>https://yuang-chen.github.io/posts/2023-08-27-static-local-member/</link>
<pubDate>Sun, 27 Aug 2023 11:45:15 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-27-static-local-member/</guid>
<description>C++ templates are blueprints and don&rsquo;t represent specific types until they are instantiated with actual types. Once instantiated, the compiler creates a specific version of that template for the provided type. For template classes, each instantiation has its own unique version of the static members, making them distinct for each type the template is instantiated with.
///////////////////// // Code Block 1 ///////////////////// #include&lt;iostream&gt; class ComponentBase{ protected: // component_type_count is a static variable shared by derived classes static inline size_t component_type_count = 0; }; template&lt;typename T&gt; class Component : public ComponentBase{ public: static size_t component_type_id(){ // ID is the static local variable for a particular type T static size_t ID = component_type_count++; return ID; } }; class A : public Component&lt;A&gt; {}; class B : public Component&lt;B&gt; {}; class C : public Component&lt;C&gt; {}; int main() { std::cout &lt;&lt; A::component_type_id() &lt;&lt; std::endl; // 0 std::cout &lt;&lt; B::component_type_id() &lt;&lt; std::endl; // 1 std::cout &lt;&lt; B::component_type_id() &lt;&lt; std::endl; // 1 std::cout &lt;&lt; A::component_type_id() &lt;&lt; std::endl; // 0 std::cout &lt;&lt; A::component_type_id() &lt;&lt; std::endl; // 0 std::cout &lt;&lt; C::component_type_id() &lt;&lt; std::endl; // 2 } Key Points:</description>
</item>
<item>
<title>Formatter Specialization</title>
<link>https://yuang-chen.github.io/posts/2023-08-25-formatter-specialization/</link>
<pubDate>Fri, 25 Aug 2023 19:56:16 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-25-formatter-specialization/</guid>
<description>We can customize the (printing) format of a given class by using the specialization of formatter.
#include &lt;format&gt; #include &lt;iostream&gt; struct Frac { int a, b; }; template &lt;&gt; struct std::formatter&lt;Frac&gt; : std::formatter&lt;string_view&gt; { // parse() is inherited from the base class std::formatter&lt;string_view&gt; // * an efficient solution: auto format(const Frac&amp; frac, std::format_context&amp; ctx) const { return std::format_to(ctx.out(), &#34;{}/{}&#34;, frac.a, frac.b); } // the same functionality as above, but inefficient due to the temporary string // auto format(const Frac&amp; frac, std::format_context&amp; ctx) const { // std::string temp; // std::format_to(std::back_inserter(temp), &#34;{}/{}&#34;, // frac.</description>
</item>
<item>
<title>User Defined Literals</title>
<link>https://yuang-chen.github.io/posts/2023-08-22-user-defined-literals/</link>
<pubDate>Tue, 22 Aug 2023 23:18:37 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-22-user-defined-literals/</guid>
<description>A User-Defined Literal (UDL) produces an object in an interesting way:
constexpr auto operator&#34;&#34;_f(const char* fmt, size_t) { return[=]&lt;typename... T&gt;(T&amp;&amp;... Args) { return std::vformat(fmt, std::make_format_args(std::forward&lt;T&gt;(Args)...)); }; } auto s = &#34;example {} see {}&#34;_f(&#34;yep&#34;, 1.1); // s = &#34;example yep 1.1&#34; The UDL _f has the same effect as std::format(&quot;example {} see {}&quot;, &quot;yep&quot;, 1.1). Pretty familiar (like libfmt), right?
Now, let&rsquo;s break the definition of _f down:
int x = 10; double y = 3.</description>
</item>
<item>
<title>Operator Overload</title>
<link>https://yuang-chen.github.io/posts/2023-08-17-operator-overload/</link>
<pubDate>Thu, 17 Aug 2023 10:36:19 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-08-17-operator-overload/</guid>
<description>Reference: here.
The return type of the overloaded operator should be a reference; otherwise, return-by-value will create a (temporary) rvalue that cannot be passed to the next operation f2 by non-const reference, i.e., an rvalue cannot be bound to a non-const reference.
#include &lt;vector&gt; #include &lt;iostream&gt; #include &lt;functional&gt; template&lt;typename T, typename FN&gt; requires std::invocable&lt;FN, T&amp;&gt; // diff std::invocable? std::vector&lt;T&gt;&amp; operator| (std::vector&lt;T&gt;&amp; vec, FN fn) noexcept { for(auto&amp; e: vec) { fn(e); } return vec; } int main(){ std::vector v{1, 2, 3}; auto f1 = [](int&amp; i) {i *= i; }; std::function f2 {[](const int&amp; i) {std::cout &lt;&lt; i &lt;&lt; &#39; &#39;; } }; v | f1 | f2; } </description>
</item>
<item>
<title>Multidimensional Subscript Operator []</title>
<link>https://yuang-chen.github.io/posts/2023-05-13-multidim-subscript-operator/</link>
<pubDate>Sat, 13 May 2023 22:11:07 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-13-multidim-subscript-operator/</guid>
<description>Finally, C++23 allows the subscript operator [] to be overloaded with multiple dimensions.
Before that, we normally either use:
vector of vector to form a matrix, and access it as mat[i][j] a class containing a big 1-d vector, but behaves as 2-d by overloading the operator (), e.g., mat(i,j) Now, with C++23, we advance the second option (which offers efficient memory access) with better indexing approaching as follow:
template &lt;typename T, size_t R, size_t C&gt; struct matrix { T&amp; operator[](size_t const r, size_t const c) noexcept { return data_[r * C + c]; } T const&amp; operator[](size_t const r, size_t const c) const noexcept { return data_[r * C + c]; } static constexpr size_t Rows = R; static constexpr size_t Columns = C; private: std::array&lt;T, R * C&gt; data_; }; int main() { matrix&lt;int, 3, 2&gt; m; for(size_t i = 0; i &lt; m.</description>
</item>
<item>
<title>Bitwise Op</title>
<link>https://yuang-chen.github.io/posts/2023-05-07-bitwise-op/</link>
<pubDate>Sun, 07 May 2023 23:33:24 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-07-bitwise-op/</guid>
<description>🦥 An old note.
Bitwise vs Arithmetic: running on a vector of size 2^31, bitwise operations are significantly faster than their arithmetic counterparts:
seg = 64; volume = (vec_size - 1)/ seg + 1; unsigned bs = log2(seg); unsigned bv= log2(volume); unsigned bbv = volume - 1; Arithmetic: out[i] = i % volume * seg + i / volume
Bitwise: out[i] = ((i &amp; bbv) &lt;&lt; bs) + (i &gt;&gt; bv)</description>
</item>
<item>
<title>Omp Parallel Region</title>
<link>https://yuang-chen.github.io/posts/2023-05-02-omp-parallel-region/</link>
<pubDate>Tue, 02 May 2023 10:34:19 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-02-omp-parallel-region/</guid>
<description>The results look suspicious to me&hellip; But I wrote down this note many days ago 🦥. Maybe I need to evaluate it again.
Multiple Parallel Regions Constructing a parallel region is expensive in OpenMP. Let&rsquo;s use two examples for illustration:
Three loops operating on a vector of size 2^31, e.g.,
for(size_t i = 0; i &lt; vec.size(); i++) vec[i] += 1, vec[i] *= 0.9, vec[i] /= 7, Case 1: a large parallel region including the three loops by omp parallel { omp for }</description>
</item>
<item>
<title>Omp Collapse</title>
<link>https://yuang-chen.github.io/posts/2023-05-02-omp-collapse/</link>
<pubDate>Tue, 02 May 2023 10:28:18 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-02-omp-collapse/</guid>
<description>One of my old-day notes 🦥.
Collapse of Nested Loops The collapse clause converts a perfectly nested loop into a single loop and then parallelizes it. The condition for a perfect nested loop is that the inner loop is tightly enclosed by the outer loop, with no other code in between:
for(int i = 0 ... ) { for(int j = 0 ...) { task[i][j]; } } Such a condition is hard to meet.</description>
</item>
<item>
<title>Vector vs Array</title>
<link>https://yuang-chen.github.io/posts/2023-05-01-vector-vs-array/</link>
<pubDate>Mon, 01 May 2023 12:53:14 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-05-01-vector-vs-array/</guid>
<description>Another post recycled from my earlier notes. I really don&rsquo;t have motivation to improve it further 🦥.
Vector vs Array Initialization The Vector is the preferred choice for data storage in modern C++. It is internally implemented based on the Array. However, the performance gap between the two is indeed obvious.
The Vector can be initialized via std::vector&lt;T&gt; vec(size). Meanwhile, an Array is initialized by T* arr = new T[size]</description>
</item>
<item>
<title> Gather with SIMD</title>
<link>https://yuang-chen.github.io/posts/2023-04-27-gather-simd/</link>
<pubDate>Thu, 27 Apr 2023 13:27:50 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-04-27-gather-simd/</guid>
<description>Writing SIMD code that works across different platforms can be a challenging task. The following log illustrates how a seemingly simple operation in C++ can quickly escalate into a significant problem.
Let&rsquo;s look into the code below, where the elements of x are accessed through indices specified by idx.
normal code std::vector&lt;float&gt; x = /*some data*/ std::vector&lt;int&gt; idx = /* index */ for(auto i: idx) { auto data = x[i]; } Gather with Intel In AVX512, Gather is a specific intrinsic function to transfer data from a data array to a target vec, according to an index vec.</description>
</item>
<item>
<title>SIMD is Pain</title>
<link>https://yuang-chen.github.io/posts/2023-04-25-simd-pain-intro/</link>
<pubDate>Tue, 25 Apr 2023 20:59:39 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-04-25-simd-pain-intro/</guid>
<description>Writing code with SIMD for vectorization is painful. It deserves a blog series to record all sorts of pains I have encountered and (partially) overcome.
Indeed, once the pain of coding and debugging is over, the program is lightning-fast. Nonetheless, I am here to complain rather than praise. Let me state why writing SIMD code is causing me emotional damage:
a single line of normal C++ code can easily be inflated into a dozen lines of code.</description>
</item>
<item>
<title>Parallel Algorithms from Libraries</title>
<link>https://yuang-chen.github.io/posts/2023-04-25-par-algo/</link>
<pubDate>Tue, 25 Apr 2023 10:16:34 +0800</pubDate>
<guid>https://yuang-chen.github.io/posts/2023-04-25-par-algo/</guid>
<description>The content of this post is extracted from my previous random notes. I am too lazy to update and organize it 🦥.
C++17 new feature &ndash; parallel algorithms Parallel algorithms and execution policies were introduced in C++17. Unfortunately, according to CppReference, only GCC and Intel support these features. Clang still leaves them unimplemented.
A blog about it.
The parallel library brought by C++17 requires Intel&rsquo;s oneTBB for multithreading.</description>
</item>
<item>
<title>About Me</title>
<link>https://yuang-chen.github.io/about/aboutme/</link>
<pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
<guid>https://yuang-chen.github.io/about/aboutme/</guid>
<description>I am now a Postdoc at CUHK, and I completed my PhD in 2.5 years at CUHK&rsquo;s Shenzhen campus. My CV is da.
My research focuses on optimizing sparse workloads for modern computing hardware. This involves addressing the challenge of efficiently processing sparse data structures (containing mostly empty or zero values) on hardware designed for dense, regular computations.
Parallel graph algorithms, e.g., PageRank, BFS, Triangle Counting, etc. Sparse matrix multiplication (SpMV, SpMM, SDDMM, SpGEMM) on CPUs and GPUs Graph Neural Networks &amp; Sparse Large Language Models.</description>
</item>
</channel>
</rss>