Descend

Descend is a safe systems programming language that adapts and extends Rust's type system for massively parallel computing on AI accelerators. Unlike unsafe languages such as CUDA and OpenCL, which rely on raw pointers and manual synchronization, Descend statically prevents data races, deadlocks, and memory-safety violations through extended borrow checking, ownership tracking, and lifetime analysis. The language was originally presented in the paper "Descend: A Safe GPU Systems Programming Language", which targets NVIDIA GPUs via CUDA; this implementation extends Descend to target Huawei Ascend NPUs, using MLIR and AscendNPU-IR as the primary compilation target.

Key Safety Features:

  • Extended Borrow Checking: Prevents data races by tracking unique (uniq) and shared (shrd) references across thousands of parallel threads (see the Rust analogy after this list)
  • Memory Views: Safe parallel access patterns that replace raw pointer indexing, statically verified to be race-free
  • Execution Resource Tracking: Types enforce that memory is only accessed in correct execution contexts (cpu.thread, gpu.grid, gpu.block, gpu.thread)
  • Explicit Memory Spaces: References track physical memory locations (cpu.mem, gpu.global, gpu.shared) preventing invalid cross-device accesses
  • Safe Synchronization: The type system enforces correct placement and usage of synchronization primitives
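
The uniq/shrd distinction generalizes Rust's &mut and & borrows to thousands of concurrent threads. As a point of reference only, here is plain sequential Rust (not Descend syntax) showing the aliasing rule that Descend's extended borrow checker lifts to the GPU execution hierarchy:

// Plain Rust, shown only as an analogy for Descend's uniq/shrd references.
fn main() {
    let mut data = vec![0i32; 16];

    let shared_a = &data;           // shrd-like: any number of shared readers may coexist
    let shared_b = &data;
    println!("{} {}", shared_a[0], shared_b[0]);

    let unique = &mut data;         // uniq-like: exactly one writer, no readers
    unique[0] = 42;
    // println!("{}", shared_a[0]); // rejected: would alias the unique borrow
}

Descend applies the same exclusivity reasoning per thread, block, and grid, so that two parallel threads can never hold conflicting references to the same memory.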

Design Philosophy:

  • Imperative Systems Programming: Low-level control with a safety net, not high-level functional abstractions
  • Hierarchical Scheduling: Explicitly schedule computations over the GPU's execution hierarchy (grid → blocks → threads)
  • Zero-Cost Safety: Compile-time guarantees without runtime overhead
  • Heterogeneous Computing: Holistic programming model spanning CPU and NPU with physically separated memories reflected in the type system

Primary Target: Huawei Ascend NPU

This implementation primarily targets Huawei Ascend AI Processors (NPUs) through AscendNPU-IR, an open-source MLIR-based intermediate representation developed by Huawei for compiling and optimizing machine learning models on Ascend hardware. The MLIR backend is the default and most complete compilation target.

What is AscendNPU-IR?

AscendNPU-IR is Huawei's compiler infrastructure for Ascend AI processors, bridging the gap between high-level machine learning frameworks and low-level NPU instructions. It provides:

  • Multi-Level IR Architecture: Hierarchical representation enabling progressive lowering from high-level ML operators to hardware-specific instructions
  • MLIR-Based Foundation: Built on MLIR (Multi-Level Intermediate Representation), a modular and extensible compiler framework
  • Open-Source Compiler Infrastructure: Publicly available toolchain for the Ascend AI ecosystem
  • Framework Integration: Designed to work seamlessly with TensorFlow, PyTorch, and other ML frameworks

Ascend-Specific Dialects:

AscendNPU-IR defines custom MLIR dialects tailored to Ascend NPU capabilities:

  • HIVM (High-Level IR for Vector Machines):

    • Vectorized operations optimized for Ascend's architecture
    • DMA operations for efficient memory transfers
    • Synchronization primitives for parallel execution
  • HACC (High-Level Accelerator Compiler):

    • Hardware-specific optimizations for Ascend NPUs
    • Core computational operations and instruction generation
    • Low-level code generation and scheduling
  • HFusion:

    • Operator fusion to reduce memory traffic
    • Performance optimization through combined operations
  • Supporting Dialects: MathExt, MemRefExt, annotation, and symbol management

Why MLIR for Ascend NPUs?

  • Standardized IR: Common platform for integrating diverse ML frameworks with Ascend hardware
  • Progressive Lowering: Step-by-step transformation from high-level abstractions to hardware instructions, optimizing at each level
  • Mature Optimization Infrastructure: Leverages MLIR's robust optimization passes for memory layout, fusion, and performance tuning
  • Cross-Platform Potential: MLIR foundation enables potential adaptation to other AI accelerators beyond Ascend

Descend + AscendNPU-IR Integration:

The MLIR backend maps Descend's execution contexts (gpu.grid/gpu.block/gpu.thread) and memory hierarchies (gpu.global/gpu.local) to corresponding Ascend NPU constructs through AscendNPU-IR's HIVM dialect. This integration:

  • Preserves Descend's compile-time safety guarantees (race freedom, memory safety, synchronization correctness)
  • Generates efficient code optimized for Ascend NPU hardware
  • Leverages Ascend-specific optimizations through HACC and HFusion dialects
  • Enables deployment of safe, high-performance parallel programs on Huawei's AI infrastructure

Example: Simple Addition

Descend:

fn main() -[t: cpu.thread]-> i32 {
    let a = 10;
    let b = 32;
    a + b
}

Generated MLIR code (default backend):

module {
  func.func @main() -> i32 {
    %c10_i32 = arith.constant 10 : i32
    %c32_i32 = arith.constant 32 : i32
    %0 = arith.addi %c10_i32, %c32_i32 : i32
    return %0 : i32
  }
}

Example: GPU Memory Operations

Descend:

fn add<n: nat, r: prv>(
    a: &r shrd gpu.global [i16; 16],
    b: &r shrd gpu.global [i16; 16],
    c: &r uniq gpu.global [i16; 16]
) -[grid: gpu.grid<X<1>, X<16>>]-> () {
    // Vector addition with GPU memory spaces
    ()
}

Generated MLIR with HIVM dialect:

module {
  func.func @add(%arg0: memref<16xi16, #hivm.address_space<gm>>, 
                 %arg1: memref<16xi16, #hivm.address_space<gm>>, 
                 %arg2: memref<16xi16, #hivm.address_space<gm>>) 
                 attributes {hacc.entry, hacc.function_kind = #hacc.function_kind<DEVICE>} {
    return
  }
}

Setup

Required:

  • clang-format: Must be found in the system path.
  • rustc and cargo
  • git
  • For the MLIR backend: an MLIR installation with TableGen (included in the AscendNPU-IR submodule)

Clone the repository and compile:

git clone git@github.com:descend-lang/descend.git --recurse-submodules
cargo build
cargo test

Building on Different Machines

The MLIR backend uses custom dialect definitions that require absolute paths for TableGen include directories. If you encounter build errors about missing include files when building on a different machine or in a different directory:

  1. Automatic Fix: Run the provided script to update paths:

    ./scripts/update-dialect-paths.sh
  2. Manual Fix: Edit src/codegen/mlir/dialects.rs and update all include_directories paths to:

    <YOUR_PROJECT_ROOT>/AscendNPU-IR/bishengir/include
    

Why is this needed? The melior::dialect! macro that generates Rust bindings from the TableGen files accepts only string literals for its include directories; Rust's env!() and concat!() macros cannot be used for dynamic path resolution there, because macro invocations in that position are not expanded before dialect! processes its input. The CI pipeline automatically runs the update script to ensure the paths are correct in different environments.

Usage

The Descend compiler can be run using cargo. To see all available options:

cargo run -- -h

Basic Compilation

Compile a Descend source file using the default MLIR backend:

cargo run -- path/to/your_file.desc

This will generate your_file.out in the current directory.

Backend Selection

Descend supports multiple backends:

  • MLIR: Generates MLIR IR targeting Ascend NPUs via AscendNPU-IR (default, recommended)
  • CUDA: Generates CUDA C++ code for NVIDIA GPUs (experimental, limited features)

Compile to a specific backend:

cargo run -- path/to/your_file.desc mlir    # Default MLIR backend
cargo run -- path/to/your_file.desc cuda    # Experimental CUDA backend

Print AST

Print the Abstract Syntax Tree using -p or --print-ast:

cargo run -- path/to/your_file.desc -p

This will generate both your_file.out and your_file.ast files.

Examples

Compile a Descend example that uses type inference (with the default MLIR backend):

cargo run -- descend-examples/infer/scale_vec.desc

Compiler Backends

Descend features a frontend-agnostic architecture that supports multiple compilation targets:

MLIR Backend (Primary)

The MLIR backend targets Huawei Ascend NPUs through AscendNPU-IR:

  • Target: Huawei Ascend AI processors (NPUs)
  • Output: MLIR intermediate representation with HIVM/HACC dialects
  • Status: ✅ Production Ready - Complete implementation with comprehensive testing
  • Location: src/codegen/mlir/
  • Features: Full type system, HIVM address spaces, HACC device functions, comprehensive test suite

CUDA Backend (Experimental)

The CUDA backend generates C++ code for NVIDIA GPUs:

  • Target: NVIDIA CUDA-capable GPUs
  • Output: CUDA C++ code with runtime library
  • Status: ⚠️ Experimental - Basic functionality, many features incomplete
  • Location: src/codegen/cuda/
  • Limitations: Limited feature support, many TODO items, not recommended for production use

MLIR Backend Architecture

AscendNPU-IR Integration

  • AscendNPU-IR is Huawei's MLIR-based intermediate representation for Ascend AI processors
  • Provides multi-level abstractions for compute, data movement, and synchronization
  • Located at AscendNPU-IR/ (submodule)

Key Components

  1. Type Conversion (to_mlir.rs)

    • Descend types → MLIR types (see the sketch after this list)
    • Scalars → builtin types (i32, f64, i1, etc.)
    • Arrays → memref types
    • Tuples → MLIR tuple types
  2. MLIR Builder (builder.rs)

    • Constructs MLIR IR from Descend AST
    • Function declarations and definitions
    • Expression and statement translation
    • MLIR context and module management
  3. AscendNPU-IR Dialects (dialects.rs)

    • HIVM: Core compute dialect
      • DMA operations for memory transfers
      • Synchronization primitives
      • Vector operations
      • Macro operations
    • annotation: Metadata and optimization hints
    • symbol: Symbol management
    • Additional dialects available: HACC, HFusion, MathExt, MemRefExt
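
To make the type-conversion step in to_mlir.rs concrete, here is a minimal, self-contained sketch of the mapping listed under component 1. It is illustrative only: the enum and function names are hypothetical, and the real backend builds MLIR type objects through the melior bindings rather than strings.

// Illustrative sketch; names are hypothetical and do not match to_mlir.rs.
#[derive(Debug)]
enum DescendTy {
    I32,
    F64,
    Bool,
    Array(Box<DescendTy>, usize), // [T; n]
    Tuple(Vec<DescendTy>),
}

fn to_mlir_type(ty: &DescendTy) -> String {
    match ty {
        // Scalars map to MLIR builtin types.
        DescendTy::I32 => "i32".to_string(),
        DescendTy::F64 => "f64".to_string(),
        DescendTy::Bool => "i1".to_string(),
        // Arrays become memrefs; the real translation also attaches an
        // address space such as #hivm.address_space<gm>.
        DescendTy::Array(elem, n) => format!("memref<{}x{}>", n, to_mlir_type(elem)),
        // Tuples map to MLIR tuple types.
        DescendTy::Tuple(elems) => {
            let inner: Vec<String> = elems.iter().map(to_mlir_type).collect();
            format!("tuple<{}>", inner.join(", "))
        }
    }
}

fn main() {
    let ty = DescendTy::Array(Box::new(DescendTy::I32), 16);
    assert_eq!(to_mlir_type(&ty), "memref<16xi32>");
}

The memref<16xi16, #hivm.address_space<gm>> arguments in the GPU memory example above show what the corresponding MLIR types look like once the HIVM address space is attached.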

Implementation Status

✅ Phase 1: Basic MLIR Generation (Completed)
  • Type system conversion
  • Function signature generation
  • Control flow (if/else, loops)
  • Memory operations (alloc, load, store)
  • HIVM address space mapping
  • HACC device function attributes

✅ Phase 2: Ascend-Specific Lowering (Completed)
  • Map execution contexts (gpu.grid/gpu.block/gpu.thread) to HIVM parallel constructs
  • Map memory hierarchies (gpu.global → HIVM global, gpu.local → HIVM shared)
  • HIVM dialect integration with proper address spaces
  • HACC entry point and device function generation
  • Comprehensive test suite (14 passing tests)

🔄 Phase 3: Optimization and Integration (In Progress)
  • Basic hardware-specific optimizations via HACC dialect
  • Advanced operator fusion via HFusion
  • Pipeline optimization and memory layout tuning
  • Hardware testing and benchmarking

Advantages of MLIR Approach

  • Leverages mature MLIR infrastructure for optimizations
  • Enables gradual lowering through multiple passes
  • Provides standardized interfaces for AI framework integration
  • Supports cross-platform retargeting to other MLIR-supported accelerators
  • Better integration with compiler toolchains

Testing

  • Unit tests: src/codegen/mlir/to_mlir.rs - Comprehensive type conversion tests
  • Integration tests: tests/mlir/ - 14 passing tests covering core language features
  • Example programs: examples/core/ - Working examples demonstrating MLIR generation
  • Test coverage: Constants, arithmetic, control flow, memory operations, GPU memory spaces

Modules and Directories

ast

  • Data types and representation of the Abstract Syntax Tree: expressions and types
  • Visitors for convenient tree traversals (sketched below)
  • Span tracks the provenance of source code in the AST
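
For readers unfamiliar with the pattern, the visitor machinery in a module like this typically looks roughly as follows. This is a generic sketch with hypothetical names, not the actual ast API:

// Generic visitor sketch; trait, type, and function names are hypothetical.
enum Expr {
    Lit(i32),
    Add(Box<Expr>, Box<Expr>),
}

trait Visit {
    // Called for every expression; the default implementation just recurses.
    fn visit_expr(&mut self, expr: &Expr) {
        walk_expr(self, expr);
    }
}

// Shared traversal logic so individual visitors only override what they need.
fn walk_expr<V: Visit + ?Sized>(v: &mut V, expr: &Expr) {
    if let Expr::Add(lhs, rhs) = expr {
        v.visit_expr(lhs);
        v.visit_expr(rhs);
    }
}

// Example visitor: counts literal leaves in an expression tree.
struct CountLits(usize);

impl Visit for CountLits {
    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::Lit(_) = expr {
            self.0 += 1;
        }
        walk_expr(self, expr);
    }
}

fn main() {
    let e = Expr::Add(Box::new(Expr::Lit(1)), Box::new(Expr::Lit(2)));
    let mut counter = CountLits(0);
    counter.visit_expr(&e);
    assert_eq!(counter.0, 2);
}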

parser

  • parses a string into an AST
  • based on Rust PEG (a parsing expression grammar library)
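
Descend's grammar itself is more involved, but for orientation, and assuming "Rust PEG" refers to the rust-peg crate, a grammar written with it looks like the following minimal, unrelated example (it requires the peg crate as a dependency and is not Descend's actual grammar):

// Minimal rust-peg example: parses "[1,1,2,3,5,8]" into a Vec<u32>.
peg::parser! {
    grammar list_parser() for str {
        rule number() -> u32
            = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }

        pub rule list() -> Vec<u32>
            = "[" l:(number() ** ",") "]" { l }
    }
}

fn main() {
    assert_eq!(list_parser::list("[1,1,2,3,5,8]"), Ok(vec![1, 1, 2, 3, 5, 8]));
}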

ty_check

  • typing rules restrict how syntactic constructs can be used
  • type inference
  • borrow checking and lifetime checking are part of type checking/inference
  • defines contexts that are tracked during type checking
  • pre-declared function signatures for views and built-in functions, such as exec

codegen

  • mlir/: MLIR backend (primary) - Complete type conversion to MLIR, MLIR builder, AscendNPU-IR dialect bindings for Ascend NPUs
  • cuda/: CUDA backend (experimental) - Data types for the CUDA AST, translation of the Descend AST to the CUDA AST, and printing of the CUDA AST as C++ code
  • Supports multiple compilation targets through a unified frontend

cuda-examples/

  • Contains handwritten or generated CUDA programs
  • Contains descend.cuh, the header file required to compile CUDA-translated Descend programs with nvcc (it contains, for example, the implementation of exec)

descend-examples/

  • Example programs written in Descend
  • Many programs exist twice: once in with_tys/ and once in infer/
  • with_tys: programs have fully-annotated types
  • infer: types in programs are mainly inferred

examples/

  • Additional example programs for testing various backends
  • core/: Core language examples for MLIR backend development

AscendNPU-IR/

  • Submodule containing Huawei's AscendNPU-IR MLIR dialect definitions
  • HIVM, HACC, HFusion, and other Ascend-specific dialects
  • Build tools and documentation for Ascend NPU compilation
  • Integration tests and E2E use cases
