CuTe Vector Copy

Introduction

These examples demonstrate how to implement vector copy kernels using CuTe. The general vector copy kernel performs boundary checks and can be used for any vector size. The vectorized vector copy kernel assumes that the vector size is a multiple of a certain size that depends on the data type.
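
The size requirement of the vectorized kernel can be illustrated with a small host-side sketch. It assumes that each thread moves one 16-byte (128-bit) chunk per access, which is the widest single load/store on NVIDIA GPUs. This README does not state the width the kernel actually uses, and the names `kAccessBytes`, `elements_per_access`, and `can_use_vectorized_copy` are hypothetical and not part of this example's sources.

```cpp
#include <cstddef>

// Assumption: one 16-byte (128-bit) memory access per thread.
// The real kernel may use a different width.
constexpr std::size_t kAccessBytes = 16;

// Number of elements of type T that fit into one access.
template <typename T>
constexpr std::size_t elements_per_access()
{
    static_assert(kAccessBytes % sizeof(T) == 0,
                  "Element size must divide the access width.");
    return kAccessBytes / sizeof(T);
}

// True if a vector of num_elements elements can be copied using only
// full-width accesses, i.e. without a partially filled tail.
template <typename T>
constexpr bool can_use_vectorized_copy(std::size_t num_elements)
{
    return num_elements % elements_per_access<T>() == 0;
}
```

Under this assumption, a `float` vector would need a length that is a multiple of 4 and a `double` vector a length that is a multiple of 2. Any other length would have to go through the general, boundary-checked kernel.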

Usage

Run Unit Tests

$ ctest --test-dir build/ --tests-regex "TestAllVectorCopy.*" --verbose

The following tests passed:
        TestAllVectorCopy
        TestAllVectorCopyVectorized

100% tests passed, 0 tests failed out of 2

Run Performance Measurement

$ ctest --test-dir build/ --tests-regex "ProfileAllVectorCopy.*" --verbose

The following table shows the performance of copying 1 GB of floating-point values with the vector copy kernels on an NVIDIA GeForce RTX 3090.

Kernel Name	Latency (ms)	Effective Bandwidth (GB/s)	Peak Bandwidth Percentage (%)
Vector Copy	2.81923	709.413	75.7842
Vector Copy Vectorized	2.83133	706.382	75.4604
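
The table columns are related by simple arithmetic, sketched below. The sketch assumes that the effective bandwidth counts both the read and the write of every byte, and that 1 GB means 10^9 bytes; both assumptions are inferred from the reported numbers rather than stated in this README. The peak bandwidth of about 936.096 GB/s is likewise back-derived from the table, and the function names are hypothetical.

```cpp
// Effective bandwidth in GB/s for a copy of bytes_copied bytes.
// Assumption: every byte is read once and written once, so the
// memory traffic is twice the copied size.
constexpr double effective_bandwidth_gb_per_s(double bytes_copied,
                                              double latency_ms)
{
    const double bytes_moved = 2.0 * bytes_copied;
    const double latency_s = latency_ms * 1.0e-3;
    return bytes_moved / latency_s / 1.0e9;
}

// Effective bandwidth as a percentage of the peak memory bandwidth.
constexpr double peak_bandwidth_percentage(double effective_gb_per_s,
                                           double peak_gb_per_s)
{
    return effective_gb_per_s / peak_gb_per_s * 100.0;
}
```

For the Vector Copy row, `effective_bandwidth_gb_per_s(1.0e9, 2.81923)` gives about 709.4 GB/s, and `peak_bandwidth_percentage(709.413, 936.096)` gives about 75.8%, which is consistent with the reported values.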

Run Nsight Compute Profiling

for file in build/examples/cute_vector_copy/tests/profile_*; do
    filename=$(basename -- "$file")
    ncu --set full -f -o ncu_reports/"$filename" "$file"
done

Run Compute Sanitizer

for file in build/examples/cute_vector_copy/tests/test_*; do
    compute-sanitizer --leak-check full "$file"
done
