
tracing

Mark Gates edited this page Jul 13, 2023 · 1 revision

Named regions using the NVIDIA NVTX library

Add the following flags to make.inc:

LIBS += -lnvToolsExt
NVCCFLAGS += -lineinfo

Call nvtxRangePush/Pop functions in slate/include/slate/internal/Trace.hh:

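#include <nvToolsExt.h>
class Block {
public:
    // Open an NVTX range when the block is created.
    Block(const char* name)
        : event_(name)
    { nvtxRangePush(name); }

    // Record the SLATE trace event and close the NVTX range on scope exit.
    ~Block() { Trace::insert(event_); nvtxRangePop(); }
private:
    Event event_;
};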
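
Because Block pushes a range in its constructor and pops it in its destructor, ranges nest with C++ scopes: an inner block always closes before the enclosing one. Here is a minimal sketch of that ordering. It assumes hypothetical stand-ins `fake_range_push` and `fake_range_pop` in place of `nvtxRangePush` and `nvtxRangePop` (so it runs without the CUDA toolkit), and a hypothetical `ScopedRange` class in place of SLATE's `Block`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for nvtxRangePush / nvtxRangePop. They record
// events in a log instead of calling NVTX, so no CUDA toolkit is needed.
static std::vector<std::string> range_log;
static int range_depth = 0;

void fake_range_push(const char* name)
{
    range_log.push_back(std::string("push ") + name);
    ++range_depth;
}

void fake_range_pop()
{
    assert(range_depth > 0);  // every pop must match an earlier push
    range_log.push_back("pop");
    --range_depth;
}

// Hypothetical scope guard mirroring Block: push on entry, pop on exit.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) { fake_range_push(name); }
    ~ScopedRange() { fake_range_pop(); }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Opens two nested ranges and returns the order in which events occurred.
std::vector<std::string> run_nested_ranges()
{
    range_log.clear();
    {
        ScopedRange outer("geqrf");
        {
            ScopedRange inner("panel");
        }   // inner range is popped here
    }       // outer range is popped here
    return range_log;
}
```

Calling `run_nested_ranges()` records both pushes first and then both pops, which is the nesting a profiler timeline would show for an outer routine containing an inner step.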

Start and stop profiler in the driver routine:

...
#include <cuda_profiler_api.h>
int main(){
  ...
  cudaProfilerStart();
  {
    slate::trace::Block trace_block("gemm");
    slate::gemm(alpha, A, B, beta, C);
  }
  cudaProfilerStop();
}
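
If the profiled region can exit early or throw, the start and stop calls can be paired with the same scope-guard idea used for Block. This is only a sketch, not part of SLATE: `ProfilerScope` is a hypothetical class, and `fake_profiler_start` and `fake_profiler_stop` are hypothetical stand-ins for `cudaProfilerStart` and `cudaProfilerStop` so the example runs without CUDA.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for cudaProfilerStart / cudaProfilerStop.
// They only count calls, so the sketch runs without the CUDA toolkit.
static int profiler_starts = 0;
static int profiler_stops = 0;

void fake_profiler_start() { ++profiler_starts; }

void fake_profiler_stop()
{
    assert(profiler_starts > profiler_stops);  // stop must follow a start
    ++profiler_stops;
}

// Hypothetical guard: starts profiling on construction and stops it on
// destruction, including when the scope is left by an exception.
class ProfilerScope {
public:
    ProfilerScope() { fake_profiler_start(); }
    ~ProfilerScope() { fake_profiler_stop(); }
    ProfilerScope(const ProfilerScope&) = delete;
    ProfilerScope& operator=(const ProfilerScope&) = delete;
};

// Runs a placeholder "routine" inside a profiled scope.
// Returns true on success and false if the routine threw.
bool run_profiled(bool fail)
{
    try {
        ProfilerScope scope;
        if (fail) {
            throw std::runtime_error("routine failed");
        }
        return true;
    }
    catch (const std::runtime_error&) {
        return false;
    }
}
```

In a real driver, the two stand-ins would be replaced by the CUDA calls, and the body of the try block would hold the traced routine, such as the `slate::gemm` call above.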

NVIDIA Visual Profiler (NVVP)

NVVP can be used to view traces.

Profile the code using the command line tool nvprof found in the CUDA development kit:

nvprof -f -o ../dgeqrf-dim1000-nb1000-ib1000.nvvp --profile-from-start off ./test/tester --origin d --target d --type d --lookahead 1 --dim 1000 --ib 1000 --ref n --check y --nb 1000 --repeat 1 geqrf

nvprof will generate an .nvvp file. Open this file using NVVP. With --profile-from-start off, profiling begins only when the program calls cudaProfilerStart().

NVIDIA Nsight Systems

NVIDIA Nsight Systems can be used to view traces instead of the older NVVP.

You can download Nsight Systems from: https://developer.nvidia.com/gameworksdownload#?tx=$gameworks,developer_tools

Profile the code using the command line tool nsys found in the CUDA development kit:

nsys profile --stats=true --gpu-metrics-device=all ./tester --origin h --target d --type d --dim 1000 --ref n --check y --nb 200 --repeat 1 geqrf

The available nsys options may depend on the CUDA version.

nsys will generate a .qdrep file. Open this file using Nsight Systems. Newer versions of CUDA and nsys produce .nsys-rep files instead of .qdrep files. These .nsys-rep files require a newer version of Nsight Systems.
