yiakwy-xpu-ml-framework-team/Tooklkit-remote-pdb-for-pytorch-distributed

Introduction

A remote PDB for use in PyTorch distributed environments (e.g. Megatron-LM).

  • Free software: MIT license

Installation

Prerequisites:

sudo apt-get update && sudo apt-get install socat rlwrap

For an example, see the [CUDA 12 Dockerfile](https://github.com/yiakwy-xpu-ml-framework-team/dockerhub/blob/main/cuda/Dockerfile.megatron-lm.ubuntu-22.04).

Then add the project to your dependencies. PyPI hosting is pending.

Usage

The console prints the command needed to connect to any PyTorch rank. Copy it and run it.
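To illustrate the workflow described above, here is a minimal sketch of how a per-rank remote debugger could work using only the Python standard library. It is not this project's implementation: the names `rank_port`, `connect_command`, `remote_set_trace`, the base port `4444`, and the exact form of the printed `rlwrap socat` command are all assumptions made for illustration. The only external facts relied on are that `torchrun` exports a `RANK` environment variable and that `pdb.Pdb` accepts custom `stdin` and `stdout` streams.

```python
import os
import pdb
import socket
import sys

BASE_PORT = 4444  # hypothetical default, not taken from the project


def rank_port(rank, base_port=BASE_PORT):
    """Give each rank its own port so debugger sessions do not collide."""
    return base_port + rank


def connect_command(host, port):
    """Build the shell command a user would copy to attach to a rank.

    The command format is an assumption based on the socat and rlwrap
    prerequisites; the project may print something different.
    """
    return f"rlwrap socat - tcp:{host}:{port}"


def remote_set_trace(host="127.0.0.1", base_port=BASE_PORT):
    """Block until a client connects, then run pdb over the socket."""
    rank = int(os.environ.get("RANK", "0"))  # RANK is exported by torchrun
    port = rank_port(rank, base_port)

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    print(
        f"[rank {rank}] waiting for debugger: {connect_command(host, port)}",
        file=sys.stderr,
        flush=True,
    )

    conn, _ = server.accept()
    stream = conn.makefile("rw")  # text-mode file object over the socket
    debugger = pdb.Pdb(stdin=stream, stdout=stream)
    # Break in the caller's frame rather than inside this helper.
    debugger.set_trace(sys._getframe().f_back)
```

In this sketch, calling `remote_set_trace()` inside a training script pauses that rank until a client attaches with the printed command. Offsetting the port by the rank keeps sessions on the same host from clashing; a multi-node job would also need to bind to a reachable address instead of `127.0.0.1`.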
