
Commit ed16d0f

[Doc] mention fpdb for multiprocess breakpoints (#24452)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
1 parent 0cdd213 commit ed16d0f


1 file changed (+28 -0 lines)


docs/usage/troubleshooting.md

Lines changed: 28 additions & 0 deletions
@@ -40,6 +40,34 @@ If other strategies don't solve the problem, it's likely that the vLLM instance
- `export NCCL_DEBUG=TRACE` to turn on more logging for NCCL.
- `export VLLM_TRACE_FUNCTION=1` to record all function calls in the log files, so you can tell which function crashes or hangs. Do not use this flag unless absolutely needed for debugging; it significantly slows down startup.
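
If you launch vLLM from a Python driver script, one way to keep these debug variables scoped to a single run (instead of exporting them in your shell) is to pass them through the child process's environment, so they are present from the moment that process starts. A minimal sketch; `debug_env` and `run_with_debug_env` are hypothetical helpers, not part of vLLM:

```python
import os
import subprocess

# Debug-only variables from the list above; leave them unset in normal runs.
DEBUG_ENV = {
    "NCCL_DEBUG": "TRACE",
    "VLLM_TRACE_FUNCTION": "1",
}


def debug_env(base=None):
    """Return a copy of `base` (default: os.environ) with the debug variables added."""
    env = dict(os.environ if base is None else base)  # copy, never mutate the input
    env.update(DEBUG_ENV)
    return env


def run_with_debug_env(cmd):
    """Run `cmd` (a list of arguments) with the debug variables set for that process only."""
    return subprocess.run(cmd, env=debug_env(), check=False)
```

For example, `run_with_debug_env(["python", "my_vllm_script.py"])`, where `my_vllm_script.py` stands in for your own entry point.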

## Breakpoints
45+
Standard `pdb` breakpoints may not work in vLLM's codebase when they are hit inside a subprocess. Instead, you may see a traceback like:

``` text
  File "/usr/local/uv/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/bdb.py", line 100, in trace_dispatch
    return self.dispatch_line(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/uv/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/bdb.py", line 125, in dispatch_line
    if self.quitting: raise BdbQuit
                      ^^^^^^^^^^^^^
bdb.BdbQuit
```
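
The error comes from how Python's `multiprocessing` treats standard input, assuming the subprocess was started through `multiprocessing` (as vLLM's engine and worker processes are). Before running the target function, the child's `sys.stdin` is replaced with a handle on `os.devnull`. When `pdb` prompts, it immediately reads end-of-file, treats that as its `EOF` (quit) command, and the next trace event raises `BdbQuit`. The plain-Python sketch below, with no vLLM involved, shows the child seeing EOF; `report_stdin` and `child_stdin_line` are illustrative names, and the `fork` start method is chosen only so the sketch runs as a simple POSIX script.

```python
import multiprocessing as mp
import sys


def report_stdin(queue):
    # Runs in the child: multiprocessing has already pointed sys.stdin at
    # os.devnull, so this read returns "" (EOF) instead of waiting for input.
    try:
        queue.put(sys.stdin.readline())
    except Exception as exc:  # e.g. sys.stdin is None if the parent had no stdin
        queue.put(f"error: {exc!r}")


def child_stdin_line():
    ctx = mp.get_context("fork")  # POSIX-only start method, used for simplicity
    queue = ctx.Queue()
    proc = ctx.Process(target=report_stdin, args=(queue,))
    proc.start()
    line = queue.get(timeout=30)  # avoid hanging forever if the child dies
    proc.join()
    return line


if __name__ == "__main__":
    print(repr(child_stdin_line()))
```

Run from a terminal, this should print `''`: the child never sees your keyboard, which is exactly what `pdb` runs into.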

One solution is to use [forked-pdb](https://github.com/Lightning-AI/forked-pdb). Install it with `pip install fpdb` and set a breakpoint like this:

``` python
__import__('fpdb').ForkedPdb().set_trace()
```
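
For intuition on why a forked debugger helps, here is a minimal sketch of the commonly shared `ForkedPdb` recipe. It is an illustration only, not the `fpdb` package's actual source, which may differ. While the prompt is active it temporarily points `sys.stdin` back at the terminal through `/dev/stdin` (Linux/macOS only), assuming file descriptor 0 in the subprocess is still attached to your terminal.

```python
import pdb
import sys


class ForkedPdb(pdb.Pdb):
    """A Pdb subclass that can prompt from inside a multiprocessing child."""

    def interaction(self, *args, **kwargs):
        saved_stdin = sys.stdin
        try:
            # The child's sys.stdin reads from os.devnull, but file descriptor 0
            # is usually still the terminal, so reopen it via /dev/stdin for the
            # duration of the debugger prompt.
            with open("/dev/stdin") as terminal:
                sys.stdin = terminal
                super().interaction(*args, **kwargs)
        finally:
            # Restore whatever stdin the process had before the prompt.
            sys.stdin = saved_stdin
```

Calling `ForkedPdb().set_trace()` at the line where the subprocess should stop then gives an interactive prompt instead of `BdbQuit`.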

Another option is to disable multiprocessing entirely by setting the `VLLM_ENABLE_V1_MULTIPROCESSING` environment variable to `0`.
This keeps the scheduler in the same process, so you can use standard `pdb` breakpoints:

``` python
import os

# Set this at the top of your script, before importing or initializing vLLM.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"
```
## Incorrect network setup

The vLLM instance may fail to get the correct IP address if you have a complicated network configuration. Look for a log line such as `DEBUG 06-10 21:32:17 parallel_state.py:88] world_size=8 rank=0 local_rank=0 distributed_init_method=tcp://xxx.xxx.xxx.xxx:54641 backend=nccl` and check that the IP address in it is the correct one.
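
To decide what the correct address is on a given machine, one option is to check which local IPv4 address the OS picks for outbound traffic and compare it with the one in that log line. A minimal sketch using the common UDP-connect trick; `outbound_ip` is a hypothetical helper, the probe address `8.8.8.8` is an arbitrary choice (no packets are actually sent), and this is not necessarily the logic vLLM itself uses to choose the address.

```python
import socket


def outbound_ip(probe_host="8.8.8.8", probe_port=80):
    """Return the local IPv4 address the OS would use to reach probe_host, or None.

    Connecting a UDP socket sends no packets; it only asks the kernel to pick
    a route and a source address for that destination.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect((probe_host, probe_port))
            return sock.getsockname()[0]
    except OSError:
        return None  # e.g. no route to the probe host


if __name__ == "__main__":
    print(outbound_ip())
```

If the printed address differs from the one in the `distributed_init_method` log line, that mismatch is a good place to start investigating.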
