Description
export CUDA_VISIBLE_DEVICES="0,1,2,3"
python -m sglang.launch_server --model-path lmms-lab/llava-next-72b --tokenizer-path lmms-lab/llavanext-qwen-tokenizer --port=30010 --host="0.0.0.0" --tp-size=4 --random-seed=1234 --context-length=32768
I always hit this error:
Exception in ModelRpcClient:
Traceback (most recent call last):
File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_rpc.py", line 175, in exposed_step
self.forward_step()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_rpc.py", line 204, in forward_step
self.forward_decode_batch(self.running_batch)
File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_rpc.py", line 534, in forward_decode_batch
) = self.model_runner.forward(batch, ForwardMode.DECODE)
File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_runner.py", line 406, in forward
return self.forward_decode(batch)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_runner.py", line 375, in forward_decode
return self.model.forward(
File "/home/ubuntu/sglang/python/sglang/srt/models/llava_qwen.py", line 247, in forward
return self.language_model(input_ids, positions, input_metadata)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/models/qwen2.py", line 269, in forward
hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/models/qwen2.py", line 239, in forward
hidden_states, residual = layer(
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/models/qwen2.py", line 191, in forward
hidden_states = self.self_attn(
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/models/qwen2.py", line 140, in forward
attn_output = self.attn(q, k, v, input_metadata)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/sglang/python/sglang/srt/layers/radix_attention.py", line 125, in forward
return self.decode_forward(q, k, v, input_metadata)
File "/home/ubuntu/sglang/python/sglang/srt/layers/radix_attention.py", line 79, in decode_forward_triton
token_attention_fwd(
File "/home/ubuntu/sglang/python/sglang/srt/layers/token_attention.py", line 339, in token_attention_fwd
_token_softmax_reducev_fwd(
File "/home/ubuntu/sglang/python/sglang/srt/layers/token_attention.py", line 284, in _token_softmax_reducev_fwd
_fwd_kernel_stage2[grid](
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
self.cache[device][key] = compile(
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/triton/compiler/compiler.py", line 202, in compile
return CompiledKernel(so_path, metadata_group.get(metadata_filename))
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/triton/compiler/compiler.py", line 230, in __init__
self.asm = {
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/triton/compiler/compiler.py", line 231, in <dictcomp>
file.suffix[1:]: file.read_bytes() if file.suffix[1:] == driver.binary_ext else file.read_text()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/pathlib.py", line 1134, in read_text
with self.open(mode='r', encoding=encoding, errors=errors) as f:
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/pathlib.py", line 1119, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.triton/cache/8027c157df5f85e0655fe2f49379130c/_fwd_kernel_stage2.json.tmp.pid_4180629_32310'
INFO: 127.0.0.1:42948 - "GET /get_model_info HTTP/1.1" 200 OK
As you can see, a curl request for model info still works, but I think there is a problem.
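For reference, this is the request behind the log line above; it still returns 200 OK even after the exception (host and port taken from the launch command, adjust as needed):

# check that the server process is still responding despite the kernel-cache error
curl http://127.0.0.1:30010/get_model_info

My unverified guess is that this is a race on Triton's kernel cache between the tensor-parallel workers, since the missing file is a per-PID .tmp file under ~/.triton/cache. Clearing ~/.triton/cache before relaunching, or pointing each worker at its own cache directory via TRITON_CACHE_DIR, might be worth trying, but I have not confirmed either.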