
minor: support flashinfer nightly #2295

Merged
merged 2 commits into main from zhyncs/nightly on Dec 1, 2024
Conversation

zhyncs
Member

@zhyncs zhyncs commented Dec 1, 2024

Motivation

For now, the nightly flashinfer build can only be triggered manually; the default remains the release version.

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs requested a review from merrymercy December 1, 2024 09:18
@zhyncs zhyncs marked this pull request as draft December 1, 2024 09:26
@zhyncs
Member Author

zhyncs commented Dec 1, 2024

ref #2179

@zhyncs zhyncs marked this pull request as ready for review December 1, 2024 10:09
@zhyncs
Member Author

zhyncs commented Dec 1, 2024

@zhyncs zhyncs requested a review from merrymercy December 1, 2024 10:38
"""
Install the dependency in CI.
"""
# Install the dependency in CI.

Comments in bash use #


./killall_sglang.sh
# Use repo from environment variable, passed from GitHub Actions
FLASHINFER_REPO="${FLASHINFER_REPO:-https://flashinfer.ai/whl/cu121/torch2.4}"

Index URL does not need to include the flashinfer directory

# Use repo from environment variable, passed from GitHub Actions
FLASHINFER_REPO="${FLASHINFER_REPO:-https://flashinfer.ai/whl/cu121/torch2.4}"

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

Resolve the issue of not finding the execution script
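The two bash review comments above fit together roughly as follows. This is a sketch: the final `pip install` line (package name and flags) is an assumption, shown via `echo` rather than executed.

```shell
#!/bin/bash
# Default the flashinfer wheel index from an environment variable so that
# GitHub Actions can override it (e.g. to point at the nightly index).
FLASHINFER_REPO="${FLASHINFER_REPO:-https://flashinfer.ai/whl/cu121/torch2.4}"

# Resolve the script's own directory so companion scripts (killall_sglang.sh)
# are found no matter where the script is invoked from.
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd "$SCRIPT_DIR"

# The index URL is the wheel index root; it does not include a trailing
# flashinfer/ directory. Shown with echo here instead of running pip:
echo "pip install flashinfer -i $FLASHINFER_REPO"
```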

required: true
type: choice
default: 'release'
options:

Use the choice input type for the workflow option
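The choice fields above would sit in a workflow_dispatch trigger along the lines of the sketch below; the input name, description, and option values are assumptions added to make the fragment self-contained.

```yaml
on:
  workflow_dispatch:
    inputs:
      # Hypothetical input name; lets the person triggering the run pick
      # between the release wheel (default) and the nightly build.
      version:
        description: "flashinfer build to install"
        required: true
        type: choice
        default: 'release'
        options:
          - 'release'
          - 'nightly'
```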

@zhyncs
Member Author

zhyncs commented Dec 1, 2024

/usr/local/lib/python3.10/dist-packages/websockets/legacy/__init__.py:6: DeprecationWarning: websockets.legacy is deprecated; see https://websockets.readthedocs.io/en/stable/howto/upgrade.html for upgrade instructions
  warnings.warn(  # deprecated in 14.0 - 2024-11-09
/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/websockets/websockets_impl.py:16: DeprecationWarning: websockets.server.WebSocketServerProtocol is deprecated
  from websockets.server import WebSocketServerProtocol
[2024-12-01 10:39:32 TP0] TpModelWorkerClient hit an exception: Traceback (most recent call last):
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 99, in forward_thread_func
    self.forward_thread_func_()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 130, in forward_thread_func_
    logits_output, next_token_ids = self.worker.forward_batch_generation(
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/managers/tp_worker.py", line 149, in forward_batch_generation
    logits_output = self.model_runner.forward(forward_batch)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 664, in forward
    return self.forward_extend(forward_batch)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/model_executor/model_runner.py", line 633, in forward_extend
    return self.model.forward(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/models/llama.py", line 336, in forward
    hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/models/llama.py", line 288, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/models/llama.py", line 237, in forward
    hidden_states = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/models/llama.py", line 174, in forward
    attn_output = self.attn(q, k, v, forward_batch)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/layers/radix_attention.py", line 58, in forward
    return forward_batch.attn_backend.forward(q, k, v, self, forward_batch)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/layers/attention/__init__.py", line 60, in forward
    return self.forward_extend(q, k, v, layer, forward_batch)
  File "/actions-runner/_work/sglang/sglang/python/sglang/srt/layers/attention/flashinfer_backend.py", line 242, in forward_extend
    o = prefill_wrapper_paged.forward(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 1144, in forward
    return self.run(q, paged_kv_cache, k_scale=k_scale, v_scale=v_scale)
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/prefill.py", line 1211, in run
    _check_cached_qkv_data_type(
  File "/usr/local/lib/python3.10/dist-packages/flashinfer/utils.py", line 199, in _check_cached_qkv_data_type
    raise ValueError(
ValueError: The dtype of q torch.bfloat16 does not match the q_data_type torch.float16 specified in plan function.

This looks like a nightly version issue. cc @yzh119

@zhyncs
Member Author

zhyncs commented Dec 1, 2024

This PR currently provides the option to manually enable nightly flashinfer, with the default still being the release version. There is an issue with dtype inconsistency in nightly flashinfer that needs to be fixed by flashinfer. This PR is safe to merge. cc @merrymercy @yzh119

@zhyncs zhyncs merged commit fc78640 into main Dec 1, 2024
6 of 30 checks passed
@zhyncs zhyncs deleted the zhyncs/nightly branch December 1, 2024 10:55
@yzh119
Collaborator

yzh119 commented Dec 1, 2024

For bf16 models, the data type needs to be specified explicitly in plan functions.
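The check that raised above comes from flashinfer's plan/run split: plan() caches an expected query dtype (float16 unless told otherwise), and run() rejects query tensors that disagree. Below is a minimal stand-in, not the real flashinfer API, illustrating why bf16 models must pass the dtype explicitly in plan():

```python
class AttentionPlan:
    """Toy stand-in for a flashinfer-style wrapper's plan/run contract."""

    def __init__(self):
        self.q_data_type = None

    def plan(self, q_data_type="float16"):
        # Like the real plan(), the query dtype defaults to float16 when
        # not specified explicitly.
        self.q_data_type = q_data_type

    def run(self, q_dtype):
        # Mirrors _check_cached_qkv_data_type from the traceback above.
        if q_dtype != self.q_data_type:
            raise ValueError(
                f"The dtype of q {q_dtype} does not match the q_data_type "
                f"{self.q_data_type} specified in plan function."
            )
        return "ok"

plan = AttentionPlan()
plan.plan()                          # dtype silently defaults to float16
try:
    plan.run("bfloat16")             # bf16 query against a float16 plan
except ValueError:
    print("mismatch")                # -> mismatch (the CI failure above)

plan.plan(q_data_type="bfloat16")    # the fix: state the dtype in plan()
print(plan.run("bfloat16"))          # -> ok
```

In the real backend, the analogous fix is passing the model's dtype through to the wrapper's plan call, which is the direction yzh119's comment points.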
