[Kernel][RFC] Refactor the punica kernel based on Triton #5036

Merged: 103 commits, Aug 1, 2024
897495f
kernel v0 done
jeejeelee May 24, 2024
e50234e
add temp_test.py
jeejeelee May 25, 2024
cdfa7c6
add unit test
jeejeelee May 27, 2024
422a65a
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee May 27, 2024
2fbb2ca
back up
jeejeelee May 28, 2024
fad4b03
start replacing bgmv
jeejeelee May 28, 2024
40d449a
backup
jeejeelee May 29, 2024
2dfeb97
optimize code
jeejeelee May 29, 2024
5e55ab8
add bgmv
jeejeelee May 29, 2024
79c07ab
modify bgmv
jeejeelee May 31, 2024
e2f56d5
resolve conflict
jeejeelee May 31, 2024
e0cb42b
optimize bgmv_shrink
jeejeelee Jun 4, 2024
64416e0
optimize bgmv_expand
jeejeelee Jun 4, 2024
891df63
add bgmv
jeejeelee Jun 4, 2024
ab85bb5
add bgmv
jeejeelee Jun 4, 2024
f99b3d2
replacing punica completed
jeejeelee Jun 5, 2024
ef8e83a
fix bug
jeejeelee Jun 5, 2024
f75ce86
optimize kernel
jeejeelee Jun 5, 2024
c0bc06a
trigger test
jeejeelee Jun 11, 2024
a7b5370
tuning bgmv
jeejeelee Jun 13, 2024
dc72d7a
add tuning config
jeejeelee Jun 13, 2024
e7bda61
delete config
jeejeelee Jun 13, 2024
b345434
add default config
jeejeelee Jun 14, 2024
5f81613
Merge branch 'refactor-punica-kernel' of github.com:jeejeelee/vllm in…
jeejeelee Jun 14, 2024
00e0076
add default config
jeejeelee Jun 14, 2024
f4bd580
test conflict
jeejeelee Jun 14, 2024
2bc0668
trigger testing
jeejeelee Jun 18, 2024
4c5889e
delete punica
jeejeelee Jun 18, 2024
82560db
fix bug
jeejeelee Jun 19, 2024
e3ba5a5
fix unit test
jeejeelee Jun 20, 2024
348c4a4
reformat
jeejeelee Jun 20, 2024
7edefbb
update
jeejeelee Jun 20, 2024
fa27688
update
jeejeelee Jun 20, 2024
54bb7a7
update
jeejeelee Jun 21, 2024
0d9e75a
update fully sharded lora
jeejeelee Jun 21, 2024
0f71cc4
delete punica test
jeejeelee Jun 21, 2024
be32004
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jun 24, 2024
b36a92e
fix bug
jeejeelee Jun 25, 2024
5f7fa7b
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jun 25, 2024
6f06eb8
optimize unit test
jeejeelee Jun 26, 2024
3f963b4
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jun 26, 2024
a31c05a
update
jeejeelee Jun 26, 2024
28b7728
Merge branch 'refactor-punica-kernel' of https://github.com/jeejeelee…
jeejeelee Jun 26, 2024
0e7dde3
verify mem
jeejeelee Jun 26, 2024
7419d19
Trigger CI
jeejeelee Jun 26, 2024
20e8ff6
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jun 28, 2024
d9adfe1
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jun 28, 2024
5fbb2a8
update
jeejeelee Jun 29, 2024
7eebe1c
update docs
jeejeelee Jul 1, 2024
8ac0331
update docs
jeejeelee Jul 1, 2024
ea4b3cd
update docs
jeejeelee Jul 1, 2024
4a13f27
fix bug
jeejeelee Jul 1, 2024
c2998e6
sync
jeejeelee Jul 1, 2024
a10f8bc
reformat
jeejeelee Jul 1, 2024
32a0b13
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 2, 2024
3fb6016
test lazy import
jeejeelee Jul 2, 2024
ef42c46
merge
jeejeelee Jul 3, 2024
e49a5dc
merge
jeejeelee Jul 3, 2024
c201971
Merge branch 'refactor-punica-kernel' of https://github.com/jeejeelee…
jeejeelee Jul 3, 2024
66dd88f
merge main
jeejeelee Jul 3, 2024
0cedeb3
modify punica
jeejeelee Jul 4, 2024
50196ca
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 8, 2024
59d17f4
refactor sgmv metadata
jeejeelee Jul 8, 2024
4648697
fix typo
jeejeelee Jul 9, 2024
8732c76
refactor punica wrapper
jeejeelee Jul 10, 2024
a897401
merge main
jeejeelee Jul 10, 2024
7035a29
update lora unit test
jeejeelee Jul 11, 2024
391d761
reduce triton overhead
jeejeelee Jul 12, 2024
1dc8ec0
delete libentry
jeejeelee Jul 12, 2024
630bcab
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 15, 2024
9585adb
delete punica_c code
jeejeelee Jul 15, 2024
01da449
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 16, 2024
d1ef5a0
add libentry
jeejeelee Jul 16, 2024
b19ee95
format
jeejeelee Jul 16, 2024
68622d1
optimize no lora step
jeejeelee Jul 16, 2024
e7b4a4e
move libentry location
jeejeelee Jul 17, 2024
56a5ef8
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 17, 2024
008a9d7
test gemma lora
jeejeelee Jul 17, 2024
5e11209
cleanup code
jeejeelee Jul 18, 2024
2e9a360
sync main branch
jeejeelee Jul 18, 2024
0c010fd
Verify libentry decorator for punica and sample kernels
jeejeelee Jul 19, 2024
d24a329
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 19, 2024
445992a
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 22, 2024
1a23abc
clean up code
jeejeelee Jul 22, 2024
c1a0cd5
modify libentry code
jeejeelee Jul 22, 2024
4513dcf
fix bug
jeejeelee Jul 22, 2024
c876e39
modify libentry code and cleanup code
jeejeelee Jul 22, 2024
b02bce3
add a comment to libentry code
jeejeelee Jul 22, 2024
f1fdcd2
sync code
jeejeelee Jul 23, 2024
89e96eb
test lora CI
jeejeelee Jul 23, 2024
a1f5146
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 24, 2024
7706ce9
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 25, 2024
1f4a472
fix typo
jeejeelee Jul 25, 2024
377847a
modify test
jeejeelee Jul 25, 2024
cd1fb05
Trigger CI
jeejeelee Jul 25, 2024
9ac909e
optimize bgmv_expand and enhance punica unit test
jeejeelee Jul 26, 2024
80ca2b1
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 26, 2024
9a4f147
fix docstring bug
jeejeelee Jul 28, 2024
6620ffb
modify max batches
jeejeelee Jul 28, 2024
5f4a73a
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 28, 2024
50be9bb
sync
jeejeelee Jul 29, 2024
a6d9e46
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 30, 2024
37c3cbd
Merge branch 'vllm-project:main' into refactor-punica-kernel
jeejeelee Jul 30, 2024
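For orientation, the `bgmv_shrink` and `bgmv_expand` operations named in the commits above can be described by a plain-Python reference. This is a minimal sketch under assumptions, not the Triton kernels from this PR: it assumes the usual batched-gather matrix-vector definition (each token picks one LoRA adapter by index), assumes that index `-1` means "no LoRA for this token", and uses hypothetical function names (`bgmv_shrink_ref`, `bgmv_expand_ref`) and nested lists instead of tensors.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(weight: Matrix, x: Sequence[float]) -> Vector:
    """Return weight @ x for a row-major matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def bgmv_shrink_ref(inputs: List[Vector], lora_a: List[Matrix],
                    lora_indices: List[int], scaling: float = 1.0) -> List[Vector]:
    """Project each token from the hidden size down to the LoRA rank.

    inputs:       [batch][hidden]
    lora_a:       [num_loras][rank][hidden]
    lora_indices: [batch]; -1 is assumed to mean "no LoRA"
    returns:      [batch][rank]
    """
    rank = len(lora_a[0])
    out = []
    for x, idx in zip(inputs, lora_indices):
        if idx < 0:
            out.append([0.0] * rank)  # token without an adapter contributes nothing
        else:
            out.append([scaling * v for v in matvec(lora_a[idx], x)])
    return out


def bgmv_expand_ref(inputs: List[Vector], lora_b: List[Matrix],
                    lora_indices: List[int], base_output: List[Vector]) -> List[Vector]:
    """Project each token from the LoRA rank back up and add it to base_output.

    inputs:      [batch][rank]
    lora_b:      [num_loras][hidden_out][rank]
    base_output: [batch][hidden_out]; copied, not modified in place
    """
    out = [list(row) for row in base_output]
    for i, (x, idx) in enumerate(zip(inputs, lora_indices)):
        if idx < 0:
            continue  # leave the base output untouched
        for j, v in enumerate(matvec(lora_b[idx], x)):
            out[i][j] += v
    return out


# Toy data: 2 adapters, rank 1, hidden size 3, batch of 3 tokens.
lora_a = [[[1.0, 0.0, 1.0]], [[0.0, 2.0, 0.0]]]
lora_b = [[[1.0], [2.0], [3.0]], [[1.0], [0.0], [0.0]]]
tokens = [[1.0, 2.0, 3.0]] * 3
indices = [0, 1, -1]

low_rank = bgmv_shrink_ref(tokens, lora_a, indices)
delta = bgmv_expand_ref(low_rank, lora_b, indices, [[0.0] * 3 for _ in tokens])
```

A reference like this is mainly useful as an oracle in unit tests: a GPU kernel's output can be compared against it on small random inputs.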
2 changes: 0 additions & 2 deletions .github/workflows/scripts/build.sh
@@ -13,8 +13,6 @@ $python_executable -m pip install -r requirements-cuda.txt

# Limit the number of parallel jobs to avoid OOM
export MAX_JOBS=1
# Make sure punica is built for the release (for LoRA)
export VLLM_INSTALL_PUNICA_KERNELS=1
# Make sure release wheels are built for the following architectures
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
# Build
62 changes: 0 additions & 62 deletions CMakeLists.txt
@@ -222,61 +222,7 @@ define_gpu_extension_target(
USE_SABI 3
WITH_SOABI)

#
# _punica_C extension
#

set(VLLM_PUNICA_EXT_SRC
"csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu"
"csrc/punica/bgmv/bgmv_bf16_fp32_bf16.cu"
"csrc/punica/bgmv/bgmv_fp16_fp16_fp16.cu"
"csrc/punica/bgmv/bgmv_fp16_fp32_fp16.cu"
"csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu"
"csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu"
"csrc/punica/punica_ops.cu"
"csrc/punica/torch_bindings.cpp")

#
# Copy GPU compilation flags+update for punica
#
set(VLLM_PUNICA_GPU_FLAGS ${VLLM_GPU_FLAGS})
list(REMOVE_ITEM VLLM_PUNICA_GPU_FLAGS
"-D__CUDA_NO_HALF_OPERATORS__"
"-D__CUDA_NO_HALF_CONVERSIONS__"
"-D__CUDA_NO_BFLOAT16_CONVERSIONS__"
"-D__CUDA_NO_HALF2_OPERATORS__")

#
# Filter out CUDA architectures < 8.0 for punica.
#
if (${VLLM_GPU_LANG} STREQUAL "CUDA")
set(VLLM_PUNICA_GPU_ARCHES)
foreach(ARCH ${VLLM_GPU_ARCHES})
string_to_ver(CODE_VER ${ARCH})
if (CODE_VER GREATER_EQUAL 8.0)
list(APPEND VLLM_PUNICA_GPU_ARCHES ${ARCH})
endif()
endforeach()
message(STATUS "Punica target arches: ${VLLM_PUNICA_GPU_ARCHES}")
elseif(${VLLM_GPU_LANG} STREQUAL "HIP")
set(VLLM_PUNICA_GPU_ARCHES ${VLLM_GPU_ARCHES})
message(STATUS "Punica target arches: ${VLLM_PUNICA_GPU_ARCHES}")
endif()

if (VLLM_PUNICA_GPU_ARCHES)
define_gpu_extension_target(
_punica_C
DESTINATION vllm
LANGUAGE ${VLLM_GPU_LANG}
SOURCES ${VLLM_PUNICA_EXT_SRC}
COMPILE_FLAGS ${VLLM_PUNICA_GPU_FLAGS}
ARCHITECTURES ${VLLM_PUNICA_GPU_ARCHES}
USE_SABI 3
WITH_SOABI)
else()
message(WARNING "Unable to create _punica_C target because none of the "
"requested architectures (${VLLM_GPU_ARCHES}) are supported, i.e. >= 8.0")
endif()

#
# Add the `default` target which detects which extensions should be
@@ -300,12 +246,4 @@ if(VLLM_GPU_LANG STREQUAL "CUDA" OR VLLM_GPU_LANG STREQUAL "HIP")
message(STATUS "Enabling moe extension.")
add_dependencies(default _moe_C)

# Enable punica if -DVLLM_INSTALL_PUNICA_KERNELS=ON or
# VLLM_INSTALL_PUNICA_KERNELS is set in the environment and
# there are supported target arches.
if (VLLM_PUNICA_GPU_ARCHES AND
(ENV{VLLM_INSTALL_PUNICA_KERNELS} OR VLLM_INSTALL_PUNICA_KERNELS))
message(STATUS "Enabling punica extension.")
add_dependencies(default _punica_C)
endif()
endif()
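The build logic removed from CMakeLists.txt above did two things: it kept only CUDA architectures at or above 8.0 for the `_punica_C` target (all architectures for HIP), and it enabled the target only when `VLLM_INSTALL_PUNICA_KERNELS` was set. The Python sketch below restates that decision for illustration only. The helper names (`arch_version`, `punica_arches`, `punica_enabled`) are hypothetical, the architecture strings are assumed to look like `8.6` or `9.0+PTX`, and the truthiness check is a simplification of CMake's `if()` rules.

```python
import os
from typing import Iterable, List, Mapping, Optional, Tuple


def arch_version(arch: str) -> Tuple[int, int]:
    """Parse '8.6' or '9.0+PTX' into (major, minor); the suffix is ignored."""
    major, minor = arch.split("+", 1)[0].split(".")
    return int(major), int(minor)


def punica_arches(gpu_lang: str, gpu_arches: Iterable[str]) -> List[str]:
    """Mirror the removed filter: CUDA keeps arches >= 8.0, HIP keeps everything."""
    if gpu_lang == "CUDA":
        return [a for a in gpu_arches if arch_version(a) >= (8, 0)]
    if gpu_lang == "HIP":
        return list(gpu_arches)
    return []


def punica_enabled(arches: List[str], cmake_option: bool = False,
                   env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the removed condition: supported arches AND (env var OR CMake option).

    Assumption: only a few common "true" spellings are recognised here.
    """
    env = os.environ if env is None else env
    flag = env.get("VLLM_INSTALL_PUNICA_KERNELS", "").upper()
    requested = cmake_option or flag in {"1", "ON", "TRUE", "YES", "Y"}
    return bool(arches) and requested


# The arch list exported in build.sh, used here only as sample input.
release_arches = "7.0 7.5 8.0 8.6 8.9 9.0+PTX".split()
kept = punica_arches("CUDA", release_arches)
```

With the kernels reimplemented in Triton, neither the architecture filter nor the opt-in flag is needed at build time, which is why the same flag also disappears from build.sh and the Dockerfiles.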
2 changes: 0 additions & 2 deletions Dockerfile
@@ -88,8 +88,6 @@ ENV MAX_JOBS=${max_jobs}
# number of threads used by nvcc
ARG nvcc_threads=8
ENV NVCC_THREADS=$nvcc_threads
# make sure punica kernels are built (for LoRA)
ENV VLLM_INSTALL_PUNICA_KERNELS=1

ARG buildkite_commit
ENV BUILDKITE_COMMIT=${buildkite_commit}
3 changes: 1 addition & 2 deletions Dockerfile.rocm
@@ -131,8 +131,7 @@ COPY . .
RUN --mount=type=cache,target=/root/.cache/pip \
python3 -m pip install --upgrade numba scipy huggingface-hub[cli]

# Make sure punica kernels are built (for LoRA)
ENV VLLM_INSTALL_PUNICA_KERNELS=1

# Workaround for ray >= 2.10.0
ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
# Silences the HF Tokenizers warning
217 changes: 0 additions & 217 deletions csrc/punica/LICENSE

This file was deleted.

5 changes: 0 additions & 5 deletions csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu

This file was deleted.

5 changes: 0 additions & 5 deletions csrc/punica/bgmv/bgmv_bf16_fp32_bf16.cu

This file was deleted.
