
Conversation

@jeejeelee
Collaborator

While assessing the effectiveness of the RMSNorm operator, I observed that executing it on a non-zero GPU device resulted in a 'RuntimeError: CUDA error: an illegal memory access was encountered.'
Further debugging showed that the cause is the absence of device guards; most CUDA kernels have the same issue.
I have addressed this by adding device guards to all kernels. Additionally, I have augmented the kernel tests to include the device id, as in the provided test_activation.
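The fix can be sketched as follows. This is an illustrative fragment, not the exact vLLM code: the function signature and the elided kernel launch are assumptions, while the `OptionalCUDAGuard` pattern is the standard PyTorch C++ extension API for pinning the current device.

```cpp
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

// Hypothetical launcher. Without the guard, the kernel is launched on the
// current device (usually 0) while `input`/`out` may live on another GPU,
// which manifests as an illegal memory access.
void rms_norm(torch::Tensor& out, torch::Tensor& input,
              torch::Tensor& weight, float epsilon) {
  // Switch the current CUDA device to the one holding the tensors
  // for the duration of this scope.
  const at::cuda::OptionalCUDAGuard device_guard(device_of(input));
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  // ... launch rms_norm_kernel<<<grid, block, 0, stream>>>(...) ...
}
```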

@jeejeelee
Collaborator Author

jeejeelee commented Dec 7, 2023

I have completed all the kernel tests on A800 GPUs (x2), and all kernels pass correctly. Can you review this PR? @WoosukKwon

@WoosukKwon WoosukKwon self-requested a review December 7, 2023 16:56
@WoosukKwon WoosukKwon self-assigned this Dec 7, 2023
@jeejeelee
Collaborator Author

@WoosukKwon Are there any issues with this PR?

Collaborator

@WoosukKwon WoosukKwon left a comment


Hi @jeejeelee, thanks for submitting the PR. We hadn't noticed this bug because the device id is always 0 in vLLM. However, I agree that this change would make the kernels more portable.

Comment on lines 1 to 4
@@ -1,21 +1,20 @@
#include <torch/extension.h>
#include <ATen/cuda/CUDAContext.h>

#include <torch/extension.h>
#include <c10/cuda/CUDAGuard.h>
#include "dispatch_utils.h"
Collaborator


style nit:

Suggested change, replacing the duplicated include:

    #include <torch/extension.h>
    #include <ATen/cuda/CUDAContext.h>
    #include <torch/extension.h>
    #include <c10/cuda/CUDAGuard.h>
    #include "dispatch_utils.h"

with:

    #include <torch/extension.h>
    #include <ATen/cuda/CUDAContext.h>
    #include <c10/cuda/CUDAGuard.h>
    #include "dispatch_utils.h"

@jeejeelee
Collaborator Author

jeejeelee commented Dec 28, 2023

@WoosukKwon Thank you for your review. I have completed the following modifications:

  1. Reverted the C++ code formatting.
  2. Tested the kernels on devices 0 and 1.
  3. Added the missing commas to the Python code.

Please review again

Collaborator

@WoosukKwon WoosukKwon left a comment


@jeejeelee LGTM! Thanks for the PR and apologies for the delayed review. We got many PRs last month and didn't have enough bandwidth due to the holidays. 😅

@WoosukKwon WoosukKwon merged commit 77af974 into vllm-project:main Jan 3, 2024
jedibrillo pushed a commit to jedibrillo/vllm that referenced this pull request Jan 5, 2024
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
@jeejeelee jeejeelee deleted the fix-kernel-bug branch December 26, 2024 01:47
jinyouzhi pushed a commit to jinyouzhi/vllm that referenced this pull request Sep 26, 2025
