[Core]Add Ascend Quantize #7

Angazenn · 2025-02-05T13:19:45Z

This PR adds an Ascend quantization interface to vllm-ascend. It includes the AscendQuantConfig class, which inherits from vLLM's QuantizationConfig class; the AscendLinearMethod class, which inherits from vLLM's LinearMethodBase class; and the AscendQuantizer class, which dispatches to the corresponding quantization methods.

Signed-off-by: angazenn <zengyanjia@huawei.com>
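
For readers following the thread, here is a minimal sketch of how the three pieces described above could relate. It is not the code in this PR: `QuantizationConfigStub` and `LinearMethodStub` are simplified stand-ins for vLLM's base classes, the `*Sketch` classes and the `RoundToStep` backend are hypothetical, and the `"backend"` and `"step"` keys are invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class QuantizationConfigStub(ABC):
    """Stand-in for vLLM's QuantizationConfig (assumption: heavily simplified)."""

    @abstractmethod
    def get_quant_method(self, layer_kind: str) -> Optional["LinearMethodStub"]:
        ...


class LinearMethodStub(ABC):
    """Stand-in for vLLM's LinearMethodBase (assumption: heavily simplified)."""

    @abstractmethod
    def apply(self, x: float) -> float:
        ...


class AscendQuantizerSketch:
    """Dispatches to a backend named in the quantization description."""

    _backends: Dict[str, Callable[[dict], object]] = {}

    @classmethod
    def register_backend(cls, name: str, factory: Callable[[dict], object]) -> None:
        cls._backends[name] = factory

    @classmethod
    def get_quantizer(cls, quant_config: dict):
        factory = cls._backends.get(quant_config.get("backend", ""))
        if factory is None:
            raise NotImplementedError("There is no available ascend quantizer.")
        return factory(quant_config)


class AscendLinearMethodSketch(LinearMethodStub):
    def __init__(self, quant_config: dict) -> None:
        # The quantizer is resolved when the linear method is created.
        self.quantizer = AscendQuantizerSketch.get_quantizer(quant_config)

    def apply(self, x: float) -> float:
        return self.quantizer.quantize(x)


class AscendQuantConfigSketch(QuantizationConfigStub):
    def __init__(self, quant_config: dict) -> None:
        self.quant_config = quant_config

    def get_quant_method(self, layer_kind: str) -> Optional[LinearMethodStub]:
        # Only linear layers get a quantized method in this toy version.
        if layer_kind == "linear":
            return AscendLinearMethodSketch(self.quant_config)
        return None


class RoundToStep:
    """Hypothetical toy backend: snaps a value to a multiple of `step`."""

    def __init__(self, quant_config: dict) -> None:
        self.step = quant_config["step"]

    def quantize(self, x: float) -> float:
        return round(x / self.step) * self.step


AscendQuantizerSketch.register_backend("toy", RoundToStep)
config = AscendQuantConfigSketch({"backend": "toy", "step": 0.25})
method = config.get_quant_method("linear")
print(method.apply(0.26))
```

The point of the layering is that the config and linear method never name a concrete backend; only the dispatcher knows which quantizers exist.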

wangxiyuan · 2025-02-07T08:18:07Z

vllm_ascend/ops/layernorm.py


 def forward_oot(
    self,
    x: torch.Tensor,
    residual: Optional[torch.Tensor] = None,
 ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
+    if hasattr(self, "module"):
+        return self.module.forward_anti_outlier(x, residual)


Is self.module only used here? If yes, how about something like:

try:
    from mindie_turbo import RMSNormWithAntiOutlier
except:
    RMSNormWithAntiOutlier = None

def forward_oot():
    if RMSNormWithAntiOutlier is not None:
        return RMSNormWithAntiOutlier(self.hidden_size).forward_anti_outlier(x, residual)
    ....

I'm not sure enable_rmsnorm_with_antioutlier is needed; it seems to only add a new self.module there.

Details of RMSNormWithAntiOutlier have been moved out of vllm_ascend. There's no need to change the implementation of rmsnorm in vllm_ascend now.
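
For reference, the optional-import fallback suggested in the review can be sketched in a self-contained form. This is an illustration only: `optional_attr`, `default_rms_norm`, and the list-based inputs are hypothetical, and the `RMSNormWithAntiOutlier(hidden_size).forward_anti_outlier(x, residual)` call shape is copied from the review comment, not verified against mindie_turbo.

```python
import importlib


def optional_attr(module_name: str, attr: str):
    """Return `module_name.attr`, or None when the module is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Only a missing package triggers the fallback; other errors propagate.
        return None
    return getattr(module, attr, None)


# Resolved once at import time, as in the review suggestion.
RMSNormWithAntiOutlier = optional_attr("mindie_turbo", "RMSNormWithAntiOutlier")


def default_rms_norm(x, eps: float = 1e-6):
    """Plain RMS normalization over a list of floats (no learned weight)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = (mean_sq + eps) ** -0.5
    return [v * scale for v in x]


def forward_oot(x):
    if RMSNormWithAntiOutlier is not None:
        # Call shape taken from the review comment; treat it as an assumption.
        return RMSNormWithAntiOutlier(len(x)).forward_anti_outlier(x, None)
    return default_rms_norm(x)
```

Catching `ImportError` rather than using a bare `except` keeps unrelated failures visible.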

wangxiyuan · 2025-02-07T13:07:29Z

vllm_ascend/quantization/quantizer.py

+
+            return MindIETurboQuantizer.get_quantizer(quant_config)
+        except Exception:
+            raise NotImplementedError("There is no available ascend quantizer.")


Please use importlib to check whether mindie_turbo is available. The try/except block here covers too much code.

Yes, this should be fixed.
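
One way to narrow the guarded region, sketched under assumptions: availability is checked with the standard library's `importlib.util.find_spec`, and only a missing package is turned into `NotImplementedError`. `MindIETurboQuantizer.get_quantizer` comes from the diff above; that it is importable from the top-level `mindie_turbo` package is assumed, and the `package` parameter exists only so the sketch can be exercised without mindie_turbo installed.

```python
import importlib
import importlib.util


def is_available(package: str) -> bool:
    """True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package) is not None


def get_quantizer(quant_config: dict, package: str = "mindie_turbo"):
    # Only the "package is missing" case is mapped to NotImplementedError.
    if not is_available(package):
        raise NotImplementedError("There is no available ascend quantizer.")
    module = importlib.import_module(package)
    # Errors raised inside the backend propagate unchanged instead of being
    # hidden behind a broad `except Exception`.
    return module.MindIETurboQuantizer.get_quantizer(quant_config)
```

With this shape, a bug inside the backend surfaces with its own traceback instead of the generic "no available quantizer" message.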

wangxiyuan · 2025-02-07T13:08:46Z

tests/quantization/test_mindie_turbo.py

+]
+
+
+@pytest.mark.skipif(not is_mindie_turbo_supported(),


Please add a TODO here: once more quantization methods are available in vllm-ascend, the skip can be removed.

This test case is designed for quantization methods based on mindie-turbo. For other possible quantization methods in the future, we can add new test cases.

wangxiyuan · 2025-02-07T13:10:36Z

tests/quantization/test_mindie_turbo.py

+
+    import vllm_ascend  # noqa: F401
+    from vllm_ascend.quantization.quant_config import AscendLinearMethod
+


Why import inside the test?

This is because early versions of mindie_turbo had to be imported before vllm. This conflict may have been resolved by now, in which case these imports can be moved to the top of the file.

ganyi1996ppo · 2025-02-07T15:15:45Z

vllm_ascend/quantization/quantizer.py

+
+            # When not using anti-outlier algorithms, "anti_method" refers to an empty string.
+            if len(quant_config["anti_method"]) > 0:
+                enable_rmsnorm_with_antioutlier()


From my perspective, this looks a bit strange: the interface seems very detailed and specific. Is it possible to hide more of the detail under the hood? I believe this part can be written in a more general way.

Details of RMSNormWithAntiOutlier have been moved out of vllm_ascend. The related code will be hidden inside mindie_turbo.

wuhuikx · 2025-02-08T14:35:37Z

tests/quantization/test_mindie_turbo.py

+from tests.quantization.utils import is_mindie_turbo_supported, example_quantization
+
+MODELS = [
+    "/home/zyj/data/Qwen2.5-0.5B-Instruct/",


what's this path?

This was a mistake. Changed to Qwen/Qwen2.5-0.5B-Instruct now.

wangxiyuan · 2025-02-10T02:42:28Z

vllm_ascend/platform.py

+        """
+        Do some pre-registeration or update action for ascend platform.
+        """
+        from vllm_ascend.quantization.quant_config import AscendQuantConfig  # noqa: F401


vllm-project/vllm#12432 is not merged yet. Maybe you can move this import to the register function in __init__.py, but I'm not sure whether it will lead to a circular import error. You can give it a try first.

It seems there is a circular import when this import is moved. But the code still works and the whole inference process generates correct text, so I moved this code to the register function.

Signed-off-by: angazenn <zengyanjia@huawei.com>

Angazenn force-pushed the main branch from 6b98d38 to 7ea88a0 Compare February 5, 2025 13:24

Angazenn changed the base branch from main to develop February 7, 2025 01:55

Angazenn changed the title ~~Add Ascend Quantize~~ [Core]Add Ascend Quantize Feb 7, 2025

Angazenn force-pushed the main branch from c18a277 to 07c8c15 Compare February 7, 2025 02:05

angazenn added 7 commits February 7, 2025 10:08

add ascend quantize

bafc5f5

Signed-off-by: angazenn <zengyanjia@huawei.com>

add license

a4aaea4

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix quantization import bugs

43b9f70

Signed-off-by: angazenn <zengyanjia@huawei.com>

support skipping unquantized layers

46b7ca2

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix ci

5b1b34d

Signed-off-by: angazenn <zengyanjia@huawei.com>

remove unnecessary params

7a41f8f

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix import errors

d332351

Signed-off-by: angazenn <zengyanjia@huawei.com>

Angazenn force-pushed the main branch from 07c8c15 to d332351 Compare February 7, 2025 02:09

avoid import check

37c4543

Signed-off-by: angazenn <zengyanjia@huawei.com>

wangxiyuan reviewed Feb 7, 2025

angazenn added 5 commits February 7, 2025 17:18

add ascend quantization ut

7e230f0

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix quant description bugs

1bfb206

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix ut bugs

9bbc77c

Signed-off-by: angazenn <zengyanjia@huawei.com>

move quantizer initialization to linear method

b5d4bf6

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix bugs

d3dd745

Signed-off-by: angazenn <zengyanjia@huawei.com>

wangxiyuan reviewed Feb 7, 2025

ganyi1996ppo reviewed Feb 7, 2025

angazenn added 3 commits February 8, 2025 10:43

narrow down the codes that try-catch structure covers when importing …

b809659

…quantizer Signed-off-by: angazenn <zengyanjia@huawei.com>

move packages imported to the head

7f4b41c

Signed-off-by: angazenn <zengyanjia@huawei.com>

fix import bugs

a553906

Signed-off-by: angazenn <zengyanjia@huawei.com>

wangxiyuan reviewed Feb 8, 2025

angazenn added 4 commits February 8, 2025 11:41

fix import bugs

3b94a9f

Signed-off-by: angazenn <zengyanjia@huawei.com>

remove anti-outlier conditions from vllm_ascend

210e6dc

Signed-off-by: angazenn <zengyanjia@huawei.com>

remove unnecessary spaces

639b602

Signed-off-by: angazenn <zengyanjia@huawei.com>

add example quantization codes to ut

a92d9fe

Signed-off-by: angazenn <zengyanjia@huawei.com>

wuhuikx reviewed Feb 10, 2025

wangxiyuan reviewed Feb 10, 2025

angazenn added 2 commits February 10, 2025 13:26

fix bugs

44737f9

Signed-off-by: angazenn <zengyanjia@huawei.com>

move importation of AscendQuantConfig

f1e9556

Signed-off-by: angazenn <zengyanjia@huawei.com>

wuhuikx approved these changes Feb 10, 2025

wangxiyuan merged commit 7637759 into vllm-project:develop Feb 11, 2025
1 check passed

Yikun mentioned this pull request Feb 18, 2025

[Doc] Update doc to work with release #85

Merged
