[NPU] Add mixed_precision for Qwen2 7B #12098

Merged

Conversation

@Oscilloscope98 (Contributor) commented on Sep 20, 2024

Description

https://github.com/analytics-zoo/nano/issues/1633#issuecomment-2363009566

Support mixed_precision in the from_pretrained function for NPU (see the usage sketch after this list):

  • If mixed_precision=True and load_in_low_bit='sym_int4', Qwen2 7B will use INT8 for lm_head
  • A model saved with mixed_precision=True/False will keep the same option when the saved model is loaded with load_low_bit
  • Disable the lm_head split when load_in_low_bit='sym_int8'
  • Update the example accordingly
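A minimal usage sketch of the new option (the Qwen2 checkpoint path, save directory, and extra keyword arguments are illustrative and not taken from this PR; mixed_precision, load_in_low_bit, and the save_low_bit/load_low_bit round trip are the pieces this change touches):

```python
# Sketch only: model path, save directory, and extra kwargs are illustrative.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "Qwen/Qwen2-7B-Instruct"  # assumed example checkpoint

# With load_in_low_bit="sym_int4" and mixed_precision=True,
# the lm_head of Qwen2 7B is quantized to INT8 instead of INT4.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    mixed_precision=True,
    trust_remote_code=True,
)

# The mixed_precision choice is recorded alongside the saved model,
# so load_low_bit reuses the same option without re-specifying it.
model.save_low_bit("./qwen2-7b-npu-sym-int4-mixed")
model = AutoModelForCausalLM.load_low_bit("./qwen2-7b-npu-sym-int4-mixed")
```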


@jason-dai (Contributor) left a comment


LGTM


@rnwang04 (Contributor) left a comment


Others LGTM

@Oscilloscope98 merged commit 828fa01 into intel-analytics:main on Sep 20, 2024
1 check passed