[Ecosystem] enable saving and loading FP8 model (#53) #1683

Open · wants to merge 13 commits into base: main

Conversation

xin3he (Contributor) commented Jan 8, 2025

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

xin3he requested a review from regisss as a code owner on January 8, 2025 02:01
yafshar (Contributor) commented Jan 8, 2025

@xin3he Could you please remove the software ticket number and 'OHF' from the title? This PR is for OH.

yafshar (Contributor) commented Jan 9, 2025

@xin3he Can you please address the comments? Everything else sounds good to me!

xin3he changed the title from "[SW-211858] [Ecosystem] enable saving and loading FP8 model in OHF (#53)" to "[Ecosystem] enable saving and loading FP8 model (#53)" on Jan 13, 2025
xin3he and others added 7 commits January 13, 2025 11:01
Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>
xin3he (Contributor, Author) commented Jan 13, 2025

Sure, thank you @yafshar. Sorry for the delayed response.

Signed-off-by: Xin He <xinhe3@habana.ai>

yafshar (Contributor) left a comment

LGTM!

Hi @regisss, this PR is ready for your final review. Could you please take a look?

```
--bucket_internal \
--load_quantized_model_with_inc
```

libinta (Collaborator) left a comment

Remove this section and just give examples of llama3.

xin3he (Contributor, Author) commented Jan 17, 2025

Hi @libinta, I don't understand why we need to remove the loading section. Saving and loading are a pair; one cannot work without the other.

xin3he (Contributor, Author) commented Jan 17, 2025

I think we misled you: what we save in Hugging Face format is different from existing Hugging Face models. The parameter names in Neural Magic models are different from those in an INC-quantized model. Also, INC only supports the single-card save & load case (link), so we cannot reuse the load part for neural_magic models.
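
For readers following this thread, here is a rough sketch of how the save/load pair could look. Only `--saved_model_path` and `--load_quantized_model_with_inc` are taken from this PR's diff; the script name, model name, QUANT_CONFIG path, and the remaining flags are assumptions based on the existing text-generation example.

```bash
# Sketch only: --saved_model_path and --load_quantized_model_with_inc come from this PR's diff;
# the model name, QUANT_CONFIG path, and the other flags are placeholders/assumptions.

# 1) Quantize with INC and save the FP8 checkpoint to a local path (assumed invocation).
QUANT_CONFIG=./quantization_config/maxabs_quant.json python run_generation.py \
  --model_name_or_path meta-llama/Llama-3.1-8B \
  --use_hpu_graphs \
  --use_kv_cache \
  --bucket_size 128 \
  --bucket_internal \
  --saved_model_path inc_quantized_model

# 2) Load the saved FP8 checkpoint back with INC (flag taken from the README snippet above).
python run_generation.py \
  --model_name_or_path inc_quantized_model \
  --use_hpu_graphs \
  --use_kv_cache \
  --bucket_size 128 \
  --bucket_internal \
  --load_quantized_model_with_inc
```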

Collaborator left a comment

@xin3he But the command is the same whether loading a model locally or from the Hub, right? We need to pass --load_quantized_model_with_inc and specify the model name (local path or Hub name). So I agree with Libin that it would be better to have only one section.
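
If the sections are merged, the unified instructions would presumably boil down to something like the sketch below; the model identifiers are placeholders, and only `--load_quantized_model_with_inc` is taken from the diff.

```bash
# Placeholder model locations; the flag is identical in both cases.

# Load an INC-saved FP8 checkpoint from a local directory:
python run_generation.py \
  --model_name_or_path ./inc_quantized_model \
  --load_quantized_model_with_inc

# Load the same kind of checkpoint from the Hub (e.g. after pushing it to a hypothetical repo):
python run_generation.py \
  --model_name_or_path your-org/llama3-fp8-inc \
  --load_quantized_model_with_inc
```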

xin3he (Contributor, Author) commented Jan 17, 2025

A reminder of TODOs:

  1. Add multi-card saving and loading after this bug fix is merged into the Habana software stack: Support pure meta model lm_head tp (deepspeedai/DeepSpeed#6812).
  2. Remove maxabs_quant_const_scales.json after this PR is merged into the Habana software stack: https://github.com/habana-internal/neural-compressor-fork/pull/6

This may happen in v1.20.0.

parser.add_argument(
    "--saved_model_path",
    type=str,
    default="saved_results",
)

Collaborator left a comment

Maybe let's set the default to a better name? Like "inc_quantized_model" or so?

xin3he (Contributor, Author) left a comment

Done.

examples/text-generation/README.md (outdated review thread, resolved)
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
After quantizing the model, we can save it to a local path.

> [!NOTE]
> Before executing the command below, please refer to the "Running FP8 Models on a Single Device" section to measure the model quantization statistics.

Collaborator left a comment

Suggested change:

- > Before executing the command below, please refer to the "Running FP8 Models on a Single Device" section to measure the model quantization statistics.
+ > Before executing the command below, please refer to the ["Running FP8 Models on a Single Device" section](#running-fp8-models-on-a-single-device) to measure the model quantization statistics.

Let's add a link to it too, it will be easier for readers.

Can you also add the same here, please? https://github.com/HabanaAI/optimum-habana-fork/blob/98539470b6c6dbf39145844317d878ddf5482167/examples/text-generation/README.md?plain=1#L600
I had missed that.
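
For context, the measurement step that this note points to would look roughly like the following; the config path and flags follow the usual FP8 flow of the text-generation example and are not taken from this diff.

```bash
# Assumed measurement invocation: maxabs_measure.json is the measurement config typically
# shipped with examples/text-generation; verify against the README section linked above.
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py \
  --model_name_or_path meta-llama/Llama-3.1-8B \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16
```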
