[Ecosystem] enable saving and loading FP8 model(#53) #1683

xin3he · 2025-01-08T02:01:32Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

yafshar · 2025-01-08T23:00:53Z

@xin3he Could you please remove 'software ticket' and 'OHF' from the title? This PR is for OH.

examples/text-generation/README.md

yafshar · 2025-01-09T19:10:51Z

@xin3he, can you please address the comments? Everything else sounds good to me!

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

xin3he · 2025-01-13T03:04:51Z

Sure, thank you @yafshar. Sorry for the delayed response.

Signed-off-by: Xin He <xinhe3@habana.ai>

yafshar

LGTM!

Hi @regisss, this PR is ready for your final review. Could you please take a look?

examples/text-generation/README.md

libinta · 2025-01-16T20:12:30Z

examples/text-generation/README.md

+--bucket_internal \
+--load_quantized_model_with_inc
+```
+


Remove this section and just give examples for llama3.

Hi @libinta, I don't understand why we need to remove the loading section. Saving and loading are a pair; neither is useful without the other.

I think we misled you. What we save in Hugging Face format is different from existing Hugging Face models: the parameter names in Neural Magic models differ from those in an INC-quantized model. Also, INC only supports single-card save and load due to link. So we cannot reuse the load part for neural_magic models.

@xin3he But the command is the same whether loading a model locally or from the hub, right? We need to pass --load_quantized_model_with_inc and specify the model name (local path or hub name). So I agree with Libin that it would be better to have only one section.
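To make the point above concrete, here is a minimal Python sketch of how one load invocation could be assembled for both cases. Only `--load_quantized_model_with_inc` is taken from this PR. The helper name `build_load_args`, the use of `--model_name_or_path` to pass the model, and both example model identifiers are assumptions for illustration, not the script's confirmed interface.

```python
from typing import List


def build_load_args(model: str) -> List[str]:
    """Return run_generation.py arguments for loading an INC-quantized model.

    `model` may be a local directory or a hub model name; the flags are identical.
    """
    return [
        "--model_name_or_path", model,      # assumed flag for the model location
        "--load_quantized_model_with_inc",  # flag discussed in this PR
    ]


# Hypothetical identifiers, for illustration only.
local_args = build_load_args("./inc_quantized_model")
hub_args = build_load_args("some-org/some-fp8-model")
```

Under these assumptions only the model argument differs between the two calls, which is the argument for documenting loading in a single section.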

examples/text-generation/README.md

xin3he · 2025-01-17T02:06:04Z

A reminder of the TODOs:

We need to add multi-card saving and loading after this bug fix is merged into Habana software: Support pure meta model lm_head tp deepspeedai/DeepSpeed#6812.
We will remove maxabs_quant_const_scales.json after this PR is merged into Habana software: https://github.com/habana-internal/neural-compressor-fork/pull/6

This may happen in v1.20.0.

regisss · 2025-01-21T15:59:20Z

examples/text-generation/run_generation.py

+    parser.add_argument(
+        "--saved_model_path",
+        type=str,
+        default="saved_results",


Maybe let's set the default to a better name? Like "inc_quantized_model" or so?
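A minimal sketch of what the suggested rename could look like, assuming the argument otherwise stays as shown in the hunk. The `help` text is an assumption, since the hunk is truncated before that point.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--saved_model_path",
    type=str,
    default="inc_quantized_model",  # name suggested in the review comment
    help="Directory where the quantized model is saved.",  # assumed help text
)

# With no command-line override, the new default is used.
args = parser.parse_args([])
print(args.saved_model_path)  # prints "inc_quantized_model"
```

A descriptive default makes the output directory self-explanatory when users do not pass the flag explicitly.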

examples/text-generation/README.md

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

regisss · 2025-01-30T09:26:34Z

examples/text-generation/README.md

+After quantizing the model, we can save it to a local path.
+
+> [!NOTE]  
+> Before executing the command below, please refer to the "Running FP8 Models on a Single Device" section to measure the model quantization statistics.


Suggested change

> Before executing the command below, please refer to the "Running FP8 Models on a Single Device" section to measure the model quantization statistics.

> Before executing the command below, please refer to the ["Running FP8 Models on a Single Device" section](#running-fp8-models-on-a-single-device) to measure the model quantization statistics.

Let's add a link to it too, it will be easier for readers.

Can you also add the same here please? https://github.com/HabanaAI/optimum-habana-fork/blob/98539470b6c6dbf39145844317d878ddf5482167/examples/text-generation/README.md?plain=1#L600
I had missed that.
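For reference, the anchor in the suggested link follows from the section heading. The sketch below approximates how GitHub derives such anchors (lowercase, drop punctuation, replace spaces with hyphens). It is a simplified assumption, not GitHub's exact algorithm, and the function name `heading_to_anchor` is hypothetical.

```python
import re


def heading_to_anchor(heading: str) -> str:
    """Approximate a GitHub-style anchor for a Markdown heading."""
    slug = heading.strip().lower()
    # Drop punctuation; keep word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return "#" + slug.replace(" ", "-")


anchor = heading_to_anchor("Running FP8 Models on a Single Device")
print(anchor)  # prints "#running-fp8-models-on-a-single-device"
```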

regisss · 2025-01-30T09:32:37Z

examples/text-generation/README.md

+--bucket_internal \
+--load_quantized_model_with_inc
+```
+


@xin3he But the command is the same whether loading a model locally or from the hub, right? We need to pass --load_quantized_model_with_inc and specify the model name (local path or hub name). So I agree with Libin that it would be better to have only one section.

[SW-211858] [Ecosystem] enable saving and loading FP8 model in OHF (#53)

eae4688

xin3he requested a review from regisss as a code owner January 8, 2025 02:01

yafshar reviewed Jan 8, 2025

yafshar reviewed Jan 9, 2025

xin3he changed the title ~~[SW-211858] [Ecosystem] enable saving and loading FP8 model in OHF (#53)~~ [Ecosystem] enable saving and loading FP8 model(#53) Jan 13, 2025

xin3he and others added 7 commits January 13, 2025 11:01

Update examples/text-generation/README.md

2348030

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

Update examples/text-generation/README.md

df4fc16

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

Update examples/text-generation/README.md

fbbabd7

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

Update examples/text-generation/README.md

35d0a86

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

Update examples/text-generation/README.md

db0832f

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

Update examples/text-generation/README.md

8a2fcee

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

Update examples/text-generation/README.md

f2cf26b

Co-authored-by: Yaser Afshar <yaser.afshar@intel.com>

workaround for 1.19.0 Synapse

512a225

Signed-off-by: Xin He <xinhe3@habana.ai>

yafshar approved these changes Jan 14, 2025

libinta reviewed Jan 16, 2025

Update README.md

f185190

regisss reviewed Jan 21, 2025

Update examples/text-generation/README.md

aef88f0

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

xin3he added 2 commits January 26, 2025 12:44

Update run_generation.py

132535f

Merge branch 'main' into auto-pr-5999a1a

9853947

regisss reviewed Jan 30, 2025
