[Experimental] Script to export 🤗 models #4723

Closed

@guangy10 guangy10 commented Aug 15, 2024

[Done] Requires the PR "Make StaticCache configurable at model construct time" in order to export, lower, and run the 🤗 model OOTB.
[Done] Requires huggingface/transformers#33303 or huggingface/transformers#33287 to be merged into 🤗 transformers to resolve the export issue introduced by huggingface/transformers#32543.


With those in place, we can use the integration point in 🤗 transformers to lower compatible models to ExecuTorch OOTB.

  • This PR adds a simple export script with an XNNPACK recipe.
  • This PR also adds a secret, EXECUTORCH_HT_TOKEN, so that CI can download checkpoints.
  • This PR connects the 🤗 "Export to ExecuTorch" e2e workflow to ExecuTorch CI.
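
As a rough illustration of how a CI step might consume the secret, here is a minimal sketch. It assumes the secret is exposed to the job as an environment variable with the same name, which this PR description does not confirm; `read_ci_token` is a hypothetical helper, not part of the PR.

```python
import os

# Secret name taken from the PR description. Exposing it as an environment
# variable of the same name is an assumption made for this sketch.
TOKEN_ENV_VAR = "EXECUTORCH_HT_TOKEN"


def read_ci_token(env=None):
    """Return the token from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_ENV_VAR} is not set; gated checkpoints cannot be downloaded"
        )
    return token
```

Failing early keeps a missing secret from surfacing later as an opaque download error in the export step.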

Instructions to run the demo:

  1. Run export_hf_model.py to lower gemma-2b to ExecuTorch:
python -m extension.export_util.export_hf_model -hfm "google/gemma-2b" # The model is exported with static dims and a static KV cache
  2. Run tokenizer.py to generate the binary tokenizer format for the ExecuTorch runtime:
python -m extension.llm.tokenizer.tokenizer -t <path_to_downloaded_gemma_checkpoint_dir>/tokenizer.model -o tokenizer.bin
  3. Build the llm runner by following step 4 of this guide

  4. Run the lowered model:

cmake-out/examples/models/llama2/llama_main --model_path=gemma.pte --tokenizer_path=tokenizer.bin --prompt="My name is"
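
The commands above can be assembled programmatically, for example to drive the demo from a test harness. The sketch below only builds the argument lists and prints them; it executes nothing. The module paths and flags are copied from the instructions, while `demo_commands` and its parameter names are hypothetical, and the checkpoint directory is left as a placeholder you must supply. The runner build step is omitted because it follows an external guide.

```python
import shlex


def demo_commands(checkpoint_dir, model_id="google/gemma-2b",
                  pte_path="gemma.pte", tokenizer_bin="tokenizer.bin",
                  prompt="My name is"):
    """Return the export, tokenizer, and run commands as argv lists."""
    # Lower the Hugging Face model to ExecuTorch.
    export = ["python", "-m", "extension.export_util.export_hf_model",
              "-hfm", model_id]
    # Convert the checkpoint's tokenizer.model to the runtime's binary format.
    tokenizer = ["python", "-m", "extension.llm.tokenizer.tokenizer",
                 "-t", f"{checkpoint_dir}/tokenizer.model",
                 "-o", tokenizer_bin]
    # Run the lowered model with the prebuilt llm runner.
    run = ["cmake-out/examples/models/llama2/llama_main",
           f"--model_path={pte_path}",
           f"--tokenizer_path={tokenizer_bin}",
           f"--prompt={prompt}"]
    return [export, tokenizer, run]


if __name__ == "__main__":
    # "/path/to/gemma_checkpoint" is a placeholder, not a real location.
    for argv in demo_commands("/path/to/gemma_checkpoint"):
        print(shlex.join(argv))
```

Keeping each command as a list avoids shell-quoting problems with the prompt, which contains spaces.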

OOTB output and perf

I 00:00:00.003110 executorch:cpuinfo_utils.cpp:62] Reading file /sys/devices/soc0/image_version
I 00:00:00.003360 executorch:cpuinfo_utils.cpp:78] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.003380 executorch:cpuinfo_utils.cpp:158] Number of efficient cores 4
I 00:00:00.003384 executorch:main.cpp:65] Resetting threadpool with num threads = 6
I 00:00:00.014716 executorch:runner.cpp:51] Creating LLaMa runner: model_path=gemma.pte, tokenizer_path=tokenizer_gemma.bin
I 00:00:03.065359 executorch:runner.cpp:66] Reading metadata from model
I 00:00:03.065391 executorch:metadata_util.h:43] get_n_bos: 1
I 00:00:03.065396 executorch:metadata_util.h:43] get_n_eos: 1
I 00:00:03.065399 executorch:metadata_util.h:43] get_max_seq_len: 123
I 00:00:03.065402 executorch:metadata_util.h:43] use_kv_cache: 1
I 00:00:03.065404 executorch:metadata_util.h:41] The model does not contain use_sdpa_with_kv_cache method, using default value 0
I 00:00:03.065405 executorch:metadata_util.h:43] use_sdpa_with_kv_cache: 0
I 00:00:03.065407 executorch:metadata_util.h:41] The model does not contain append_eos_to_prompt method, using default value 0
I 00:00:03.065409 executorch:metadata_util.h:43] append_eos_to_prompt: 0
I 00:00:03.065411 executorch:metadata_util.h:41] The model does not contain enable_dynamic_shape method, using default value 0
I 00:00:03.065412 executorch:metadata_util.h:43] enable_dynamic_shape: 0
I 00:00:03.130388 executorch:metadata_util.h:43] get_vocab_size: 256000
I 00:00:03.130405 executorch:metadata_util.h:43] get_bos_id: 2
I 00:00:03.130408 executorch:metadata_util.h:43] get_eos_id: 1
My name is Melle. I am a 20 year old girl from Belgium. I am living in the southern part of Belgium. I am 165 cm tall and I weigh 45kg. I like to play sports like swimming, running and playing tennis. I am very interested in music and I like to listen to classical music. I like to sing and I can play the piano. I would like to go to the USA because I like to travel a lot. I am looking for a boy from the USA who is between 18 and 25 years old. I
PyTorchObserver {"prompt_tokens":4,"generated_tokens":118,"model_load_start_ms":1723685715497,"model_load_end_ms":1723685718612,"inference_start_ms":1723685718612,"inference_end_ms":1723685732965,"prompt_eval_end_ms":1723685719087,"first_token_ms":1723685719087,"aggregate_sampling_time_ms":182,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:17.482472 executorch:stats.h:70] 	Prompt Tokens: 4    Generated Tokens: 118
I 00:00:17.482475 executorch:stats.h:76] 	Model Load Time:		3.115000 (seconds)
I 00:00:17.482481 executorch:stats.h:86] 	Total inference time:		14.353000 (seconds)		 Rate: 	8.221278 (tokens/second)
I 00:00:17.482483 executorch:stats.h:94] 		Prompt evaluation:	0.475000 (seconds)		 Rate: 	8.421053 (tokens/second)
I 00:00:17.482485 executorch:stats.h:105] 		Generated 118 tokens:	13.878000 (seconds)		 Rate: 	8.502666 (tokens/second)
I 00:00:17.482486 executorch:stats.h:113] 	Time to first generated token:	0.475000 (seconds)
I 00:00:17.482488 executorch:stats.h:120] 	Sampling time over 122 tokens:	0.182000 (seconds)
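
The timings and rates in the stats lines can be recomputed from the PyTorchObserver JSON record in the log. The sketch below does that arithmetic; the JSON string is copied from the log above, while `summarize` and the dictionary keys it returns are illustrative names, not part of the runner. It assumes the total rate is generated tokens over total inference time, which is consistent with the logged figures.

```python
import json

# PyTorchObserver record copied verbatim from the log above.
OBSERVER_LINE = (
    '{"prompt_tokens":4,"generated_tokens":118,'
    '"model_load_start_ms":1723685715497,"model_load_end_ms":1723685718612,'
    '"inference_start_ms":1723685718612,"inference_end_ms":1723685732965,'
    '"prompt_eval_end_ms":1723685719087,"first_token_ms":1723685719087,'
    '"aggregate_sampling_time_ms":182,"SCALING_FACTOR_UNITS_PER_SECOND":1000}'
)


def summarize(observer_json):
    """Derive durations (seconds) and rates (tokens/second) from the record."""
    s = json.loads(observer_json)
    scale = s["SCALING_FACTOR_UNITS_PER_SECOND"]

    def seconds(start_key, end_key):
        return (s[end_key] - s[start_key]) / scale

    load = seconds("model_load_start_ms", "model_load_end_ms")
    total = seconds("inference_start_ms", "inference_end_ms")
    prompt_eval = seconds("inference_start_ms", "prompt_eval_end_ms")
    generation = seconds("prompt_eval_end_ms", "inference_end_ms")
    return {
        "model_load_s": load,
        "total_inference_s": total,
        "total_rate": s["generated_tokens"] / total,
        "prompt_eval_s": prompt_eval,
        "prompt_rate": s["prompt_tokens"] / prompt_eval,
        "generation_s": generation,
        "generation_rate": s["generated_tokens"] / generation,
    }


if __name__ == "__main__":
    for name, value in summarize(OBSERVER_LINE).items():
        print(f"{name}: {value:.6f}")
```

For this record the durations come out to 3.115 s load, 14.353 s total inference, 0.475 s prompt evaluation, and 13.878 s generation, matching the stats.h lines.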

pytorch-bot bot commented Aug 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4723

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c707e4c with merge base bfce743 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 15, 2024
@guangy10
Contributor Author

Do not merge until the dependency PRs are merged in 🤗 transformers and included in a release. Then we can bump the transformers version and merge this PR with CI running it.

@guangy10 guangy10 force-pushed the gemma_executorch branch 2 times, most recently from 232fed9 to cba4ffa Compare September 10, 2024 22:47
@guangy10 guangy10 changed the title [Not To Merge][Experimental] Script to export 🤗 models [Experimental] Script to export 🤗 models Sep 10, 2024
@guangy10 guangy10 marked this pull request as draft September 10, 2024 22:52
@guangy10 guangy10 force-pushed the gemma_executorch branch 3 times, most recently from de3430d to fb5672c Compare September 10, 2024 23:52
@guangy10 guangy10 requested a review from huydhn September 11, 2024 00:22
@guangy10 guangy10 force-pushed the gemma_executorch branch 10 times, most recently from 3e9acfe to 106883e Compare September 11, 2024 20:36
@guangy10 guangy10 force-pushed the gemma_executorch branch 2 times, most recently from 6333278 to 422102f Compare September 11, 2024 22:36
@guangy10
Contributor Author

The failure is expected because the required patch (huggingface/transformers#33303 or huggingface/transformers#33287) has not been merged into transformers yet.

@guangy10 guangy10 force-pushed the gemma_executorch branch 5 times, most recently from d525d58 to 04b5ed2 Compare September 12, 2024 00:17
@guangy10
Contributor Author

Once this PR is unblocked and merged, we will connect the same workflow to the benchmarking infra.

@guangy10 guangy10 marked this pull request as ready for review September 12, 2024 00:45
@facebook-github-bot
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@huydhn huydhn left a comment

The workflow and the script overall LGTM!

@guangy10 guangy10 force-pushed the gemma_executorch branch 3 times, most recently from aefff2e to b3eefd7 Compare September 13, 2024 19:02
@guangy10
Contributor Author

test-huggingface-transformers (google/gemma-2b) is working e2e. Can start merging this PR now.

@facebook-github-bot
Contributor

@guangy10 merged this pull request in 67be84b.

Labels
ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged