Skip to content

Conversation

@kimishpatel
Copy link
Collaborator

Summary:
This likely might be a short lived optimization where in future we can replace update_cache op with index_put_ op.
This is what original StaticCache does, however this requires cache transpose for custom_sdpa (which can also be fixed).

We will leverage custom cache for now, however in near future this should not be needed. This option however will allow us to bypass any transposes if the need continues

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kimishpatel kimishpatel force-pushed the custom_kv_cache branch 2 times, most recently from 378d613 to 05c39f1 Compare May 23, 2025 16:24
@guangy10 guangy10 self-requested a review May 28, 2025 20:09
Copy link
Collaborator

@guangy10 guangy10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perf boost seems significant! Thanks for adding the custom KV cache to HF models

Comment on lines -31 to +35
"-m", "--model", type=str, required=True, help="Model ID on huggingface.co or path on disk to load model from."
"-m",
"--model",
type=str,
required=True,
help="Model ID on huggingface.co or path on disk to load model from.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pip install '.[quality]', then run make style? It will run linter and will fix the Code Quality failures in the CI

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the style requirement here abiding to hf/optimum guidelines? or et ones? I dont know if we are running different linters between et and here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@guangy10 guangy10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this!

@guangy10 guangy10 requested a review from echarlaix June 2, 2025 18:51
@kimishpatel
Copy link
Collaborator Author

can you rerun the tests? some suggest no space left on device? I will fix the code quality one.

@kimishpatel kimishpatel force-pushed the custom_kv_cache branch 2 times, most recently from 3d0deb8 to 0f02dc9 Compare June 3, 2025 22:43
Comment on lines 181 to 195
@pytest.mark.skipif(
parse(transformers.__version__) < parse("4.52.0"),
reason="Only available on transformers >= 4.52.0",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I noticed this test is skipped in the CI https://github.com/huggingface/optimum-executorch/actions/runs/15429343459/job/43424006619?pr=69#step:5:876.

It's skipped because the installed version 4.52.0.dev0 (because it's installed from source in the CI as when I enabled gemma3, 4.52 is not released yet). To fix this you will need just compare the base version or compare version with dev0, example here

@pytest.mark.skipif(
is_transformers_version("<", "4.52.0.dev0"),
reason="Only available on transformers >= 4.52.0.dev0",
)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let me fix that. BTW there are other test failures that seem unrelated

Summary:
This likely might be a short lived optimization where in future we can
replace update_cache op with index_put_ op.
This is what original StaticCache does, however this requires cache
transpose for custom_sdpa (which can also be fixed).

We will leverage custom cache for now, however in near future this
should not be needed. This option however will allow us to bypass any
transposes if the need continues

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
@guangy10 guangy10 merged commit de493ba into huggingface:main Jun 4, 2025
208 of 212 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants