This repository was archived by the owner on Jun 3, 2025. It is now read-only.

[WIP] Fixing KV cache injection for LLaMA and Mistral #2244

Closed
wants to merge 4 commits from the feature/damian/fixing_injection branch

Conversation

dbogunowicz
Contributor

No description provided.

@dbogunowicz
Contributor Author

@abhinavnmagic could I get a review and some testing on this?

@abhinavnmagic
Contributor

Does this PR fix ONNX export for quantized models only, pruned models only, or both? I will test accordingly.

@dbogunowicz
Contributor Author

@abhinavnmagic it fixes export for all of the LLaMA models, both quantized and non-quantized.
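
For anyone picking up the testing, here is a minimal sketch of what exercising the fix might look like, assuming SparseML's KeyValueCacheInjector exporter transform; the module path, method names, and file layout below are assumptions drawn from the SparseML exporters API of the time, not details confirmed in this PR:

```python
# Hedged sketch: applying SparseML's KV cache injection to an exported
# ONNX text-generation model. The module path, class name, and directory
# layout here are assumptions, not confirmed by this PR.
import onnx

from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

# Load the ONNX graph produced by the text-generation export
# (path is hypothetical).
model = onnx.load("deployment/model.onnx")

# model_path should point at the directory containing the model's
# config.json, which the injector reads to select the LLaMA/Mistral
# cache configuration.
injector = KeyValueCacheInjector(model_path="deployment")

# Apply the graph transforms and write the cache-enabled model.
injector.export(model, "deployment/model_kvcache.onnx")
```

The same sketch should cover the quantized and non-quantized checkpoints, since the injection operates on the exported graph rather than the weights; check the branch for the exact signatures before relying on it.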

@jeanniefinks
Member

Per the main README announcement, SparseML is being deprecated by June 2, 2025. Closing the PR as work has been suspended; thank you for the input and support!

@jeanniefinks jeanniefinks deleted the feature/damian/fixing_injection branch May 29, 2025 23:39

3 participants