
Cleanup ModelCompressor, fix reload bugs #172

Closed
wants to merge 14 commits

Conversation


@kylesayrs kylesayrs commented Sep 28, 2024

@kylesayrs kylesayrs marked this pull request as draft September 28, 2024 22:26
@kylesayrs kylesayrs changed the title WIP Cleanup ModelCompressor, fix reload bug Sep 29, 2024
@kylesayrs kylesayrs self-assigned this Sep 29, 2024
@kylesayrs kylesayrs changed the title Cleanup ModelCompressor, fix reload bug Cleanup ModelCompressor, fix reload bugs Sep 29, 2024

dsikka commented Sep 29, 2024

@kylesayrs The first was set up by design. This gives us two separate pathways, which we would like to keep for the time being.


@dsikka dsikka left a comment

I would suggest opening a separate PR for each bug fix you'd like to make. Otherwise, this PR can't be accepted in its current state.


kylesayrs commented Sep 30, 2024

@dsikka w.r.t. your first comment, doesn't this mean that users can no longer load back models using SparseAutoModelForCausalLM? The reload tests I added seem like reasonable use cases, but they fail on main without these changes.


dsikka commented Sep 30, 2024

> @dsikka w.r.t. your first comment, doesn't this mean that users can no longer load back models using SparseAutoModelForCausalLM? The reload tests I added seem like reasonable use cases, but they fail on main without these changes.

Please read through the description of the PR you referenced: #164
Legacy models will still be able to load through the SparseAutoModelForCausalLM pathway when running inference. The scope of what we're changing right now is focused on inference.


kylesayrs commented Sep 30, 2024

@dsikka When I was referring to loading back models, I wasn't referring to legacy models; I was referring to models compressed today (with quantization_config).

There are a couple of reasons why ModelCompressor should support loading from either a compression_config (legacy) or a quantization_config, as implemented here (see the sketch after this list):

  1. In the reload case where the HF quantizer is not available (i.e. the transformers version is not up to date), SparseAutoModelForCausalLM will need to use ModelCompressor to parse a config with the quantization_config key. I would point out this failure in the LC tests, but LC is failing for other reasons right now, and I believe another bug would be masking this effect anyway (namely that in wrap_hf_model_class, it's assumed that if a quantization_config is present, then the HF quantizer is active, which is not necessarily the case).

  2. It's nice to maintain the property that if ModelCompressor saves quantization_configs, then ModelCompressor can also be loaded back from quantization_configs.
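
A minimal sketch of the fallback parsing described above. The helper name `parse_compression_config` and its signature are hypothetical, not the actual ModelCompressor API; only the `quantization_config` and legacy `compression_config` key names come from this discussion.

```python
# Hypothetical sketch: prefer the modern `quantization_config` key and fall
# back to the legacy `compression_config` key, so configs that ModelCompressor
# writes can also be read back by ModelCompressor.
from typing import Optional


def parse_compression_config(model_config: dict) -> Optional[dict]:
    """Return compression settings from a serialized model config, if present."""
    # Models compressed today store their settings under `quantization_config`
    config = model_config.get("quantization_config")
    if config is not None:
        return config

    # Legacy models store the equivalent information under `compression_config`
    return model_config.get("compression_config")
```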

@kylesayrs

Relevant: #164 (review)

Do we need to set the transformers minimum version to 4.45 as well to load the models back in?

@kylesayrs

Split this up into 3 PRs, some of which are pending larger changes later.

@kylesayrs kylesayrs closed this Sep 30, 2024