Accelerate Utilities #193

kylesayrs · 2024-10-21T18:46:39Z

Purpose

Implement offloading utility functions which greatly simplify/clarify offloading-related code in llm-compressor
Explicitly initialize quantization parameters as offloaded if the module is offloaded

Prerequisites

Changes

Changes not covered by prerequisites:

Implement getattr_chain utility function (also used by llm-compressor)
Implement deprecated utility decorator for future deprecations
Implement register_offload_parameter and delete_offload_parameter for easier initialization and removal of parameters related to quantization
Begin newly initialized quantization parameters on cpu if the module is offloaded
- Faster performance, removes dependency on get_execution_device
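As a rough illustration of what a getattr_chain-style helper does, here is a minimal sketch. It is not the implementation from this PR: the name `getattr_chain_sketch`, the `default` argument, and the attribute names on the stand-in object are assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Any

_MISSING = object()  # sentinel so that None can still be passed as a default


def getattr_chain_sketch(obj: Any, chain: str, default: Any = _MISSING) -> Any:
    """Follow a dotted attribute path such as "a.b.c", starting from obj."""
    current = obj
    for name in chain.split("."):
        if not hasattr(current, name):
            if default is _MISSING:
                raise AttributeError(f"{current!r} has no attribute {name!r}")
            return default
        current = getattr(current, name)
    return current


# Stand-in object; the attribute names are illustrative only.
module = SimpleNamespace(
    quantization_scheme=SimpleNamespace(weights=SimpleNamespace(num_bits=4))
)

num_bits = getattr_chain_sketch(module, "quantization_scheme.weights.num_bits")
missing = getattr_chain_sketch(module, "quantization_scheme.input_activations", None)
```

A helper like this replaces nested `getattr(getattr(...), ...)` calls and repeated `hasattr` checks with a single lookup.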

Deprecation Strategy

These functions should be deprecated, each for its own reason. These strategies will be implemented in follow-up PRs

Function	Deprecation Reason	Deprecation Strategy
is_module_offloaded	Use official has_offloaded_params	Redirect to has_offloaded_params & deprecation warning
get_execution_device	Not useful as a general util	Remove uses from LC & deprecation warning
get_offloaded_device	Folded into update_offload_parameter	Replace uses in LC with update_offload_parameter & deprecation warning
update_prefix_dict	Folded into update_offload_parameter	Replace uses in LC with update_offload_parameter & deprecation warning
update_parameter_data	Use update_offload_data for better args ordering. Open to keeping this one around	Remove uses from LC and CT & deprecation warning
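The "redirect & warning" strategy in the table can be sketched with a small decorator. This is a hedged illustration, not the decorator added by this PR: `deprecated_sketch`, its `future_name` argument, and the two `*_stub` functions are hypothetical names used only for this example.

```python
import functools
import warnings
from typing import Callable, Optional


def deprecated_sketch(future_name: Optional[str] = None) -> Callable:
    """Wrap a function so every call emits a DeprecationWarning."""

    def decorator(func: Callable) -> Callable:
        message = f"{func.__name__} is deprecated and may be removed in a future release."
        if future_name is not None:
            message += f" Use {future_name} instead."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not the wrapper
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def has_offloaded_params_stub(module) -> bool:
    # Stand-in for the official check; reads a made-up attribute.
    return getattr(module, "offloaded", False)


@deprecated_sketch(future_name="has_offloaded_params_stub")
def is_module_offloaded_stub(module) -> bool:
    # The old name simply redirects to the new function.
    return has_offloaded_params_stub(module)
```

Existing callers keep working through the old name while being told which function to migrate to.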

Upstream Strategy

Upstreaming functions to accelerate is a low priority, but comes with the benefit of more reviews and more official support

Function	Upstream Version
register_offload_parameter	N/A
update_offload_data	N/A
delete_offload_parameter	N/A
has_offloaded_params	1.1.0
align_module_device	1.1.0

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

src/compressed_tensors/quantization/lifecycle/initialize.py

src/compressed_tensors/utils/offload.py

src/compressed_tensors/utils/helpers.py

rahul-tuli

LGTM! with a few nits, good work on this!

src/compressed_tensors/utils/helpers.py

src/compressed_tensors/utils/offload.py

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs · 2024-12-06T06:41:23Z

Fixed a bug, added some tests

dsikka

why do we need the per token fix to land as a prereq for this PR?

dsikka

What would be the replacement for get_execution_device?

kylesayrs · 2024-12-09T16:58:40Z

@dsikka The function update_parameter_data takes new_param_data as an input and uses it to update parameter data. Previously, this function would simply overwrite the old data with new_param_data. Now, in order to reduce complexity and increase performance, update_parameter_data requires that the parameter being updated and the new parameter data have the same shape.

This assumption causes an error in mock_per_token_calibration (tests/test_quantization/test_configs/test_strategies.py), which revealed to me that the shape used to initialize the per_token strategy differs from the shape computed by calculate_qparams. I consider this ambiguity to be a bug which was causing the test to fail.

dsikka

This looks good overall.

Do you mind adding a simple lifecycle docstring which shows the steps of offloaded modules/parameters to make it slightly easier to follow how the parameters are updated?

I also think we should kick-off W4A16/W8A8 oneshot workflows, similar to what we did here: https://app.asana.com/0/1207078450218847/1208568399648361/f to make sure it runs to completion. I think past issues we've seen have been with g_idx and activation quantization parameters.

src/compressed_tensors/utils/helpers.py

src/compressed_tensors/quantization/lifecycle/initialize.py

src/compressed_tensors/utils/offload.py

dsikka · 2024-12-09T17:03:10Z

What would be the replacement for get_execution_device?

I think I understand from your PR as to why this can be removed.

kylesayrs · 2024-12-09T17:11:03Z

@dsikka w.r.t. get_execution_device

The function isn't guaranteed to be performant for all device maps, for example half-offloaded models
The function has very few uses, so it may be worth removing

For these reasons it's a candidate for deprecation (though we'll need it for the immediate future), but future work can determine whether we want to keep/update it

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

dsikka · 2024-12-16T18:49:52Z

src/compressed_tensors/utils/offload.py

+):
+    """
+    Update the data of an existing parameter and its offload dict. Supports both
+    parameters of offloaded modules and non-offloaded modules


this supports non-offloaded modules? for what case?

Supporting non-offloaded modules allows this function to be used throughout the codebase without having to duplicate code

Ugleh...

    if not has_offloaded_params(module):
        param = getattr(module, name)
        data = data.to(param.dtype)
        if param.device != "meta":
            param.data.copy_(data)
    else:
        update_offload_parameter(module, name, data)

Preetay!

    update_offload_parameter(module, name, data)

dsikka · 2024-12-16T19:02:12Z

src/compressed_tensors/utils/offload.py

+    module: torch.nn.Module,
+    name: str,
+    parameter: torch.nn.Parameter,
+):


When registering the parameters during initialization, don't we know the device already, depending on if the module has been offloaded or not?

We can't pass that device to update_offload_parameter to be used when updating the weights_map?

I've just now rewritten these parts to be a bit clearer.

don't we know the device already, depending on if the module has been offloaded or not?

During initialization, the _initialize_scale_zero_point function determines the initial onload device

    # begin on the same device as other parameters or cpu if offloaded.
    # in the offloaded case, there's no point moving tensors to the execution device
    # if they're going to be immediately offloaded by `register_offload_parameter`
    params_device = next(module.parameters()).device
    device = "cpu" if has_offloaded_params(module) else params_device

It's the job of register_offload_parameter (and by extension update_offload_parameter, offload_to_weights_map) to determine the offload device.

    if isinstance(weights_map, dict):
        if key in weights_map:
            offload_device = weights_map[key].device
        else:
            tens = next(iter(weights_map.values()), None)
            offload_device = tens.device if tens is not None else default_device
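The fallback order in the snippet above (existing entry, then any other entry, then a default) can be exercised without torch or accelerate. The sketch below covers only the plain-dict branch shown in the comment. `FakeTensor` and `resolve_offload_device` are hypothetical stand-ins, and devices are plain strings rather than `torch.device` objects.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FakeTensor:
    """Stand-in for torch.Tensor; only the device attribute matters here."""

    device: str


def resolve_offload_device(
    weights_map: Dict[str, FakeTensor], key: str, default_device: str = "cpu"
) -> str:
    """Pick the device a value stored under `key` should be offloaded to."""
    if key in weights_map:
        # An existing entry keeps its current offload device.
        return weights_map[key].device
    # A new entry follows the first tensor already in the map, if there is one.
    first = next(iter(weights_map.values()), None)
    return first.device if first is not None else default_device


weights_map = {"weight": FakeTensor(device="disk_stub")}

existing = resolve_offload_device(weights_map, "weight")
new_key = resolve_offload_device(weights_map, "weight_scale")
empty_map = resolve_offload_device({}, "weight_scale")
```

Under these assumptions, a newly registered parameter such as a scale lands on the same offload device as the module's existing weights, and `default_device` is only used when the map is empty.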

src/compressed_tensors/utils/offload.py

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

wip

bddc83c

kylesayrs mentioned this pull request Oct 22, 2024

additional fixes for HFQuantizer compatibility #136

Closed

kylesayrs added 10 commits October 23, 2024 05:05

add modify_offload_module

94d8c56

update docs

f939e98

WIP

167e741

cleanup functions, begin depreciation

cb6edb1

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

remove extra space

cb70047

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

revert get_offloaded_device

98a2889

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

update to align_module_device

8cd69ef

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

add requires skip for accelerate

0d23183

Merge remote-tracking branch 'origin' into kylesayrs/upstream-candidates

82235b3

fix per token initialization

0b0d8b6

kylesayrs mentioned this pull request Nov 19, 2024

[Bugfix] Update expected shape for per token strategy #210

Open

remove align_module_device

95e5907

kylesayrs marked this pull request as ready for review November 19, 2024 02:46

kylesayrs self-assigned this Nov 19, 2024

kylesayrs changed the title ~~[WIP] Accelerate Utilities~~ Accelerate Utilities Nov 19, 2024

horheynm reviewed Nov 20, 2024

src/compressed_tensors/quantization/lifecycle/initialize.py

horheynm reviewed Nov 20, 2024

src/compressed_tensors/utils/offload.py (outdated)

horheynm reviewed Nov 20, 2024

src/compressed_tensors/utils/helpers.py

kylesayrs requested a review from horheynm November 28, 2024 17:04

kylesayrs added 2 commits December 2, 2024 23:25

Merge remote-tracking branch 'origin' into kylesayrs/upstream-candidates

a6a3198

Merge remote-tracking branch 'origin' into kylesayrs/upstream-candidates

e3c3f95

kylesayrs mentioned this pull request Dec 5, 2024

VLM Support via GPTQ Hooks and Sequential Data Pipeline vllm-project/llm-compressor#914

Draft

rahul-tuli reviewed Dec 6, 2024

rahul-tuli previously approved these changes Dec 6, 2024

respond to nits

81a1eab

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs dismissed rahul-tuli’s stale review via 81a1eab December 6, 2024 03:59

kylesayrs mentioned this pull request Dec 6, 2024

Accelerate Utilities Follow-up #224

Merged

kylesayrs marked this pull request as draft December 6, 2024 05:52

Accelerate Utilities Follow-up (#224)

e7e1d81

kylesayrs marked this pull request as ready for review December 6, 2024 06:40

rename

9af736f

kylesayrs marked this pull request as draft December 6, 2024 07:18

kylesayrs added 2 commits December 6, 2024 07:40

implement recursive case

35fa1cd

remove print

38765bd

kylesayrs marked this pull request as ready for review December 6, 2024 08:15

dsikka reviewed Dec 6, 2024

support OffloadedWeightsLoader

64f4d98

dsikka reviewed Dec 9, 2024

src/compressed_tensors/utils/helpers.py

src/compressed_tensors/quantization/lifecycle/initialize.py

src/compressed_tensors/utils/offload.py

kylesayrs added 2 commits December 10, 2024 18:59

add lifecycle docstring

b8ae387

implement offload_to_weights_map with recursive definition

870095e

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

dsikka reviewed Dec 16, 2024

kylesayrs added 6 commits December 16, 2024 14:12

add docstring

77411ca

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

fix type hint

a5b1792

add check_accelerate guard

ed9ee4e

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

make device used by clearer

1632cc3

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

update update_prefix_dict

1c55a10

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

reuse fixture

9177650

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
