---
title: Model Conversion Reference
---

This reference guide describes everything you need to know about Fast-LLM's checkpoint conversion system.
After reading it, you should be able to create your own `External` converter, whether for a Hugging Face format or another external format.
And if you are familiar with the rest of Fast-LLM, you will also be able to create an entirely custom converter.

Fast-LLM provides a simple and [fully customizable interface](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/config.py) to save/load checkpoints and configurations.
This same interface is used both by Fast-LLM's official checkpoint formats (`distributed` and `fast-llm`)
and by the checkpoint conversion interface.
It can also be used to define entirely new checkpoint formats, though this is generally not recommended.

In this guide we focus on the checkpoint conversion interface, in particular for Hugging Face formats,
since this is the most common use case.

## Checkpoint format metadata

When creating a new checkpoint format, the first step is to subclass `CheckpointFormat`.
This data structure holds important properties of the format, and makes them accessible at the configuration level.
Some important entries include:

* `name`: The name of the format, as it will appear, for example, in configuration files.
* `support_optimizer`: Whether the optimizer state can be included in a checkpoint.
* `support_saving`, `support_loading`: This can be used to create read-only or write-only formats.
* `get_handler_class()`: Returns the actual checkpoint handler class, as we'll soon describe.
The class should be imported lazily so the `CheckpointFormat` remains accessible to configurations.

Here is a simple example:

```python
class AwesomeCheckpointFormat(CheckpointFormat):
    name = "awesome_checkpoint"
    support_optimizer = False

    @classmethod
    def get_handler_class(cls):
        from package.module import AwesomeCheckpointHandler

        return AwesomeCheckpointHandler
```

Once the metadata class is created, we want to let the model know about it.
We do this by adding it to the `checkpoint_formats` property of the model configuration class. For example:

```python
@config_class()
class AwesomeModelConfig(FastLLMModelConfig):
    checkpoint_formats = FastLLMModelConfig.checkpoint_formats + (AwesomeCheckpointFormat,)
    # ...
```

You can see a more complete example in the [GPT model source code](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/models/gpt/config.py).

## External checkpoint handler

Now that we've defined a format, we're ready to tackle the actual implementation of an [external checkpoint handler](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py).
External handlers define a list of converters that can be used to convert configurations and state tensors automatically.
They also require an implementation of checkpoint reading and writing,
although we already provide such an [implementation](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/huggingface.py) for Hugging Face formats.

!!! note "Supported formats"
The external checkpoint conversion is principally designed for checkpoint formats that store state tensors in a variable list of `Savetensor` files.
It comes with default saving and loading that handles lazy loading, management of memory usage, safety checks.
It is possible to use a more generic format by overriding the `save` and (in some cases) `load` methods, but this requires significant effort.
Note that we may provide better generalization options at some point in the future.

Let's begin an example where we convert our `AwesomeModel` to its Hugging Face counterpart.
The first step is to define a handler class and let it know about our model class:

```python
class AwesomeHuggingfaceCheckpointHandler(HuggingfaceStateDictCheckpointHandler):
    _model: AwesomeModel
    _model_class = AwesomeModelConfig
```

### Configuration conversion

The configuration conversion utility interfaces between two configurations in the form of nested dictionaries:
a serialized Fast-LLM configuration and an external configuration.
The `_load_config` method is expected to read the configuration from disk, in the layout expected by the checkpoint format,
and return that configuration in the form of a nested dictionary,
with `_save_config` handling the reverse operation.
See the [Hugging Face implementation](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/huggingface.py) for an example.

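For intuition, here is a minimal sketch of what these two methods amount to for a hypothetical format that keeps its configuration in a single JSON file. This is not the Hugging Face implementation, and the exact method signatures in the handler may differ:

```python
import json
import pathlib


def load_config(directory: pathlib.Path) -> dict:
    # Read the on-disk configuration and return it as a nested dictionary.
    return json.loads((directory / "config.json").read_text())


def save_config(directory: pathlib.Path, config: dict) -> None:
    # The reverse operation: serialize the nested dictionary back to disk.
    (directory / "config.json").write_text(json.dumps(config, indent=4))
```
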
To perform the conversion, the checkpoint handler relies on a list of `ParamConverter` objects,
which describe how individual parameters (or in some cases multiple ones) should be converted.
The `ParamConverter` base interface is a dataclass consisting of two variables and two methods:

* `fast_llm_names: tuple[tuple[str, ...], ...]`: An array of entry names on the Fast-LLM side, in tuple format.
For example, `(("transformer", "head_groups"),)` refers to the single entry `config["transformer"]["head_groups"]`.
* `export_names: tuple[tuple[str, ...], ...]`: An array of entry names on the external side, in the same tuple format.
* `export_params(self, fast_llm_values: tuple[typing.Any, ...]) -> tuple[typing.Any, ...]`:
This method takes the configuration parameters corresponding to `fast_llm_names` (in the same order),
and returns converted parameters corresponding to `export_names`.
* `import_params(self, export_values: tuple[typing.Any, ...]) -> tuple[typing.Any, ...]`:
The converse of `export_params`, converting parameters corresponding to `export_names` into those corresponding to `fast_llm_names`.

While not strictly part of the interface, it may also be useful to define a dataclass `__post_init__`,
for example to restrict the number of parameters in `fast_llm_names` and `export_names`.

Fast-LLM offers several generic configuration converter classes, including:

* `RenameParamConverter`: A simple 1-1 mapping between parameters, with optional renaming but identical value.
Typically, most converters are of this type.
* `ConstantImportParamConverter`: A 1-0 mapping for Fast-LLM parameters without an equivalent in the external format,
which must take a specific value `fast_llm_value` for the conversion to make sense (i.e., they take a hard-coded value in the external format).
This type of converter is common in Hugging Face converters, as Hugging Face models support far fewer configuration parameters.
* `ConstantExportParamConverter`: A 0-1 mapping, the converse of `ConstantImportParamConverter`.
* `MappedConfigParamConverter`: A 1-1 mapping similar to `RenameParamConverter`, but with a non-trivial relation between values.

In addition to these, you may need to implement your own custom converters.
Here is an example that packs several Fast-LLM values into a single tuple on the external side.

```python
@dataclasses.dataclass(kw_only=True)
class PackingParamConverter(ParamConverter):
    def __post_init__(self):
        # There may be any number of Fast-LLM variables, but only one external one.
        Assert.eq(len(self.export_names), 1)

    def export_params(self, fast_llm_values):
        # Pack the values into a single tuple.
        return (fast_llm_values,)

    def import_params(self, export_values):
        # Unpack the values. We can safely assume `export_values` has length one
        # because of the assertion in `__post_init__`.
        return export_values[0]
```

Now that we've seen how parameter converters work, we're ready to add them to our handler class.
We do so by creating a list of converters in the `_create_config_converters` class method.
Continuing our `AwesomeModel` handler example, we define:

```python
@classmethod
def _create_config_converters(cls) -> list[ParamConverter]:
    # For Hugging Face handlers, we need to call the superclass method.
    return super()._create_config_converters() + [
        # A trivial example where both the name and value are the same on both sides.
        RenameParamConverter(
            fast_llm_names=(("vocab_size",),),
            export_names=(("vocab_size",),),
        ),
        # A non-trivial example of `RenameParamConverter` with renaming and handling of nested dictionaries.
        RenameParamConverter(
            fast_llm_names=(("transformer", "rotary", "theta"),),
            export_names=(("rope_theta",),),
        ),
        # A constant import example indicating that the external format does not support absolute positional embeddings.
        ConstantImportParamConverter(fast_llm_names=(("use_position_embeddings",),), fast_llm_value=False),
        # The `architectures` parameter is a common use case for `ConstantExportParamConverter` in Hugging Face models.
        ConstantExportParamConverter(export_names=(("architectures",),), export_value=["AwesomeModelForCausalLM"]),
        # A value mapping example, where we match Fast-LLM activation types with their Hugging Face equivalents.
        MappedConfigParamConverter(
            fast_llm_names=(("transformer", "activation_type"),),
            export_names=(("hidden_act",),),
            fast_llm_value=ActivationType.from_hf_name,
            export_value=lambda activation_type: activation_type.hf_name,
        ),
        # A more hypothetical example using `PackingParamConverter` to pack two parameters
        # `epsilon_1` and `epsilon_2` into a tuple `eps`.
        PackingParamConverter(
            fast_llm_names=(("epsilon_1",), ("epsilon_2",)),
            export_names=(("eps",),),
        ),
    ]
```

!!! note "How conversion works"
The once the converters are defined, the conversion utility takes it from there.
Exporting works as follows (importing work similarly):
*The handler creates an empty export config dict, then loops over its list of converters. For each converter, it:
* Reads the value of each parameter defined in `fast_llm_names`, and gathers them in a tuple.
*Calls `converter.export_params`, providing the set of read values as argument.
* Ensure that the returned value has the correct length (that of `export_names`)
* Set the respective values in the export config dict.

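To make this loop concrete, here is a minimal sketch of the export pass. This is not the actual implementation; `get_nested` and `set_nested` are hypothetical helpers, not part of Fast-LLM:

```python
def get_nested(config: dict, names: tuple[str, ...]):
    # Hypothetical helper: walk a nested dict following a tuple of keys.
    for name in names:
        config = config[name]
    return config


def set_nested(config: dict, names: tuple[str, ...], value) -> None:
    # Hypothetical helper: set a value in a nested dict, creating levels as needed.
    for name in names[:-1]:
        config = config.setdefault(name, {})
    config[names[-1]] = value


def export_config(converters: list, fast_llm_config: dict) -> dict:
    # Sketch of the export pass described above: gather, convert, validate, write.
    export_config_dict: dict = {}
    for converter in converters:
        fast_llm_values = tuple(get_nested(fast_llm_config, names) for names in converter.fast_llm_names)
        export_values = converter.export_params(fast_llm_values)
        assert len(export_values) == len(converter.export_names)
        for names, value in zip(converter.export_names, export_values):
            set_nested(export_config_dict, names, value)
    return export_config_dict
```
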
!!! note "About `MISSING` and `DEFAULT`"
If a value is not found during import, it will be replaced by the `MISSING` tag.
The converter's `import_params` has the opportunity to handle this missing value,
and if a `MISSING`, the handler will throw an error because it does not know what value to set on the Fast-LLM side.

The `MISSING` tag is also supported during export,
but has a different meaning as the value is always expected to be found in the Fast-LLM configuration.
Instead, `export_params` may return a `MISSING` tag indicating that no value should not be added to the Fast-LLM config.
It may also return `DEFAULT`, which will be replaced by the default value for the configuration parameter.

Note that the handling of `MISSING` and `DEFAULT` is experimental and may be improved in the future.

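As an illustration, a hypothetical converter could tolerate a missing external entry by falling back to the Fast-LLM default. This is only a sketch; the converter name is made up, and the `is MISSING` comparison is an assumption about how the tag is checked:

```python
@dataclasses.dataclass(kw_only=True)
class OptionalRenameParamConverter(ParamConverter):
    # Hypothetical: like `RenameParamConverter`, but tolerant of a missing external entry.
    def export_params(self, fast_llm_values):
        return fast_llm_values

    def import_params(self, export_values):
        # Replace any `MISSING` entry with `DEFAULT`, so the handler falls back to
        # the parameter's default value instead of raising an error during import.
        return tuple(DEFAULT if value is MISSING else value for value in export_values)
```
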
### State conversion

State conversion follows the same principle as configuration conversion, but acts on flat dictionaries of state tensors.
Converters are defined by subclassing `WeightConverter`, with the interface:

* `fast_llm_name: str | tuple[str, ...]`: An entry name or tuple of entry names on the Fast-LLM side.
For example, `"layers.0.word_embeddings_weight"` refers to the corresponding single entry of the Fast-LLM state dict.
* `export_name: str | tuple[str, ...]`: An entry name or array of entry names on the external side.
* `export_weight(self, weight: tuple[torch.Tensor | SafeTensorSlice, ...]) -> tuple[torch.Tensor | SafeTensorSlice, ...]`:
This method takes the state dict entries corresponding to `fast_llm_name` (in the same order),
and returns converted entries corresponding to `export_name`.
* `import_weight(self, weight: tuple[torch.Tensor | SafeTensorSlice, ...]) -> tuple[torch.Tensor | SafeTensorSlice, ...]`:
The converse of `export_weight`, converting state dict entries corresponding to `export_name` into those corresponding to `fast_llm_name`.

Fast-LLM offers several generic state dict converter classes, including:

* `WeightConverter`: The base class allows for a simple 1-1 mapping between parameters with optional renaming, similar to `RenameParamConverter`.
* `SplitWeightConverter`: A 1-N mapping, where a Fast-LLM parameter corresponds to multiple equally-sized chunks on the external side.
This happens for example in the MLP, where Hugging Face keeps the `gate` and `up` parts separate,
while Fast-LLM combines them in a single tensor to improve performance (and similarly for the multiple experts in the case of MoE); see the sketch after this list.

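For instance, a fused gate/up MLP weight could be declared as follows. This is a hypothetical instantiation; the tensor names are made up for illustration:

```python
# Hypothetical names: the fused Fast-LLM MLP weight of the first transformer layer
# is exported as two equal chunks, matching the separate Hugging Face tensors.
converter = SplitWeightConverter(
    "layers.1.mlp.layer_1.weight",
    ("model.layers.0.mlp.gate_proj.weight", "model.layers.0.mlp.up_proj.weight"),
)
```
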
Since different libraries tend to hold weights in different formats, it is often necessary to define custom converters.
Here is an example where a weight needs to be transposed during conversion:

```python
class TransposeWeightConverter(WeightConverter):
    def export_weight(
        self, weight: tuple[torch.Tensor | SafeTensorSlice, ...]
    ) -> tuple[torch.Tensor | SafeTensorSlice, ...]:
        Assert.eq(len(weight), 1)
        # `[:]` materializes the (possibly lazily loaded) tensor before transposing.
        return (weight[0][:].transpose(0, 1).contiguous(),)

    def import_weight(
        self, weight: tuple[torch.Tensor | SafeTensorSlice, ...]
    ) -> tuple[torch.Tensor | SafeTensorSlice, ...]:
        Assert.eq(len(weight), 1)
        return (weight[0][:].transpose(0, 1).contiguous(),)
```

We define the list of weight converters in the `_create_weight_converters` method.
Continuing our `AwesomeModel` handler example, we define:

```python
def _create_weight_converters(self) -> list[WeightConverter]:
    converters = []
    # The set of converters may depend on the base model configuration,
    # which is accessible through `self._model.base_model_config`.
    num_layers = self._model.base_model_config.transformer.num_layers

    # A simple renaming example, for the word embeddings.
    converters.append(WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight"))

    # We usually want to loop dynamically over layers.
    # Layer 0 holds the word embeddings, so transformer layer `i` maps to `layers.{i + 1}`.
    for i in range(num_layers):
        # A `SplitWeightConverter` example, splitting a weight in two.
        converters.append(
            SplitWeightConverter(
                f"layers.{i + 1}.weight",
                (f"model.layers.{i}.weight_1", f"model.layers.{i}.weight_2"),
            )
        )
    return converters
```

And that's it! We're ready to use the new checkpoint format in Fast-LLM.
For example, we may set the pretrained and export formats in a configuration as follows:

```yaml
training:
  export:
    format: awesome_checkpoint
pretrained:
  format: awesome_checkpoint
```
### External converters beyond Hugging Face

!!! warning
    Coming soon. Stay tuned for new updates!

## Creating a custom checkpoint format

Looking to implement your own converter? This reference is on the way. Stay tuned!

!!! warning
    Coming soon. Stay tuned for new updates!