This class configures how a Llama model is split and run across multiple GPUs. Here's a brief overview:
- Class Description: `ModelParams` configures the operational parameters for a Llama model, especially in multi-GPU environments. It includes settings for tensor splitting, metadata overrides, GPU layer storage, and memory management options.
- Properties:
  - `tensorSplit`: List indicating how the model (layers or rows) is distributed across GPUs.
  - `metadataOverride`: Map for overriding key-value pairs of the model metadata.
  - `gpuLayers`: The number of layers to store in VRAM, defaulting to 99.
  - `mainGpu`: Determines which GPU is used, depending on the split mode.
  - `vocabOnly`: Boolean flag to load only the vocabulary, without the weights.
  - `useMemoryMap`: Boolean flag to use memory mapping if possible.
  - `useMemoryLock`: Boolean flag to keep the model in RAM.
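
For example, a typical multi-GPU setup might be configured as below. This is a minimal sketch that assumes the property names listed above; consult the class source for the exact API.

```dart
// Illustrative only: property names are assumed from the list above.
final params = ModelParams()
  ..tensorSplit = [0.6, 0.4]  // 60% of the model on GPU 0, 40% on GPU 1
  ..gpuLayers = 99            // offload up to 99 layers to VRAM
  ..mainGpu = 0               // GPU 0 handles tensors that are not split
  ..useMemoryMap = true       // memory-map the weights when possible
  ..useMemoryLock = false;    // do not pin the model in RAM
```
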
- Constructor: Initializes the `_tensorSplitPointer` and attaches it to the finalizer for memory management (see the sketch at the end of this section).
- Methods:
  - `dispose()`: Releases the allocated resources.
  - `get()`: Constructs and returns a `llama_model_params` object based on the current settings.
  - `fromJson(Map<String, dynamic> json)`: Factory constructor to create an instance from a JSON map.
  - `toJson()`: Converts the instance into a JSON map for serialization or debugging.
  - `toString()`: Provides a string representation of the instance, useful for logging or debugging.
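
As a usage sketch, `fromJson` and `toJson` allow a simple serialization round trip. The JSON key names below are assumptions based on the property names above, not a documented format.

```dart
// Hypothetical JSON keys; the real serialization format may differ.
final params = ModelParams.fromJson(<String, dynamic>{
  'tensorSplit': [0.6, 0.4],
  'gpuLayers': 99,
  'mainGpu': 0,
  'vocabOnly': false,
});
print(params); // toString() gives a loggable representation
final copy = ModelParams.fromJson(params.toJson()); // round trip
```
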
- Memory Management: The class uses Dart FFI (Foreign Function Interface) to allocate and manage native memory for the tensor split pointer and metadata overrides.
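
The allocation pattern described above can be sketched as follows: an explicit `dispose()` for deterministic cleanup, plus a `Finalizer` as a safety net if `dispose()` is never called. The names are illustrative, not the class's actual implementation.

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

// Hypothetical helper mirroring the tensor-split handling described above.
class TensorSplitBuffer {
  // Frees the native memory if the owning object is garbage collected
  // without dispose() having been called.
  static final Finalizer<Pointer<Float>> _finalizer =
      Finalizer((ptr) => calloc.free(ptr));

  final Pointer<Float> pointer;

  TensorSplitBuffer(List<double> split)
      : pointer = calloc<Float>(split.length) {
    // Copy the Dart-side split fractions into native memory.
    for (var i = 0; i < split.length; i++) {
      pointer[i] = split[i];
    }
    _finalizer.attach(this, pointer, detach: this);
  }

  // Deterministic cleanup: detach the finalizer and free immediately.
  void dispose() {
    _finalizer.detach(this);
    calloc.free(pointer);
  }
}
```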