model_params.md
This class configures how a Llama model is split and run across multiple GPUs. Here's a brief overview:

  1. Class Description: ModelParams configures the operational parameters for a Llama model, especially in multi-GPU environments. It includes settings for tensor splitting, metadata overrides, GPU layer storage, and memory management options.

  2. Properties (these appear as fields in the sketch after item 3):

    • tensorSplit: List indicating how the model (layers or rows) is distributed across GPUs.
    • metadataOverride: Map for overriding key-value pairs of the model metadata.
    • gpuLayerLayer: The number of layers to store in VRAM, defaulting to 99.
    • mainGpu: Determines which GPU is used based on the split mode.
    • vocabOnly: Boolean flag to load only the vocabulary without weights.
    • useMemorymap: Boolean flag to use memory mapping if possible.
    • useMemoryLock: Boolean flag to keep the model in RAM.
  3. Constructor: Initializes the _tensorSplitPointer and attaches it to a finalizer for memory management, as shown in the sketch below.
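
A minimal sketch of how these pieces might fit together in Dart. The field names and the 99-layer default come from the lists above; the other defaults, the `Finalizer` wiring, the `calloc` allocation, and the `maxDevices` parameter are assumptions about the implementation, not the library's actual code:

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

// Hypothetical skeleton: field names follow the property list above;
// defaults other than gpuLayerLayer are assumptions.
class ModelParams {
  List<double> tensorSplit = [];              // distribution across GPUs
  Map<String, dynamic> metadataOverride = {}; // metadata key/value overrides
  int gpuLayerLayer = 99;                     // layers stored in VRAM
  int mainGpu = 0;                            // GPU used, depending on split mode
  bool vocabOnly = false;                     // load only the vocabulary
  bool useMemorymap = true;                   // memory-map the model if possible
  bool useMemoryLock = false;                 // keep the model in RAM

  late final Pointer<Float> _tensorSplitPointer;

  // Assumed: frees the native buffer if the object is garbage
  // collected without dispose() having been called.
  static final Finalizer<Pointer<Float>> _finalizer =
      Finalizer((ptr) => calloc.free(ptr));

  ModelParams({int maxDevices = 16}) {
    _tensorSplitPointer = calloc<Float>(maxDevices);
    _finalizer.attach(this, _tensorSplitPointer, detach: this);
  }

  void dispose() {
    _finalizer.detach(this);
    calloc.free(_tensorSplitPointer);
  }
}
```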

  4. Methods (see the usage sketch after this list):

    • dispose(): Releases allocated resources.
    • get(): Constructs and returns a llama_model_params object based on current settings.
    • fromJson(Map<String, dynamic> json): Factory constructor to create an instance from a JSON map.
    • toJson(): Converts the instance into a JSON map for serialization or debugging.
    • toString(): Provides a string representation of the instance, useful for logging or debugging.
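
A hedged usage sketch of these methods. The JSON keys are assumed to mirror the property names; the actual key names are not documented here:

```dart
// Assumed: JSON keys mirror the property names.
final params = ModelParams.fromJson({
  'tensorSplit': [0.5, 0.5], // split evenly across two GPUs
  'mainGpu': 0,
  'vocabOnly': false,
});

final native = params.get(); // build the llama_model_params struct

print(params.toJson()); // serialize for debugging
print(params);          // toString() for logging

params.dispose(); // release the native allocations
```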
  5. Memory Management: The class uses Dart FFI (Foreign Function Interface) to allocate and manage native memory for the tensor split pointers and metadata overrides, as sketched below.
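
For illustration, a minimal sketch of the kind of native allocation this implies, assuming `calloc` from `package:ffi` backs the tensor-split array; the helper name is hypothetical:

```dart
import 'dart:ffi';

import 'package:ffi/ffi.dart';

// Hypothetical helper: copy a Dart-side tensor split into native
// memory so a llama_model_params struct can point at it.
Pointer<Float> copyTensorSplit(List<double> split) {
  final ptr = calloc<Float>(split.length); // zero-initialized
  for (var i = 0; i < split.length; i++) {
    ptr[i] = split[i];
  }
  return ptr; // caller must eventually calloc.free(ptr)
}
```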