Skip to content

Latest commit

 

History

History
31 lines (26 loc) · 2.1 KB

context_params.md

File metadata and controls

31 lines (26 loc) · 2.1 KB

This class serves as a container for various configuration settings that influence the behavior of the Llama model. Here's a brief summary:

  1. Class Description: ContextParams is designed to manage configuration settings for the Llama model, including aspects like random number generation, batch sizes, threading, and algorithm-specific parameters.

  2. Properties:

    • seed: Seed for random number generation, with a default random integer or an option for a random seed.
    • context: Size of the text context, defaulting to 512.
    • batch: Maximum batch size for prompt processing.
    • threads: Number of threads for generation.
    • threadsBatch: Number of threads for batch processing.
    • ropeScalingType: Specifies the type of RoPE scaling.
    • ropeFreqBase: Base frequency for RoPE.
    • ropeFreqScale: Frequency scaling factor for RoPE.
    • yarnExtFactor: YaRN extrapolation mix factor.
    • yarnAttnFactor: YaRN attention magnitude scaling factor.
    • yarnBetaFast: YaRN low correction dimension.
    • yarnBetaSlow: YaRN high correction dimension.
    • yarnOrigCtx: Original context size for YaRN.
    • mulMatQ: Flag for using experimental 'mul_mat_q' kernels.
    • logitsAll: Flag to compute all logits in llama_eval().
    • embedding: Flag to operate in embedding mode only.
    • offloadKqv: Flag to offload KQV operations to GPU.
  3. Methods:

    • get(): Creates and returns a llama_context_params object with the current settings.
    • fromJson(Map<String, dynamic> json): Factory constructor to instantiate ContextParams from a JSON map.
    • toJson(): Converts the instance to a JSON map for serialization or debugging purposes.
  4. Usage: This class is utilized to fine-tune the Llama model's operation, catering to specific requirements like performance tuning, thread management, and enabling advanced features like YaRN and RoPE.

  5. Flexibility: The class offers flexibility in configuring the model's operation, making it suitable for diverse applications, from simple setups to complex, multi-threaded environments.