This class serves as a container for various configuration settings that influence the behavior of the Llama model. Here's a brief summary:
- Class Description: `ContextParams` is designed to manage configuration settings for the Llama model, including aspects like random number generation, batch sizes, threading, and algorithm-specific parameters.
- Properties:
- seed: Seed for random number generation; defaults to a random integer, and can be set explicitly for reproducible output.
- context: Size of the text context, defaulting to 512.
- batch: Maximum batch size for prompt processing.
- threads: Number of threads for generation.
- threadsBatch: Number of threads for batch processing.
- ropeScalingType: Specifies the type of RoPE scaling.
- ropeFreqBase: Base frequency for RoPE.
- ropeFreqScale: Frequency scaling factor for RoPE.
- yarnExtFactor: YaRN extrapolation mix factor.
- yarnAttnFactor: YaRN attention magnitude scaling factor.
- yarnBetaFast: YaRN low correction dimension.
- yarnBetaSlow: YaRN high correction dimension.
- yarnOrigCtx: Original context size for YaRN.
- mulMatQ: Flag for using experimental 'mul_mat_q' kernels.
- logitsAll: Flag to compute all logits in `llama_eval()`.
- embedding: Flag to operate in embedding mode only.
- offloadKqv: Flag to offload KQV operations to GPU.
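Taken together, the list above maps naturally onto a plain Dart class. The following is a minimal sketch, not the library's actual source: field names come from the property list, the RoPE/YaRN fields are elided for brevity, and every default except the documented context size of 512 is an illustrative assumption.

```dart
import 'dart:math';

// Sketch of the documented shape (not the real implementation).
class ContextParams {
  int seed;         // RNG seed; falls back to a random value when omitted
  int context;      // size of the text context (documented default: 512)
  int batch;        // maximum batch size for prompt processing
  int threads;      // number of threads for generation
  int threadsBatch; // number of threads for batch processing
  bool logitsAll;   // compute all logits in llama_eval()
  bool embedding;   // operate in embedding mode only
  bool offloadKqv;  // offload KQV operations to the GPU
  // ...the ropeScalingType/ropeFreq*/yarn* fields follow the same pattern.

  ContextParams({
    int? seed,
    this.context = 512,     // documented default
    this.batch = 512,       // assumed default
    this.threads = 4,       // assumed default
    this.threadsBatch = 4,  // assumed default
    this.logitsAll = false,
    this.embedding = false,
    this.offloadKqv = true, // assumed default
  }) : seed = seed ?? Random().nextInt(1 << 32); // random seed when unset
}
```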
- Methods:
- `get()`: Creates and returns a `llama_context_params` object with the current settings.
- `fromJson(Map<String, dynamic> json)`: Factory constructor to instantiate `ContextParams` from a JSON map.
- `toJson()`: Converts the instance to a JSON map for serialization or debugging purposes.
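For illustration, a round trip through the JSON helpers might look like this; it assumes the library's `ContextParams` with the methods listed above, and that `toJson()` and `fromJson()` mirror each other field-for-field:

```dart
void main() {
  final params = ContextParams(context: 2048, threads: 8);

  // Snapshot the configuration, e.g. for a config file or debug log.
  final Map<String, dynamic> json = params.toJson();

  // Rebuild an equivalent instance from the snapshot.
  final restored = ContextParams.fromJson(json);
  assert(restored.context == params.context); // round trip preserves settings
}
```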
- Usage: This class is used to fine-tune the Llama model's operation, catering to specific requirements such as performance tuning, thread management, and advanced features like YaRN and RoPE scaling.
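For example, enabling the extended-context features might look like the sketch below; the constructor is assumed to expose the documented properties as named parameters, and the concrete values are placeholders rather than recommendations:

```dart
final longContext = ContextParams(
  context: 4096,        // quadruple the default text context
  ropeFreqScale: 0.25,  // compress RoPE frequencies to cover the longer context
  yarnOrigCtx: 1024,    // context size the model was originally trained with
  threads: 8,           // generation threads
  threadsBatch: 8,      // batch-processing threads
);
```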
- Flexibility: The class offers fine-grained control over the model's configuration, making it suitable for diverse applications, from simple setups to complex, multi-threaded environments.