Finetune LORA #2632

Merged: 247 commits, merged on Sep 28, 2023
Commits on Jul 28, 2023

  1. 5d124d0
  2. remove unnecessary Adam(W) optimizer tensors.

    reduces optimizer memory overhead from 7*modelsize to 2*modelsize.

    additionally allows optimizing models with more than 2^31 parameters by replacing int with int64_t.

    bumps the training checkpoint file version, but old checkpoints can still be read.
    the new version with fewer tensors is saved.
    xaedes committed Jul 28, 2023
    d39c8e6
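
    To illustrate why two tensors of model size suffice for Adam(W) state, here is a minimal sketch of an AdamW step that keeps only the first and second moment buffers m and v. This is illustrative C, not the ggml implementation; all names are placeholders.

```c
#include <math.h>
#include <stdint.h>

// one AdamW step over n parameters; optimizer state is just m and v (2*modelsize).
// int64_t indexing also covers models with more than 2^31 parameters.
void adamw_step(float * w, const float * g, float * m, float * v,
                int64_t n, int64_t t, float alpha, float beta1, float beta2,
                float eps, float wd) {
    for (int64_t i = 0; i < n; ++i) {
        m[i] = beta1*m[i] + (1.0f - beta1)*g[i];
        v[i] = beta2*v[i] + (1.0f - beta2)*g[i]*g[i];
        const float mh = m[i] / (1.0f - powf(beta1, (float)t)); // bias-corrected moments
        const float vh = v[i] / (1.0f - powf(beta2, (float)t));
        w[i] -= alpha * (mh / (sqrtf(vh) + eps) + wd*w[i]);      // decoupled weight decay
    }
}
```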
  3. add gradient clipping to AdamW

    xaedes committed Jul 28, 2023
    d395b19
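
    For context, gradient clipping for an Adam(W) optimizer is typically done by global L2 norm. A minimal sketch (parameter names are placeholders, not the ggml ones):

```c
#include <math.h>
#include <stdint.h>

// if the L2 norm of all gradients exceeds `clip`, rescale them down to that norm
void clip_gradients(float * g, int64_t n, float clip) {
    double sum = 0.0;
    for (int64_t i = 0; i < n; ++i) sum += (double)g[i]*g[i];
    const double gnorm = sqrt(sum);
    if (clip > 0.0f && gnorm > clip) {
        const float scale = (float)(clip / gnorm);
        for (int64_t i = 0; i < n; ++i) g[i] *= scale;
    }
}
```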
  4. d7003a9
  5. implement gradient checkpointing for training

    reduces memory overhead from O(n_layer) to O(sqrt(n_layer))
    
    as explained in readme of https://github.com/cybertronai/gradient-checkpointing
    xaedes committed Jul 28, 2023
    6e3f95b
  6. remove unused compute buffer 3

    xaedes committed Jul 28, 2023
    e05e441
  7. add and use function ggml_build_backward_expand to avoid stack overflows with large maximum number of nodes

    GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);
    xaedes committed Jul 28, 2023
    ed4319e
  8. change AdamW decay parameter to work like the torch AdamW decay parameter

    It is now relative to the Adam learning rate `alpha*sched`.
    Before, it was relative to `sched` only.

    `alpha` is the maximum learning rate and `sched` is a scaling parameter in [0..1].
    xaedes committed Jul 28, 2023
    a80f184
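
    A hedged illustration of the change, written out for a single parameter (not the actual ggml code): the decoupled weight decay step before and after.

```c
// old behaviour: decay scaled by sched only
float decay_old(float w, float decay, float sched)              { return w - sched*decay*w; }

// new behaviour (like torch AdamW, where decay is scaled by the effective learning rate alpha*sched)
float decay_new(float w, float decay, float sched, float alpha) { return w - alpha*sched*decay*w; }
```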
  9. f175ead
  10. change default AdamW weight decay parameter defined in ggml to 0.0, making Adam default instead of AdamW

    btw: the default weight decay parameter for torch.optim.AdamW is 0.01
    xaedes committed Jul 28, 2023
    97964a4
  11. bug fixes for cross entropy loss

    ggml_cross_entropy_loss: sums were not correctly added in the workload of each thread
    ggml_cross_entropy_loss_back: simplify backward process, reducing numerical issues

    guard usage of the f16 exp lookup in cross entropy with #define GGML_CROSS_ENTROPY_EXP_FP16

    cross entropy loss is only used once during training, but it is quite sensitive to the numerical errors introduced by the exp-f16-lookup.
    so the exp-f16-lookup for cross entropy loss is disabled by default, trading better gradients for very slightly worse runtime performance.
    xaedes committed Jul 28, 2023
    2c6985f
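
    An illustrative numerically stable cross entropy over one row (not the ggml kernel): it uses the log-sum-exp trick and plain expf instead of an f16 exp table, matching the motivation above for disabling GGML_CROSS_ENTROPY_EXP_FP16.

```c
#include <math.h>

// loss = -sum_i target[i] * log(softmax(logits)[i]), computed stably
float cross_entropy_row(const float * logits, const float * target_probs, int n) {
    float maxv = logits[0];
    for (int i = 1; i < n; ++i) if (logits[i] > maxv) maxv = logits[i];
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += expf(logits[i] - maxv);
    const float log_z = maxv + logf(sum);
    float loss = 0.0f;
    for (int i = 0; i < n; ++i) {
        loss -= target_probs[i] * (logits[i] - log_z);
    }
    return loss;
}
```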
  12. fix test-grad0 for cross_entropy_loss

    the second argument to cross_entropy_loss must sum up to 1 for each row
    xaedes committed Jul 28, 2023
    2d1e6e0
  13. fix test-grad0 for soft_max

    don't use only sum as aggregation, because the sum of softmax is always 1 -> finite differences would not work
    instead use sum(log(soft_max()*(1-eps)+eps)); eps avoids log(0)
    xaedes committed Jul 28, 2023
    864e7e3
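
    A small sketch of the aggregation described above, as a finite-difference check target (illustrative, not the test-grad0 code): unlike sum(softmax(x)) == 1, this loss is not constant in x.

```c
#include <math.h>

// loss = sum_i log(softmax(x)_i * (1-eps) + eps)
float softmax_check_loss(const float * x, int n, float eps) {
    float maxv = x[0];
    for (int i = 1; i < n; ++i) if (x[i] > maxv) maxv = x[i];
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += expf(x[i] - maxv);
    float loss = 0.0f;
    for (int i = 0; i < n; ++i) {
        const float p = expf(x[i] - maxv) / sum;     // softmax
        loss += logf(p*(1.0f - eps) + eps);          // eps avoids log(0)
    }
    return loss;
}
```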
  14. 87febee
  15. change cross_entropy_loss to output average over all rows

    this helps keep the loss and gradients in a sane range
    xaedes committed Jul 28, 2023
    51dc770
  16. improve gradient checkpointing

    sqrt(n_layers) is only the best checkpoint step when the mem size of a checkpoint and the mem size of a layer are equal.
    since layers require more memory than the single-tensor checkpoints we use, the optimal value is computed differently:

    ```
      given: n, u, v
      objective: minimize(a*u+b*v) where a*b=n, a>0, b>0
      b=n/a
      minimize(a*u+v*n/a)
      diff(a*u+v*n/a, a) = u - (v*n/a)/a
      diff(a*u+v*n/a, a) == 0
      u - (v*n/a)/a == 0
      u == v*n/(a*a)
      u*a*a = v*n
      a*a = v*n/u
      a = sqrt(n*v/u)
    ```

    this change results in more checkpoints, requiring fewer layers to store between checkpoints, overall improving memory usage.
    xaedes committed Jul 28, 2023
    3744a9b
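
    A small helper, sketched from the derivation above, that computes the checkpoint count a = sqrt(n*v/u), where u is the memory per checkpoint and v the memory per recomputed layer; purely illustrative.

```c
#include <math.h>

static int optimal_n_checkpoints(int n_layer, double u_bytes, double v_bytes) {
    double a = sqrt((double)n_layer * v_bytes / u_bytes); // minimizes a*u + (n/a)*v
    if (a < 1.0)              a = 1.0;                    // clamp to valid range
    if (a > (double)n_layer)  a = (double)n_layer;
    return (int)(a + 0.5);
}
```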
  17. fc379a2
  18. d0fbb7d
  19. add more training parameters:

    --enable-restart N         Only for Adam optimizer. Enable restarts of cos-decay
    --disable-restart N        Only for Adam optimizer. Disable restarts of cos-decay
    --opt-past N               Number of optimization iterations to track for delta convergence test. Disabled when zero.
    --opt-delta N              Maximum delta for delta convergence test. Disabled when <= zero.
    --opt-max-no-improvement N Maximum number of optimization iterations with no improvement. Disabled when <= zero.
    --adam-epsf N              AdamW epsilon for convergence test. Disabled when <= zero.
    --adam-min-alpha N         Adam minimum learning rate alpha, usually 0.1 * alpha
    xaedes committed Jul 28, 2023
    c6a18e1
  20. replace memcpy with reshape operation so that the graph is not cut at the input

    this makes it possible to store other values into the input tensor and then simply recompute the graph without rebuilding it
    xaedes committed Jul 28, 2023
    ce937bc
  21. ff759d9
  22. e843d6e
  23. add optimization callback to ggml_opt_resume_g

    this callback is called before each iteration with custom data and pointer to learning schedule parameter (only used in Adam(W)).
    
    can be used for dynamic learning schedule and setting input data for batches before each iteration
    xaedes committed Jul 28, 2023
    bfc3119
  24. use optimization callback in training

    allows dynamic learning schedule and different batch data for each iteration without relying on low n_iter and high n_examples parameters
    
    reduces runtime by avoiding restart of optimization function and improves training convergence by providing a different batch for each iteration
    xaedes committed Jul 28, 2023
    d7aa4d9
  25. add minimum number of tensor dimensions to apply weight decay (default 2)

    this allows skipping weight decay for bias parameters
    xaedes committed Jul 28, 2023
    e6ff072
  26. rename training parameter cos-decay-alpha to cos-decay-min and clarify that adam-min-alpha also applies to warmup

    xaedes committed Jul 28, 2023
    58024d3
  27. fix increase of model.train_samples and model.train_tokens

    now that each optimizer iteration gets its own batch we need to multiply by number of opt iterations
    xaedes committed Jul 28, 2023
    17a0898
  28. change sampling parameters for prediction after training to defaults of common.h

    and clarify what is context for prediction and what are generated tokens
    xaedes committed Jul 28, 2023
    24a4b09
  29. 1065c3b
  30. add conditional compilation of using F16 exp in flash attention

    uncomment `// #define GGML_FLASH_ATTN_EXP_FP16` to enable usage of f16 exp in flash attention
    xaedes committed Jul 28, 2023
    dbbc263
  31. 47055c9
  32. 0f6a8ab
  33. remove out-commented vectorized code of opt_adam

    the vectorized code might be a bit faster for a low number of parameters, but it had a big memory usage overhead
    xaedes committed Jul 28, 2023
    87035b9
  34. ecdc161
  35. c1a5e11
  36. remove trailing whitespace

    xaedes committed Jul 28, 2023
    22cb368

Commits on Aug 6, 2023

  1. d43af4b
  2. 2bf422e

Commits on Aug 14, 2023

  1. in train function replace add_inplace by regular add

    because using add_inplace seems to result in different gradients
    xaedes committed Aug 14, 2023
    fc826c8
  2. don't use allocate hash_map on context

    because the context has no_alloc=True when using memory allocator resulting in NULL data pointers
    xaedes committed Aug 14, 2023
    d437415
  3. cfddc36
  4. 0dd496c
  5. 52c92c0
  6. correctly clone view tensors by setting data pointers

    without this the checkpointing would only work when being used together with memory allocator
    xaedes committed Aug 14, 2023
    345f516
  7. fix variable names

    xaedes committed Aug 14, 2023
    5a11b75
  8. swap arguments to commutative ops to be the same as in `forward_batch_wo_cache_flash_attn`

    xaedes committed Aug 14, 2023
    b2f1310
  9. add input tensors as checkpoints

    so that recursive tensor cloning of gradient checkpointing terminates on input tensors
    xaedes committed Aug 14, 2023
    5884b43
  10. 9716eb8
  11. make sure some tensors are not reallocated by inserting new temporary nodes depending on them:

    output and parameter gradient tensors need to be available at the end of the graph execution

    parameter gradient tensors also need to be available before the graph execution because they are set to zero before each optimizer iteration

    checkpoint tensors are allocated all together to reduce memory allocator fragmentation

    afterwards, in addition to the temporary nodes, we also need to reset the temporary leafs
    xaedes committed Aug 14, 2023
    38f4438
  12. d6c5b03
  13. 4ed096c
  14. integrate unified training function which may use memory allocator

    the unified training function also supports arguments whether to use flash attention and/or gradient checkpointing
    xaedes committed Aug 14, 2023
    865c4cd
  15. 3e99a8d
  16. 75baed2
  17. fe788a1
  18. c954f41
  19. 271e4d6
  20. remove trailing whitespace

    xaedes committed Aug 14, 2023
    6f161c7
  21. remove unused train params: mem_compute1_gb & mem_compute2_gb

    mem_compute_gb is used for compute when automatic memory allocator is not enabled, otherwise it can be very small to only hold the tensor definitions
    mem_compute0_gb is used for automatic memory allocator (as long as measurement of max required size is not implemented)
    xaedes committed Aug 14, 2023
    3794dce
  22. 6e280b2
  23. add debug asserts in ggml_allocr_alloc for some common pitfalls when using this function directly

    xaedes committed Aug 14, 2023
    faf3e21
  24. 098654c
  25. fix test when to create temporary backward graph

    temporary backward graph is only necessary when using checkpointing
    xaedes committed Aug 14, 2023
    3e6468b
  26. fix memory "leak" in optimizers

    each iteration a new cplan with new memory for work data was allocated.
    now cplan creation only happens at the start of optimization, with each iteration reusing the cplan and its work data.
    xaedes committed Aug 14, 2023
    5622846
  27. reverse order of for loop in ggml_build_backward_expand to save memory when using gradient checkpointing and allocator

    with this loop order, gradient checkpointing with allocator saves 13% memory on a 16 layer model and 2% memory on a 2 layer model.

    the computation results are the same
    xaedes committed Aug 14, 2023
    3b5515b

Commits on Aug 15, 2023

  1. 316b070
  2. 5e059ac
  3. move and remove code

    xaedes committed Aug 15, 2023
    9eb1ef8

Commits on Aug 16, 2023

  1. add API functions to access remaining model parameters:

    mult, head and rot
    xaedes committed Aug 16, 2023
    c0a372f
  2. 28ee0c8
  3. 50b1e66
  4. bug fixes to make finetune compile

    automatic allocator does not work yet
    xaedes committed Aug 16, 2023
    be7e564
  5. 6202753
  6. fix names of lora tensors

    xaedes committed Aug 16, 2023
    0ab2507
  7. avoid stack overflow resulting from big ggml_cgraph

    replace stack allocation and ggml_build_forward by ggml_new_graph in combination with ggml_build_forward_expand
    xaedes committed Aug 16, 2023
    39a2d15
  8. replace llama API functions to get model tensors by one function to get model tensor by name

    LLAMA_API struct ggml_tensor * llama_get_model_tensor(struct llama_model * model, const char * name);
    xaedes committed Aug 16, 2023
    1151653
  9. 79ad888
  10. 83cb9ed
  11. remove trailing whitespace

    xaedes committed Aug 16, 2023
    83a4ad7
  12. f80e245
  13. add ggml_add_cast API function

    this function works like ggml_add, but accepts a data type for the resulting tensor.
    only supported for quantized src0 input.
    xaedes committed Aug 16, 2023
    9198b24
  14. use ggml_add_cast in finetuning

    lora-applied weights will now have data type F32, which improves gradients when finetuning quantized base models
    xaedes committed Aug 16, 2023
    714fec0

Commits on Aug 17, 2023

  1. 0bb897c

Commits on Aug 18, 2023

  1. make sure base model tensors data cannot be used in viewable operations

    memory allocator would try to make lora application inplace on base model tensors.
    since those are memory mapped this will result in memory access violations
    xaedes committed Aug 18, 2023
    44526cb
  2. a252111
  3. avoid keeping in memory ALL of the gradients

    The problem here stems from ggml_graph_reset. This function is called in the optimization function, before each graph computation, to reset the gradients to zero. This required a unique memory slot for each gradient: allocating memory from a previously freed memory location might lead to non-zero input gradients.

    During ggml_compute_backward the gradients are built stepwise by adding or subtracting new values, starting from an OP_NONE tensor which needs to contain zero values. This requires the graph reset.

    To avoid this I now remember in ggml_build_backward_expand the original OP_NONE gradient tensors in a hash table, which is passed to ggml_compute_backward. There, instead of using add (or sub or similar), I test whether the existing gradient to be changed is a zero-valued tensor by looking up its existence in the hash table. If it is such a zero tensor it is not modified but replaced by the value to be added; otherwise the regular add (not inplace, the allocator will take care of this) is used. This way none of those zero-tensor values are needed in the final backward graph and, more importantly, they don't need a unique memory slot just to make them zero.
    xaedes committed Aug 18, 2023
    f358204
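
    A simplified sketch of the replace-or-accumulate idea described in the commit above. The zero_table and hash_contains callback are stand-ins for the bookkeeping added in ggml_build_backward_expand; only ggml_add is a real ggml call.

```c
#include "ggml.h"

// returns the new gradient tensor for 'grad' after contributing 'value':
// if 'grad' is still the untouched zero placeholder (tracked in zero_table),
// the first contribution simply replaces it, so no zeroed memory slot is needed.
static struct ggml_tensor * add_or_replace(
        struct ggml_context * ctx,
        struct ggml_tensor  * grad,
        struct ggml_tensor  * value,
        int  (*hash_contains)(void * table, struct ggml_tensor * t), // stand-in
        void * zero_table) {
    if (hash_contains(zero_table, grad)) {
        return value;                  // grad was still "zero": just replace it
    }
    return ggml_add(ctx, grad, value); // otherwise accumulate (allocator may make this inplace)
}
```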
  4. remove trailing whitespace

    xaedes committed Aug 18, 2023
    011f47f
  5. a0c2752
  6. 113c90f
  7. 7a63d42
  8. 63cb374
  9. 6c98640
  10. remove unnecessary src tensor from ggml_get_rows_back

    we don't need data of src[2] for computation, only to setup the correct output shape.
    remove dependency on src[2], so that allocator can work more freely.
    
    the computational graph is still completely determined, because the output shape is naturally included.
    this is similar to how ggml_reshape does it.
    xaedes committed Aug 18, 2023
    65b0561
  11. remove unnecessary src tensor from ggml_repeat & ggml_repeat_back

    we don't need data of src[1] for computation, only to setup the correct output shape.
    remove dependency on src[1], so that allocator can work more freely.
    
    the computational graph is still completely determined, because the output shape is naturally included
    xaedes committed Aug 18, 2023
    3e47890
  12. resolve todo

    allocator will only make it inplace when they are of the same type
    xaedes committed Aug 18, 2023
    37dfb54

Commits on Aug 20, 2023

  1. mixing multiple LORA adapters is now possible

    pass more than one '--lora FNAME' argument to apply more than one LORA.
    use '--lora-scaled FNAME S' when you want to specify a user-defined scale for an adapter.
    xaedes committed Aug 20, 2023
    d61ed6b
  2. 27c24ff

Commits on Aug 21, 2023

  1. also save latest finetune output with ITERATION="LATEST" and print where files are saved

    saving with LATEST makes it easier to resume training from the latest checkpoint
    the string "LATEST" can be configured with command line option "--fn-latest STR"
    xaedes committed Aug 21, 2023
    8b4106a

Commits on Aug 23, 2023

  1. 77a3092
  2. 1a5f0a3
  3. update finetune README

    xaedes committed Aug 23, 2023
    7df517c

Commits on Aug 28, 2023

  1. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	examples/CMakeLists.txt
    #	examples/train-text-from-scratch/train-text-from-scratch.cpp
    #	ggml.c
    #	llama.cpp
    #	llama.h
    xaedes committed Aug 28, 2023
    b04263c
  2. aecc3b3
  3. aa8016e
  4. daedc6f
  5. 5ce92ae
  6. 271c030
  7. reduce large memory overhead in train-text-from-scratch

    all gradients had to be pinned so that graph_reset works correctly.
    this is no longer necessary with the changes to ggml_compute_backward introduced in this PR.
    xaedes committed Aug 28, 2023
    9a28bce
  8. 49af7fb
  9. 007280c
  10. 1faee64
  11. remove unused code

    xaedes committed Aug 28, 2023
    a3b4529
  12. ca97583
  13. add LLM_KV_TRAINING_TYPE to train-text-from-scratch checkpoints

    so that they can be differentiated from lora finetune checkpoints
    xaedes committed Aug 28, 2023
    e030f7b
  14. ecb1b20

Commits on Aug 29, 2023

  1. 0564f4e
  2. 6134ad4
  3. 1425968
  4. remove code to print data checksums which was used to verify correctness of new gguf code

    xaedes committed Aug 29, 2023
    ebff3a1
  5. omit tokenization when training is disabled, only save llama lora adapter

    training can be disabled by passing '-n 0' to finetune
    xaedes committed Aug 29, 2023
    5813ac8
  6. remove trailing whitespace

    xaedes committed Aug 29, 2023
    a6165da
  7. update README.md

    xaedes committed Aug 29, 2023
    e28cf7e
  8. 794bb7e
  9. 5f0a4e9
  10. add ggml API functions ggml_unravel_index, ggml_get_i32_nd and its analogs for set and for f32

    ggml_get_i32_1d, ggml_set_i32_1d, ggml_get_f32_1d, ggml_set_f32_1d now support non-contiguous tensors.
    in case of a non-contiguous tensor, the 1d index is unraveled into a multi index using ggml_unravel_index to be passed to the '_nd' function equivalent.

    this fixes a bug in test-grad0 which happens due to ggml_build_backward not building purely contiguous tensors anymore
    xaedes committed Aug 29, 2023
    82c5247
  11. 5fcfa7e
  12. b1aa26f
  13. a76e66a
  14. remove unused 'inplace' argument from ggml_compute_backward function

    inplace operations to add gradients are no longer created by ggml_compute_backward
    use allocator to automatically make inplace operations
    xaedes committed Aug 29, 2023
    dd4e4bc
  15. 8a96d4c
  16. 281245a
  17. 5854f51
  18. fix check_gradient

    ggml_build_backward_expand was previously replaced by ggml_build_backward, but the assignment of the forward graph to the backward graph was missing
    xaedes committed Aug 29, 2023
    bf70e27

Commits on Aug 30, 2023

  1. b1709f2
  2. 2392b67
  3. move gradient checkpointing code into ggml, new API function:

    // build gradient checkpointing backward graph gb for gf using provided checkpoints
    // gb_tmp will contain original backward graph with rewritten backward process nodes,
    // but without the second forward pass nodes.
    GGML_API void ggml_build_backward_gradient_checkpointing(
            struct ggml_context   * ctx,
            struct ggml_cgraph    * gf,
            struct ggml_cgraph    * gb,
            struct ggml_cgraph    * gb_tmp,
            struct ggml_tensor  * * checkpoints,
            int                     n_checkpoints);
    xaedes committed Aug 30, 2023
    d487e05
  4. e6b7158
  5. train-text-from-scratch can train (full finetune) gguf models

    just pass the gguf model via `--checkpoint-in FN`.
    after this, to continue training, pass the generated checkpoint instead of the original gguf model.
    
    tested with smaller models, bigger models may exceed available memory.
    use (LORA) finetune for those.
    xaedes committed Aug 30, 2023
    fc456ed
  6. remove trailing whitespace

    xaedes committed Aug 30, 2023
    f3590ad
  7. b26bd4c
  8. update README.md

    xaedes committed Aug 30, 2023
    4e986ac
  9. fix warnings

    xaedes committed Aug 30, 2023
    0c57f9f
  10. fix warnings

    xaedes committed Aug 30, 2023
    4fd51c4

Commits on Aug 31, 2023

  1. remove finetune option to disable allocator

    the allocator should always be used.
    by making sure that it is always used it gets easier to implement automatic memory requirements computation
    xaedes committed Aug 31, 2023
    e0da168
  2. 4914f85

Commits on Sep 1, 2023

  1. d554a70
  2. add ggml-alloc API function 'ggml_allocr_max_size' to get max size of alloc

    GGML_API size_t ggml_allocr_max_size(struct ggml_allocr * alloc);
    xaedes committed Sep 1, 2023
    7e01d11
  3. finetune: automatically allocate all memory and changes to command line options

    remove '--n_examples N' parameter, as it no longer makes sense to call the optimization process multiple times in a loop.
    add '--only_write_lora' command line option: will skip tokenization and training, to only write a llama.cpp compatible LORA adapter.
    remove memory buffer related command line options.
    improve iteration console output.
    xaedes committed Sep 1, 2023
    5bba329
  4. add finetune to Makefile

    xaedes committed Sep 1, 2023
    6cbf55a
  5. update README.md

    xaedes committed Sep 1, 2023
    7acb124
  6. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	Makefile
    xaedes committed Sep 1, 2023
    6809eb7
  7. c32ad44

Commits on Sep 2, 2023

  1. increase measured alloc size by tensor_alignment

    ggml_allocr_reset will reduce the given size by up to tensor_alignment-1
    xaedes committed Sep 2, 2023
    6ee12b1
  2. fix README.md

    xaedes committed Sep 2, 2023
    cfe217f
  3. ded6382
  4. 8d982c8
  5. revert last commit

    "bug fix, probably solves the 'ggml_allocr_alloc: not enough space in the buffer' issue"
    
    "alloc was freeing an externally allocated tensor, because it calculated the end of allocator memory as alloc->data + alloc->max_size instead of alloc->data + alloc->size."
    
    This is intentional to reduce the risk of freeing external tensors when measuring. Unless max_size is not properly calculated, I don't see why this is an issue.
    xaedes committed Sep 2, 2023
    1ce7023
  6. 2d2bdc0
  7. 80ac697

Commits on Sep 3, 2023

  1. update README.md

    xaedes committed Sep 3, 2023
    406e075
  2. fix printf format warnings

    xaedes committed Sep 3, 2023
    e07f5c5
  3. bdb7092
  4. 50589ed

Commits on Sep 4, 2023

  1. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	ggml-alloc.c
    xaedes committed Sep 4, 2023
    9ea2f7f
  2. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	Makefile
    xaedes committed Sep 4, 2023
    d3afd71
  3. add gradient accumulation

    specify the number of accumulation steps with '--grad-acc N'.
    this will simulate a bigger batch size of grad_acc*batch.
    xaedes committed Sep 4, 2023
    c1c3b0e
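
    A pseudocode-style sketch of gradient accumulation as described above; the helper names are stand-ins, not the finetune code. Gradients from n_grad_acc micro-batches are summed before a single optimizer step, which behaves like a batch of size n_grad_acc * n_batch.

```c
void zero_gradients(void);
void load_micro_batch(int k);
void forward_and_backward(void);      // accumulates into the gradient tensors
void scale_gradients(float s);
void optimizer_step(void);            // one AdamW update per accumulated batch

void train_iteration(int n_grad_acc) {
    zero_gradients();
    for (int k = 0; k < n_grad_acc; ++k) {
        load_micro_batch(k);
        forward_and_backward();
    }
    scale_gradients(1.0f / (float)n_grad_acc); // average so the learning rate is unchanged
    optimizer_step();
}
```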

Commits on Sep 5, 2023

  1. d07b6aa
  2. 786e786
  3. d375b8f
  4. 867e7c2

Commits on Sep 6, 2023

  1. improve finetune time measurement

    fix printf warnings on systems where int64_t is (long int).
    change time datatypes to double because values get big with long training times.
    exclude file saving from time measurement.
    converge faster to the actual time per iteration by removing the very small first duration before the first iteration was performed.
    fix bug in output of total training time, the reported value was 1000 times too small.
    xaedes committed Sep 6, 2023
    8c2d7e3
  2. specify default lora rank with '--lora-r N'

    '--lora-r N' will specify default rank for all tensors
    '--rank-wq N', etc. will override this default rank for specific tensor types.
    xaedes committed Sep 6, 2023
    c08fcf5
  3. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	common/common.cpp
    xaedes committed Sep 6, 2023
    0393116
  4. de6170d
  5. 0c2c9c7

Commits on Sep 9, 2023

  1. support grouped-query-attention in ggml_flash_attn and ggml_flash_attn_back

    k and v can now be repeated in q along ne[2]

    in the forward pass just use modulo to compute k and v indices, like ik2 = iq2 % nek2.

    in the backward pass this won't work as easily, because multiple threads will compete to accumulate to the same k->grad[:,ik1,ik2,ik3] and v->grad[:,iv1,iv2,iv3].
    so we change the parallelization over q rows to be over k rows. this ensures non-overlapping (ik2,ik3) across threads.
    in each thread we then iterate over the number of repetitions of k/v in q to compute iq2 as iq2 = ik2 + irep*nek2.

    since ne2 is not the same for q, k and v we also change how the gradients are concatenated into the result tensor.
    additionally the offsets of gradq, gradk and gradv in the result tensor are now memory aligned.

    we also simplify the compute_backward part of flash_attn to use ggml_reshape instead of switching over the number of dimensions.
    this needs a small change to ggml_reshape, removing the assertion that the second argument is contiguous.
    since only the shape (ne) of the second reshape argument is of relevance, its memory layout (nb) is irrelevant -> it can very well be non-contiguous.

    change test-grad0 to also test for repeated k/v in q.

    this changes the rng and now results in small gradient differences in softmax. these come solely from using the f16 exp table lookup in forward softmax: when temporarily changing softmax to use the actual exp function, the reported gradient differences go away. gradient differences coming solely from the f16 table lookup are acceptable.
    added a note to explain this.
    xaedes committed Sep 9, 2023
    d7aade7
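
    An illustrative stand-alone example of the grouped-query-attention index mapping described in the commit above (not the ggml kernel): with nek2 distinct k/v heads reused across neq2 query heads, the forward pass uses a modulo, and the backward pass iterates over the repetitions for a fixed k/v head.

```c
#include <stdio.h>

int main(void) {
    const int neq2  = 8;            // number of query heads
    const int nek2  = 2;            // number of k/v heads (repeated in q)
    const int n_rep = neq2 / nek2;  // how often each k/v head is reused

    // forward: each query head iq2 uses k/v head iq2 % nek2
    for (int iq2 = 0; iq2 < neq2; ++iq2) {
        printf("q head %d -> k/v head %d\n", iq2, iq2 % nek2);
    }

    // backward: parallelize over k/v heads; each thread owns one ik2 and
    // visits the query heads that reused it: iq2 = ik2 + irep*nek2
    for (int ik2 = 0; ik2 < nek2; ++ik2) {
        for (int irep = 0; irep < n_rep; ++irep) {
            const int iq2 = ik2 + irep*nek2;
            printf("k/v head %d accumulates grads from q head %d\n", ik2, iq2);
        }
    }
    return 0;
}
```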
  2. 833a56c
  3. fix finetune to support grouped-query-attention (using flash-attention)

    note: ggml changes to ggml_out_prod are necessary to support grouped-query-attention without flash-attention.
    xaedes committed Sep 9, 2023
    35260f7
  4. aea8b6b
  5. dd32786
  6. decouple random number generator of each operation test

    when changing one test, the rng of the other tests is no longer influenced
    xaedes committed Sep 9, 2023
    9738526
  7. d3aaf08
  8. d3f1b43
  9. add cgraph evaluation order member and corresponding enum type

    this controls in which order ggml_build_forward visits source nodes.
    by default the nodes are visited left to right, i.e. src[0] first.
    in some cases it is beneficial for ggml-alloc to visit in a different order.
    two possible orders are supported: left-to-right (src[0] first) and right-to-left (src[0] last).
    xaedes committed Sep 9, 2023
    917d287
  10. measure max compute size for each cgraph eval order and use best order

    this can bring huge memory savings:
    e.g. codellama-34b with n_ctx=64, n_batch=1 goes from 92927.8 MB down to 4627.6 MB
    xaedes committed Sep 9, 2023
    ace9088
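
    A sketch of the order-selection strategy described above (illustrative, not the finetune code): measure the maximum compute-buffer size for each graph evaluation order with a measuring allocator, then build the real graph with whichever order needs less memory. The 'measure' callback stands in for building the graph with that order and querying the measuring allocator (e.g. via ggml_allocr_max_size).

```c
#include <stddef.h>

enum eval_order { EVAL_LEFT_TO_RIGHT, EVAL_RIGHT_TO_LEFT, EVAL_ORDER_COUNT };

static enum eval_order pick_best_order(size_t (*measure)(enum eval_order)) {
    enum eval_order best = EVAL_LEFT_TO_RIGHT;
    size_t best_size = measure(EVAL_LEFT_TO_RIGHT);
    for (int o = 1; o < EVAL_ORDER_COUNT; ++o) {
        const size_t sz = measure((enum eval_order) o);
        if (sz < best_size) { best_size = sz; best = (enum eval_order) o; }
    }
    return best; // use this order when allocating the real compute buffer
}
```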
  11. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	examples/train-text-from-scratch/train-text-from-scratch.cpp
    #	llama.h
    xaedes committed Sep 9, 2023
    54b21a3
  12. 1cef459

Commits on Sep 13, 2023

  1. 0e32932
  2. 7898652
  3. ec57689
  4. 7f378a7

Commits on Sep 14, 2023

  1. f627e2f
  2. account for possible leading whitespace that will be added by tokenizer

    e.g. '\t' will be tokenized by llama spm tokenizer to [29871, 12]
    xaedes committed Sep 14, 2023
    2c59f7b
  3. use unrolled vec_mad in out_prod

    y is the vec_mad result vec.
    x is the vec_mad input vec.
    v is the vec_mad input scalar.

    ggml_vec_mad_f32_unroll will internally loop over x and v with the same y.

    GGML_VEC_MAD_UNROLL is by default defined to 32.

    This value is empirically optimized using performance test runs of out-prod in openllama-3b finetune with 256 context length and batch size 1. It gives a 23% performance boost for out_prod.

    Full measurements of out-prod runtime in ms (second column is the result when unrolling yv instead of xv):

        unroll   unroll_xv   unroll_yv
         1       67014.643    87826.469
         2       77117.552    89077.656
         4       72091.311   109121.657
         8       61077.543    88678.334
        16       56914.67     79514.947
        24       59024.595    84350.254
        28       55952.446    83368.73
        32       51476.658    85177.745
        36       55973.792    84659.92
        40       55139.616    93844.738
        48       60736.392    93330.267
        64       99856.878   116994.99
    xaedes committed Sep 14, 2023
    20cf1a4
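
    A sketch of the unrolled multiply-add idea described above (illustrative, not the exact ggml code): one pass over y applies several (x, v) pairs at once, so y is loaded and stored once instead of once per pair. The unroll factor here is a placeholder; ggml's is GGML_VEC_MAD_UNROLL.

```c
#define VEC_MAD_UNROLL 4  // placeholder unroll factor for illustration

// y[i] += x_k[i] * v_k for k = 0..VEC_MAD_UNROLL-1, all in one sweep over y
static void vec_mad_f32_unroll(int n, float * y,
                               const float * x[VEC_MAD_UNROLL],
                               const float   v[VEC_MAD_UNROLL]) {
    for (int i = 0; i < n; ++i) {
        float acc = y[i];
        for (int k = 0; k < VEC_MAD_UNROLL; ++k) {
            acc += x[k][i] * v[k];
        }
        y[i] = acc;
    }
}
```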
  4. set lora_alpha to value of lora_r if it is not set via command line

    otherwise only changing lora_r will change scaling of lora adapter used in prediction
    xaedes committed Sep 14, 2023
    3a9c1d7
  5. reshuffle original sample order instead of the previous shuffled order

    otherwise resumed reshuffle will not result in same sample order
    xaedes committed Sep 14, 2023
    0971fee
  6. block tiling for out-prod inspired by mul-mat

    block sizes are empirically optimized
    
    roughly doubles the flops of out-prod
    xaedes committed Sep 14, 2023
    d88dae2
  7. exclude some more known zero values from computations in flash_attn_f32 & flash_attn_back_f32

    xaedes committed Sep 14, 2023
    76804fa

Commits on Sep 15, 2023

  1. add static keywords

    xaedes committed Sep 15, 2023
    4f2ce91
  2. remove outcommented old code

    xaedes committed Sep 15, 2023
    cc60b3f
  3. update train-text-from-scratch with tokenization, sample selection and shuffling from finetune

    xaedes committed Sep 15, 2023
    ab56b63

Commits on Sep 16, 2023

  1. 00b656f
  2. 9f4b1bf
  3. a8c8907
  4. move train data saving code into callback to unify code of opt_callback

    train_params are still different in finetune and train-text-from-scratch, so it can't yet be moved to train.h|cpp
    xaedes committed Sep 16, 2023
    ee27333
  5. e9758ae
  6. bef1e97
  7. fix consume_common_train_arg

    xaedes committed Sep 16, 2023
    7aa9ea7
  8. 48d3509
  9. increase train_samples by used_samples instead of number of batches

    one batch can contain more than one sample when the option "fill_with_next_samples" is used
    xaedes committed Sep 16, 2023
    571dc94
  10. Merge branch 'master' into finetune-lora

    # Conflicts:
    #	Makefile
    #	examples/baby-llama/baby-llama.cpp
    #	examples/train-text-from-scratch/train-text-from-scratch.cpp
    #	llama.cpp
    xaedes committed Sep 16, 2023
    d3e06d3
  11. fix usage of llama_tokenize

    xaedes committed Sep 16, 2023
    7930caf
  12. 8d82d4c
  13. 9139fec
  14. 1d33ec5
  15. 1d09965
  16. 9db2664
  17. remove terminating '\0' from tokenization

    (llama_tokenize is now passed the string length instead of relying on terminating '\0')
    xaedes committed Sep 16, 2023
    dd3e763
  18. fix compile warnings

    xaedes committed Sep 16, 2023
    83061fb
  19. fix compile warnings

    xaedes committed Sep 16, 2023
    8721785

Commits on Sep 17, 2023

  1. use new/delete for train_state instead of malloc/free

    using malloc may result in seg faults when trying to assign string fields
    xaedes committed Sep 17, 2023
    ddf5ac2
  2. 151bfe9
  3. bf2ad65
  4. add train option "--sample-random-offsets"

    Use samples beginning at random offsets.
    The offset is only applied to the first sample in each batch context window.
    Together with "--fill-with-next-samples" this may help for training endless text generation.
    
    For example given a dataset containing samples "abcd", "ABCD", "0123".
    With context size of 8 and options "--fill-with-next-samples", "--no-separate-with-eos", "--no-separate-with-bos",
    the context windows of batches could only be filled with "abcdABCD", "ABCDabcd", "0123abcd", etc.
    
    With "--sample-random-offsets" it can also be filled with "23abcdAB", "bcd0123A", etc.
    xaedes committed Sep 17, 2023
    d1bb6fb
  5. deduplicate code into function

    xaedes committed Sep 17, 2023
    56a03fa
  6. 1dbd6bc
  7. align code

    xaedes committed Sep 17, 2023
    5ed3098
  8. b0ee563
  9. move some params from lora hparams into model hparams and load model params from gguf

    this equalizes the model definition in finetune and text-from-scratch and removes the need for additional llama api functions to get model parameters
    xaedes committed Sep 17, 2023
    934ad8d
  10. remove now unnecessary llama API functions to get model params that were added by this PR

    xaedes committed Sep 17, 2023
    dd94ce4
  11. train-text-from-scratch: automatically allocate model tensors, remove option '--mem-model N'

    xaedes committed Sep 17, 2023
    9e10fa9
  12. db38d2b
  13. f9b5d9b
  14. c993246
  15. 3b9d974
  16. 5ce74ee

Commits on Sep 22, 2023

  1. add export-lora program

    xaedes committed Sep 22, 2023
    0ede0f4
  2. remove trailing whitespace

    xaedes committed Sep 22, 2023
    b91e3dd
  3. d38260b
  4. 904c19b
  5. add export-lora build dependency to llama

    because it depends on common, which depends on llama
    xaedes committed Sep 22, 2023
    758c46c
  6. update finetune README.md

    xaedes committed Sep 22, 2023
    9145c87
  7. da05205

Commits on Sep 24, 2023

  1. improve handling of export-lora arguments

    print errors and warnings when files could not be read or created
    xaedes committed Sep 24, 2023
    2912f17
  2. Fix export-lora.cpp "not enough space in the context's memory pool" (#1)

    * Fix export-lora.cpp "not enough space in the context's memory pool"
    
    Without this patch, export-lora would sometimes error with "not enough space in the context's memory pool (needed 656784, available 656800)".
    
    * increase required context size by 5*GGML_MEM_ALIGN instead of plain 16
    
    ---------
    
    Co-authored-by: xaedes <xaedes@gmail.com>
    meatbag-18a and xaedes authored Sep 24, 2023
    ad64e33
  3. 1660658

Commits on Sep 28, 2023

  1. 5461129