Replies: 5 comments 13 replies
-
Do you mean https://huggingface.co/docs/autotrain/index ? I haven't tried it, but autotrain appears to be a hosted end-to-end solution for training/finetuning various classes of models. It is almost certainly more powerful and faster, and it is paid per hour. slowllama is a standalone tool/library which can be used locally on somewhat slower devices (e.g. a Mac mini). The original use case I was thinking about was:
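To make the contrast concrete, here is a minimal sketch of the LoRA-style idea that lets a tool like slowllama finetune locally: the large base weight stays frozen (and can live on disk/CPU), while only two small low-rank factors are trained. All names and shapes here are illustrative assumptions, not slowllama's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4.0  # model dim, LoRA rank, scaling (illustrative values)

W = rng.standard_normal((d, d))         # frozen base weight, never updated
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # zero-init: the adapter starts as a no-op

def forward(x):
    # Effective weight would be W + (alpha / r) * B @ A, but we never
    # materialize the full update; the low-rank path is applied separately.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.standard_normal((1, d))
# With B zero-initialized, the adapter contributes nothing at first:
assert np.allclose(forward(x), x @ W.T)
```

Only A and B (2·r·d values instead of d·d) need gradients, which is why this can run on modest hardware at the cost of speed.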
-
Oh, sorry, I forgot to update the instructions in one place: the configuration now lives in the conf* files rather than being passed as an argument. Please try running it like
-
Thank you, the command runs now. But for you the 20-iteration finetuning finishes in 20 minutes, while for me it has been running for an hour and then fails, although checkpoints for the first few iterations do appear in the out directory:

slowllama % python3 finetune.py
slowllama % ls -al out/

Also, these trained checkpoints are not giving the expected output; I trained the 7B model:

slowllama % python3 test_gen.py ./out/state_dict_10.pth

Please help.
-
I've just tried training with the following overrides:

config:

test_gen:

I ran finetune for 20 iterations, but used checkpoint 10 (where we are less likely to have overfit):

This was the output:

Cubestat reports the following metrics:
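Picking an intermediate checkpoint instead of the last one is a simple guard against overfitting. A hedged sketch of that selection, assuming you track a validation loss per saved iteration (the loss values below are invented; only the out/state_dict_N.pth naming is taken from the thread):

```python
# Map of saved iteration -> validation loss (illustrative numbers).
# In practice you would compute these by running the model on held-out data.
val_losses = {10: 0.82, 15: 0.79, 20: 0.85}

# Choose the checkpoint with the lowest validation loss, not the latest one.
best_iter = min(val_losses, key=val_losses.get)
checkpoint = f"out/state_dict_{best_iter}.pth"
assert checkpoint == "out/state_dict_15.pth"
```

With only a train loss available, an earlier checkpoint (like 10 of 20 here) is a reasonable heuristic for the same reason: later iterations fit the training set more tightly.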
-
Thank you! Will this method also work to finetune llama2 on a large database of questions and answers about a topic?
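In principle, yes, if the Q&A pairs are flattened into plain training text first. Here is a hedged sketch of that preprocessing step; the prompt template and field names are assumptions for illustration, not slowllama's required input format.

```python
# Toy Q&A database (illustrative).
qa_pairs = [
    {"q": "What does slowllama do?", "a": "It finetunes llama2 locally."},
    {"q": "Is it fast?", "a": "No, it trades speed for low memory use."},
]

def to_sample(pair):
    # One flat text sample per pair; the template is a common convention,
    # not something slowllama prescribes.
    return f"Question: {pair['q']}\nAnswer: {pair['a']}\n"

training_text = "\n".join(to_sample(p) for p in qa_pairs)
assert training_text.startswith("Question: What does slowllama do?")
```

Whether the finetuned model answers well then depends on the usual factors: dataset size, iteration count, and stopping before it overfits, as discussed above.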
-
slowllama is quite interesting. Please elaborate: what does slowllama do that autotrain can't?