move to lit gpt llama impl #3
base: main
Conversation
Force-pushed from f2a515e to 682d365
def get_model(config: Config) -> GPT:
    # Load model
-     config_model = LlamaConfig.from_pretrained(config.path_model, attn_implementation=config.attn_implementation)
-     return LlamaForCausalLM.from_pretrained(pretrained_model_name_or_path=config.path_model, config=config_model)
+     if isinstance(config.llama_config, ModelConfig):
+         llama_config = config.llama_config
+     else:
+         with open(config.llama_config) as f:
+             llama_config = ModelConfig(**json.load(f))
+
+     llama_config.attention_impl = config.attention_impl
+     return GPT(llama_config)
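For reference, a minimal self-contained sketch of the "config object or JSON path" pattern used in the new hunk; the class and field names below are illustrative stand-ins, not the PR's actual Config/ModelConfig definitions.

from dataclasses import dataclass
import json
from pathlib import Path

@dataclass
class DummyModelConfig:
    # Illustrative fields only; the real ModelConfig lives in this repo.
    n_layer: int = 2
    n_head: int = 4
    attention_impl: str = "sdpa"

def load_model_config(source) -> DummyModelConfig:
    # Accept either an already-built config object or a path to a JSON file,
    # mirroring the isinstance() branch in get_model above.
    if isinstance(source, DummyModelConfig):
        return source
    with open(source) as f:
        return DummyModelConfig(**json.load(f))

# Both call styles resolve to the same config type:
cfg_from_obj = load_model_config(DummyModelConfig(n_layer=4))
Path("tiny.json").write_text(json.dumps({"n_layer": 4, "n_head": 8}))
cfg_from_json = load_model_config("tiny.json")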
Hrmm, would the initialisation be the same across DiLoCo workers?
Yes indeed. Is there a way to use the HF Hub to push the init ckpt like we did before?
Seems like GPT inherits from nn.Module instead of transformers' PreTrainedModel, so it doesn't have the from_pretrained function. Maybe we can change it to allow loading from the Hub.
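One option (a sketch, not this repo's code): since GPT is a plain nn.Module, the init checkpoint can still be pushed and fetched as a state_dict file via huggingface_hub. The repo id, filename, and the stand-in model below are placeholders.

import torch
from huggingface_hub import HfApi, hf_hub_download

model = torch.nn.Linear(8, 8)  # stand-in for GPT(llama_config)

# Push the initial checkpoint once (repo_id is a placeholder).
torch.save(model.state_dict(), "init_ckpt.pt")
HfApi().upload_file(
    path_or_fileobj="init_ckpt.pt",
    path_in_repo="init_ckpt.pt",
    repo_id="org/diloco-init",
    repo_type="model",
)

# Each DiLoCo worker then loads the same initialisation.
ckpt_path = hf_hub_download(repo_id="org/diloco-init", filename="init_ckpt.pt")
model.load_state_dict(torch.load(ckpt_path, map_location="cpu"))

Alternatively, huggingface_hub's PyTorchModelHubMixin could be mixed into GPT to get from_pretrained/push_to_hub without inheriting from transformers' PreTrainedModel.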
input_ids = inputs_ids[:, :-1]
target = inputs_ids[:, 1:]

output = model(input_ids)
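For context, a hedged sketch of the next-token objective this shift implies; the flattening step and the assumption that the model returns raw logits of shape (batch, seq, vocab) are mine, not taken from this PR.

import torch
import torch.nn.functional as F

vocab_size = 32  # toy numbers for the sketch
inputs_ids = torch.randint(0, vocab_size, (2, 16))
model = torch.nn.Embedding(vocab_size, vocab_size)  # stand-in producing (batch, seq, vocab)

input_ids = inputs_ids[:, :-1]
target = inputs_ids[:, 1:]
output = model(input_ids)

# Standard causal LM loss over the shifted pair.
loss = F.cross_entropy(output.reshape(-1, vocab_size), target.reshape(-1))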
missing seqlens?
Good catch.
FYI @Jackmin801, this PR was not meant to be merged for now; it was more to compare. Though we get 20% better MFU with it (even with torch compile), so we might want it. I will address your comments.
Where are the MFU speedups coming from? If it's from better layer implementations, I think it might be better to just modify (copy over to our repo and modify) the HF implementation. It's also easier to have FP8 support this way (I have an implementation using Transformer Engine that uses this implementation method; https://github.com/PrimeIntellect-ai/OpenDiLoCo_internal/pull/71)
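For reference, a minimal sketch of the Transformer Engine FP8 pattern being referred to; the layer size and recipe settings are illustrative (not code from the linked PR), and this needs an FP8-capable GPU.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Swap nn.Linear for te.Linear in the layers you want to run in FP8.
linear = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = linear(x)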
add llama