fix: correct typos in docstrings (#482)
- Fix 'transfomers' to 'transformers' in mixtral.py
- Fix 'Emebedding' to 'Embedding' in orpo_trainer.py

## Summary

Docstring-only typo corrections in two files; no functional changes.

## Testing Done

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [ ] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: byhsu@linkedin.com <byhsu@linkedin.com>
3 people authored Dec 17, 2024
1 parent 21baccc commit ac56674
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/liger_kernel/transformers/model/mixtral.py
@@ -38,7 +38,7 @@ def lce_forward_deprecated(
     cache_position: Optional[torch.LongTensor] = None,
 ) -> Union[Tuple, MoeCausalLMOutputWithPast]:
     r"""
-    Copy paste Mixtral's forward from transfomers v4.44.2 but replace torch cross entropy with liger fused linear cross entropy
+    Copy paste Mixtral's forward from transformers v4.44.2 but replace torch cross entropy with liger fused linear cross entropy
     Args:
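The docstring touched above describes the patched forward's key idea: instead of projecting hidden states through `lm_head` and then applying `torch.nn.CrossEntropyLoss`, the hidden states and the `lm_head` weight are handed to Liger's fused linear cross entropy, which avoids materializing the full logits tensor. Below is a minimal sketch of that pattern, not the actual `lce_forward_deprecated` body; the `LigerFusedLinearCrossEntropyLoss` argument order shown is an assumption.

```python
from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss


def causal_lm_loss_sketch(hidden_states, lm_head_weight, labels):
    # Shift so that tokens < n predict token n (standard causal LM setup).
    shift_hidden = hidden_states[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    # The fused kernel computes the lm_head projection and the cross entropy
    # together, so the full (batch * seq_len, vocab_size) logits tensor is
    # never stored. Assumed call order: (weight, input, target).
    lce = LigerFusedLinearCrossEntropyLoss()
    return lce(
        lm_head_weight,
        shift_hidden.view(-1, shift_hidden.size(-1)),
        shift_labels.view(-1),
    )
```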
2 changes: 1 addition & 1 deletion src/liger_kernel/transformers/trainer/orpo_trainer.py
@@ -17,7 +17,7 @@ class _FSDPForwardRedirection:
     This is needed in cases where we call a submodule of a FSDP module. For instance, when we want to call only
     the `LlamaModel` part out of a FSDP-wrapped `LlamaForCausalLM` to get the hidden states without involving
     GPU-memory-heavy `lm_head` and cross entropy computation, doing this directly (i.e. `model.model.forward()`)
-    will not work because the first `nn.Emebedding` layer is not independently wrapped as a FSDP module (because of
+    will not work because the first `nn.Embedding` layer is not independently wrapped as a FSDP module (because of
     the transformer-based wrapping policy), and not calling it through FSDP root module forward will not all-gather
     its parameter, thus resulting in "RuntimeError: 'weight' must be 2-D" error. Similarly, if we want to call just
     the `lm_head` part of a model, we need this trick too to properly get its params all-gathered.
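The docstring above explains why `_FSDPForwardRedirection` exists: a submodule's parameters are only all-gathered when the call goes through the FSDP root module's forward hooks. The sketch below illustrates the redirection idea with hypothetical names; it is not the actual helper and omits the edge cases the real implementation handles.

```python
import torch.nn as nn


class ForwardRedirectionSketch:
    """Run `target_method` while routing the call through the FSDP root's
    __call__, so FSDP's pre-/post-forward hooks fire and the submodule's
    parameters are all-gathered. Names are illustrative only."""

    def __call__(self, fsdp_root: nn.Module, wrapped: nn.Module, target_method, *args, **kwargs):
        original_forward = wrapped.forward

        def redirected_forward(*fwd_args, **fwd_kwargs):
            # Restore the real forward first so nested calls are unaffected.
            wrapped.forward = original_forward
            return target_method(*fwd_args, **fwd_kwargs)

        # Temporarily point the wrapped module's forward at the method we want,
        # then call through the FSDP root so its hooks run.
        wrapped.forward = redirected_forward
        return fsdp_root(*args, **kwargs)
```

As a rough, hypothetical usage: to get hidden states from an FSDP-wrapped `LlamaForCausalLM` without running `lm_head`, one might call `ForwardRedirectionSketch()(fsdp_model, fsdp_model.module, fsdp_model.module.model, input_ids=...)`.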
