
Refactor Cohere Model #30027

Merged: 3 commits merged into huggingface:main on Apr 4, 2024
Conversation

saurabhdash2512 (Contributor) commented on Apr 3, 2024

What does this PR do?

Refactor Cohere Model

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker and @younesbelkada

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@saurabhdash2512 changed the title from Updates to Refactor Cohere Model on Apr 3, 2024
ArthurZucker (Collaborator) left a comment:

Thanks!
Let's keep BC (backward compatibility) for the layer norm name.
While we are at it, there's a missing `# Copied from`! 🤗

```diff
@@ -76,10 +76,9 @@ def _get_unpad_data(attention_mask):


 class CohereLayerNorm(nn.Module):
-    def __init__(self, hidden_size, eps=1e-5, bias=False):
+    def __init__(self, param_shape=None, eps=1e-5, bias=False):
```
Suggested change:
```diff
-    def __init__(self, param_shape=None, eps=1e-5, bias=False):
+    def __init__(self, hidden_size=None, eps=1e-5, bias=False):
+        """The hidden size can be a tuple or an int. If a tuple is used, the layer norm will be applied on both dims."""
```

Comment on lines +153 to +160
```diff
+    dtype = q.dtype
+    q = q.float()
+    k = k.float()
     cos = cos.unsqueeze(unsqueeze_dim)
     sin = sin.unsqueeze(unsqueeze_dim)
     q_embed = (q * cos) + (rotate_half(q) * sin)
     k_embed = (k * cos) + (rotate_half(k) * sin)
-    return q_embed, k_embed
+    return q_embed.to(dtype=dtype), k_embed.to(dtype=dtype)
```
Collaborator:
If this is done outside `apply_rotary_pos_emb`, we can keep the `# Copied from` 😉 but it's not an issue.
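For readers skimming the diff, a hedged sketch of the whole function after the change: q and k are upcast to float32 before the rotation and cast back at the end. `rotate_half` below is the common Llama-style helper and is an assumption; Cohere's actual helper may interleave differently.

```python
# Sketch under assumptions; matches the diff above in spirit, not verbatim.
import torch


def rotate_half(x):
    # Llama-style helper (assumed): split the last dim and swap halves with a sign flip.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
    dtype = q.dtype
    q, k = q.float(), k.float()  # upcast so the rotation runs in float32
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed.to(dtype=dtype), k_embed.to(dtype=dtype)  # cast back
```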


```python
        if (self.head_dim * self.num_heads) != self.hidden_size:
            raise ValueError(
                f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
                f" and `num_heads`: {self.num_heads})."
            )

        if self.use_qk_norm:
            # When sharding the model using Tensor Parallelism, need to be careful to use n_local_heads
```
Collaborator:
Not sure if this comment is relevant, as the model is not sharded by default.

saurabhdash2512 (Contributor, author):
Oh, this is to warn others who port this from HF to other frameworks.
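To make the warning concrete, here is a small sketch of per-head QK-norm using the tuple-shaped weight, reusing `SketchLayerNorm` from the earlier sketch; the sizes are illustrative, not the model's real config. Under tensor parallelism, the first weight dim would have to be `n_local_heads` rather than the global head count.

```python
import torch

# Illustrative sizes, not the model's real config.
num_heads, head_dim = 8, 64
q_norm = SketchLayerNorm(hidden_size=(num_heads, head_dim), eps=1e-5)

# q: (batch, seq_len, num_heads, head_dim), before transposing for attention.
q = torch.randn(2, 16, num_heads, head_dim)
q = q_norm(q)  # normalized over head_dim, scaled by a per-(head, dim) weight
```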

ArthurZucker (Collaborator):
As a follow-up, adding integration tests for the new model would be nice.

@ArthurZucker ArthurZucker merged commit 517a3e6 into huggingface:main Apr 4, 2024
17 checks passed
fblissjr commented on Apr 8, 2024

I think there's an issue with the tokenizer.json (or with the way transformers converts it): random multilingual output appears out of nowhere.

See: https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions/15

Found this after converting to mlx-lm, but it is reproducible with the following:

 
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
tokenizer.save_pretrained(".")
```

awni commented on Apr 8, 2024

Concretely, loading -> saving -> loading the tokenizer, all through HF AutoTokenizer, produces a different tokenizer from the initially loaded one, with worse behavior. Presumably that should not be the case.
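A minimal sketch of that round trip, using only the public `AutoTokenizer` API; if loading -> saving -> loading were lossless, the comparison should print True.

```python
from transformers import AutoTokenizer

tok_a = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
tok_a.save_pretrained("./roundtrip")        # save, then reload from disk
tok_b = AutoTokenizer.from_pretrained("./roundtrip")

sample = "Hello, world! <|END_OF_TURN_TOKEN|>"
# Identical ids would indicate a lossless round trip.
print(tok_a.encode(sample) == tok_b.encode(sample))
```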

fblissjr commented on Apr 8, 2024

Worth noting: the tokenizer.json in the bitsandbytes 4-bit repo linked from the main Cohere repo is also a different size and looks very different from the original model repo's tokenizer.json (https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit/blob/main/tokenizer.json).

fblissjr commented on Apr 8, 2024

Another interesting difference between the 4-bit bnb tokenizer and the original: in the original one, token id 255001 (<|END_OF_TURN_TOKEN|>) has special set to False; in the 4-bit bnb one, it's True.

ArthurZucker (Collaborator):
I have no idea what you are talking about. A 4-bit tokenizer? Could you actually open an issue with a repro and a description of the problem?

ahmetustun (Contributor) commented on Apr 8, 2024

I think the difference between the two tokenizer.json files is the Unicode encoding applied by tokenizer.save_pretrained, which was used to save the tokenizer for the 4-bit model. Also, the <|END_OF_TURN_TOKEN|> token is set as special in the config because it is used as the eos_token, and it is overwritten in the original tokenizer (command-r-plus) as well. Therefore, the two tokenizers should work the same. @fblissjr it would be helpful if you posted the text you tokenize, so we can double-check and reproduce.
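One hedged way to check this explanation: if the two tokenizer.json files differ only in Unicode escaping, parsing them should yield equal objects. The file paths here are illustrative placeholders, not real filenames from either repo.

```python
import json

# Paths are placeholders for the two downloaded tokenizer.json files.
with open("tokenizer_original.json", encoding="utf-8") as f:
    original = json.load(f)
with open("tokenizer_4bit.json", encoding="utf-8") as f:
    four_bit = json.load(f)

# json.load decodes \uXXXX escapes, so escaping-only differences disappear
# after parsing; any remaining differences (such as a `special` flag on an
# added token) will show up in this comparison.
print(original == four_bit)
```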

fblissjr commented on Apr 8, 2024

> I have no idea what you are talking about. A 4-bit tokenizer? Could you actually open an issue with a repro and a description of the problem?

What @ahmetustun mentioned is the difference: not a 4-bit tokenizer, but the tokenizer.json in https://huggingface.co/CohereForAI/c4ai-command-r-plus vs. the one in https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit.

Not at my workstation now, but if nobody else is seeing this issue, I'll assume I've got something wrong on my end. Thanks for the clarification.

ahmetustun (Contributor):
Please leave a further comment in the model repo if the problem continues. Thanks @fblissjr.

ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
* changes

* addressing comments

* smol fix
itazap pushed a commit that referenced this pull request May 14, 2024
* changes

* addressing comments

* smol fix