Command-R-Plus, Context Window Limitations #660
🤔 Not sure what would cause that. Do you have a prompt that works elsewhere but doesn't in the MLX version? Also, if you are able to provide some expected output, that would be helpful.
MLX has RoPE and it should be used correctly already.
I'm getting random Cyrillic in my responses when using `tokenizer.apply_tool_use_template`. Anyone else? It seems to happen only when using that tool template from the tokenizer. Example output:

````
Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the [
    {
        "tool_name": title of the tool in the specification,
        "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters
    }
]```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
Action: ```json
[
    {
        "tool некоторыми": {},
        "tool_name": "internet_search"
    } forniscono]
```<EOS_TOKEN>
````
Ignore, I was calling the tokenizer twice. Fixed it in my code here for anyone who wants to test tool use (apologies in advance if there are bugs still lurking): https://github.com/fblissjr/mlx-funbox
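For anyone else hitting this, here is a minimal sketch of the double-tokenization pitfall (illustrative only; `conversation` and `tools` stand in for whatever you pass):

```python
# Buggy: apply the template with tokenize=True, then turn the ids back into
# text. The special tokens (BOS, turn markers) become literal text, and
# generate() tokenizes the string again, duplicating them.
ids = tokenizer.apply_tool_use_template(conversation, tools=tools, tokenize=True)
prompt = tokenizer.decode(ids)

# Fixed: keep the templated prompt as a string so it is tokenized exactly once.
prompt = tokenizer.apply_tool_use_template(
    conversation, tools=tools, tokenize=False, add_generation_prompt=True
)
```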
Looks like there's still random switching to multilingual output and random Cyrillic (using a simple generate + apply tool template). Has anyone tested on CUDA to see if the same thing happens?
Copying the original Cohere tokenizer.json (https://huggingface.co/CohereForAI/c4ai-command-r-plus/blob/main/tokenizer.json) fixes this issue completely in my testing (output generation is slow, but so far so good!). My guess is something is happening in the mlx_lm.convert process due to the large vocab, the multilingual nature of the tokenizer, and the strange tokenizer.json formatting. Edit: generation speed is also slightly faster now due to the correct tokenizer being used.
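If you want to try the same workaround, here is a sketch of the swap (the converted-model path is illustrative):

```python
import shutil

from huggingface_hub import hf_hub_download

# Fetch the original tokenizer.json from the Cohere repo and copy it over
# the file produced by mlx_lm.convert.
src = hf_hub_download("CohereForAI/c4ai-command-r-plus", "tokenizer.json")
shutil.copy(src, "mlx_model/tokenizer.json")  # hypothetical local model dir
```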
That is very odd. The tokenizer copying is very simple in MLX LM. We basically load with Hugging Face and then save it with Hugging Face. There is no MLX code involved: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/utils.py#L619 I wonder if we are somehow using the API incorrectly, or maybe there is a bug in the way it's saved with Transformers.
@fblissjr you can reproduce the behavior with:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
tokenizer.save_pretrained(".")
```

I feel that should not break the tokenizer... so it might be worth filing an issue with the Cohere HF repo or the Transformers repo? Wdyt?
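A quick way to check whether that round-trip actually changes the file (paths are illustrative):

```python
import hashlib

def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Compare the repo's original tokenizer.json against the one that
# save_pretrained(".") just wrote.
print(sha256_of("original/tokenizer.json"))
print(sha256_of("./tokenizer.json"))
```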
@awni my guess is the latter. It looks more like it's saved incorrectly (and oddly, just going by a visual inspection) in the HF repo. I haven't seen a tokenizer.json like this before. Here's a quick sample of one page of it:

```json
{"version": "1.0", "truncation": null, "padding": null, "added_tokens": [{"id": 0, "content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 1, "content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 2, "content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 3, "content": "", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 4, "content": "<MASK_TOKEN>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 5, "content": "<BOS_TOKEN>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 6, "content": "<EOS_TOKEN>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 7, "content": "<EOP_TOKEN>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true}, {"id": 255000, "special": false, "content": "<|START_OF_TURN_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255001, "special": false, "content": "<|END_OF_TURN_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255002, "special": false, "content": "<|YES_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255003, "special": false, "content": "<|NO_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255004, "special": false, "content": "<|GOOD_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255005, "special": false, "content": "<|BAD_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255006, "special": false, "content": "<|USER_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255007, "special": false, "content": "<|CHATBOT_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255008, "special": false, "content": "<|SYSTEM_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255009, "special": false, "content": "<|USER_0_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255010, "special": false, "content": "<|USER_1_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255011, "special": false, "content": "<|USER_2_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255012, "special": false, "content": "<|USER_3_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255013, "special": false, "content": "<|USER_4_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255014, "special": false, "content": "<|USER_5_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255015, "special": false, "content": "<|USER_6_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false},
{"id": 255016, "special": false, "content": "<|USER_7_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255017, "special": false, "content": "<|USER_8_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255018, "special": false, "content": "<|USER_9_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255019, "special": false, "content": "<|EXTRA_0_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255020, "special": false, "content": "<|EXTRA_1_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255021, "special": false, "content": "<|EXTRA_2_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255022, "special": false, "content": "<|EXTRA_3_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255023, "special": false, "content": "<|EXTRA_4_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255024, "special": false, "content": "<|EXTRA_5_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255025, "special": false, "content": "<|EXTRA_6_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255026, "special": false, "content": "<|EXTRA_7_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255027, "special": false, "content": "<|EXTRA_8_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}, {"id": 255028, "special": false, "content": "<|EXTRA_9_TOKEN|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false}], "normalizer": {"type": "NFC"}, "pre_tokenizer": {"type": "Sequence", "pretokenizers": [{"type": "Digits", "individual_digits": true}, {"type": "ByteLevel", "add_prefix_space": false, "trim_offsets": true, "use_regex": true}]}, "post_processor": {"add_prefix_space": true, "trim_offsets": false, "use_regex": true, "type": "TemplateProcessing", "single": [{"SpecialToken": {"id": "<BOS_TOKEN>", "type_id": 0}}, {"Sequence": {"id": "A", "type_id": 0}}, {"SpecialToken": {"id": "<|END_OF_TURN_TOKEN|>", "type_id": 1}}, {"SpecialToken": {"id": "<EOS_TOKEN>", "type_id": 1}}], "pair": [{"SpecialToken": {"id": "<BOS_TOKEN>", "type_id": 0}}, {"Sequence": {"id": "A", "type_id": 0}}, {"Sequence": {"id": "B", "type_id": 1}}, {"SpecialToken": {"id": "<|END_OF_TURN_TOKEN|>", "type_id": 1}}, {"SpecialToken": {"id": "<EOS_TOKEN>", "type_id": 1}}], "special_tokens": {"<BOS_TOKEN>": {"id": "<BOS_TOKEN>", "ids": [5], "tokens": ["<BOS_TOKEN>"]}, "<EOS_TOKEN>": {"id": "<EOS_TOKEN>", "ids": [6], "tokens": ["<EOS_TOKEN>"]}, "<|END_OF_TURN_TOKEN|>": {"id": "<|END_OF_TURN_TOKEN|>", "ids": [255001], "tokens": ["<|END_OF_TURN_TOKEN|>"]}}}, "decoder": {"type": "ByteLevel", "add_prefix_space": true, "trim_offsets": true, "use_regex": true}, "model": {"type": "BPE", "dropout": null, "unk_token": null, "continuing_subword_prefix": null, "end_of_word_suffix": null, "fuse_unk": false, "byte_fallback": false, "vocab": {"": 0, "": 1, "": 2, "": 3, "<MASK_TOKEN>": 4, "<BOS_TOKEN>": 5, "<EOS_TOKEN>": 6, "<EOP_TOKEN>": 7, "!": 8, """: 9, "#": 10, "$": 11, "%": 12, "&": 13, "'": 14, "(": 15, ")": 16, "*": 17, "+": 18, ",": 19, "-": 20, ".": 21, "/": 22, "0": 23, "1": 24, "2": 25, "3": 26, "4": 27, "5": 28, "6": 
29, "7": 30, "8": 31, "9": 32, ":": 33, ";": 34, "<": 35, "=": 36, ">": 37, "?": 38, "@": 39, "A": 40, "B": 41, "C": 42, "D": 43, "E": 44, "F": 45, "G": 46, "H": 47, "I": 48, "J": 49, "K": 50, "L": 51, "M": 52, "N": 53, "O": 54, "P": 55, "Q": 56, "R": 57, "S": 58, "T": 59, "U": 60, "V": 61, "W": 62, "X": 63, "Y": 64, "Z": 65, "[": 66, "\": 67, "]": 68, "^": 69, "_": 70, "`": 71, "a": 72, "b": 73, "c": 74, "d": 75, "e": 76, "f": 77, "g": 78, "h": 79, "i": 80, "j": 81, "k": 82, "l": 83, "m": 84, "n": 85, "o": 86, "p": 87, "q": 88, "r": 89, "s": 90, "t": 91, "u": 92, "v": 93, "w": 94, "x": 95, "y": 96, "z": 97, "{": 98, "|": 99, "}": 100, "~": 101, "\u00a1": 102, "\u00a2": 103, "\u00a3": 104, "\u00a4": 105, "\u00a5": 106, "\u00a6": 107, "\u00a7": 108, "\u00a8": 109, "\u00a9": 110, "\u00aa": 111, "\u00ab": 112, "\u00ac": 113, "\u00ae": 114, "\u00af": 115, "\u00b0": 116, "\u00b1": 117, "\u00b2": 118, "\u00b3": 119, "\u00b4": 120, "\u00b5": 121, "\u00b6": 122, "\u00b7": 123, "\u00b8": 124, "\u00b9": 125, "\u00ba": 126, "\u00bb": 127, "\u00bc": 128, "\u00bd": 129, "\u00be": 130, "\u00bf": 131, "\u00c0": 132, "\u00c1": 133, "\u00c2": 134, "\u00c3": 135, "\u00c4": 136, "\u00c5": 137, "\u00c6": 138, "\u00c7": 139, "\u00c8": 140, "\u00c9": 141, "\u00ca": 142, "\u00cb": 143, "\u00cc": 144, "\u00cd": 145, "\u00ce": 146, "\u00cf": 147, "\u00d0": 148, "\u00d1": 149, "\u00d2": 150, "\u00d3": 151, "\u00d4": 152, "\u00d5": 153, "\u00d6": 154, "\u00d7": 155, "\u00d8": 156, "\u00d9": 157, "\u00da": 158, "\u00db": 159, "\u00dc": 160, "\u00dd": 161, "\u00de": 162, "\u00df": 163, "\u00e0": 164, "\u00e1": 165, "\u00e2": 166, "\u00e3": 167, "\u00e4": 168, "\u00e5": 169, "\u00e6": 170, "\u00e7": 171, "\u00e8": 172, "\u00e9": 173, "\u00ea": 174, "\u00eb": 175, "\u00ec": 176, "\u00ed": 177, "\u00ee": 178, "\u00ef": 179, "\u00f0": 180, "\u00f1": 181, "\u00f2": 182, "\u00f3": 183, "\u00f4": 184, "\u00f5": 185, "\u00f6": 186, "\u00f7": 187, "\u00f8": 188, "\u00f9": 189, "\u00fa": 190, "\u00fb": 191, "\u00fc": 192, "\u00fd": 193, "\u00fe": 194, "\u00ff": 195, "\u0100": 196, "\u0101": 197, "\u0102": 198, "\u0103": 199, "\u0104": 200, "\u0105": 201, "\u0106": 202, "\u0107": 203, "\u0108": 204, "\u0109": 205, "\u010a": 206, "\u010b": 207, "\u010c": 208, "\u010d": 209, "\u010e": 210, "\u010f": 211, "\u0110": 212, "\u0111": 213, "\u0112": 214, "\u0113": 215, "\u0114": 216, "\u0115": 217, "\u0116": 218, "\u0117": 219, "\u0118": 220, "\u0119": 221, "\u011a": 222, "\u011b": 223, "\u011c": 224, "\u011d": 225, "\u011e": 226, "\u011f": 227, "\u0120": 228, "\u0121": 229, "\u0122": 230, "\u0123": 231, "\u0124": 232, "\u0125": 233, "\u0126": 234, "\u0127": 235, "\u0128": 236, "\u0129": 237, "\u012a": 238, "\u012b": 239, "\u012c": 240, "\u012d": 241, "\u012e": 242, "\u012f": 243, "\u0130": 244, "\u0131": 245, "\u0132": 246, "\u0133": 247, "\u0134": 248, "\u0135": 249, "\u0136": 250, "\u0137": 251, "\u0138": 252, "\u0139": 253, "\u013a": 254, "\u013b": 255, "\u013c": 256, "\u013d": 257, "\u013e": 258, "\u013f": 259, "\u0140": 260, "\u0141": 261, "\u0142": 262, "\u0143": 263, "\u200d": 264, "\u203c": 265, "\u2049": 266, "\u20e3": 267, "\u2122": 268, "\u2139": 269, "\u2194": 270, "\u2195": 271, "\u2196": 272, "\u2197": 273, "\u2198": 274, "\u2199": 275, "\u21a9": 276, "\u21aa": 277, "\u231a": 278, "\u231b": 279, "\u2328": 280, "\u23cf": 281, "\u23e9": 282, "\u23ea": 283, "\u23eb": 284, "\u23ec": 285, "\u23ed": 286, "\u23ee": 287, "\u23ef": 288, "\u23f0": 289, "\u23f1": 290, "\u23f2": 291, "\u23f3": 292, "\u23f8": 293, "\u23f9": 294, 
"\u23fa": 295, "\u24c2": 296, "\u25aa": 297, "\u25ab": 298, "\u25b6": 299, "\u25c0": 300, "\u25fb": 301, "\u25fc": 302, "\u25fd": 303, "\u25fe": 304, "\u2600": 305, "\u2601": 306, "\u2602": 307, "\u2603": 308, "\u2604": 309, "\u260e": 310, "\u2611": 311, "\u2614": 312, "\u2615": 313, "\u2618": 314, "\u261d": 315, "\u2620": 316, "\u2622": 317, "\u2623": 318, "\u2626": 319, "\u262a": 320, "\u262e": 321, "\u262f": 322, "\u2638": 323, "\u2639": 324, "\u263a": 325, "\u2640": 326, "\u2642": 327, "\u2648": 328, "\u2649": 329, "\u264a": 330, "\u264b": 331, "\u264c": 332, "\u264d": 333, "\u264e": 334, "\u264f": 335, "\u2650": 336, "\u2651": 337, "\u2652": 338, "\u2653": 339, "\u265f": 340, "\u2660": 341, "\u2663": 342, "\u2665": 343, "\u2666": 344, "\u2668": 345, "\u267b": 346, "\u267e": 347, "\u267f": 348, "\u2692": 349, "\u2693": 350, "\u2694": 351, "\u2695": 352, "\u2696": 353, "\u2697": 354, "\u2699": 355, "\u269b": 356, "\u269c": 357, "\u26a0": 358, "\u26a1": 359, "\u26a7": 360, "\u26aa": 361, "\u26ab": 362, "\u26b0": 363, "\u26b1": 364, "\u26bd": 365, "\u26be": 366, "\u26c4": 367, "\u26c5": 368, "\u26c8": 369, "\u26ce": 370, "\u26cf": 371, "\u26d1": 372, "\u26d3": 373, "\u26d4": 374, "\u26e9": 375, "\u26ea": 376, "\u26f0": 377, "\u26f1": 378, "\u26f2": 379, "\u26f3": 380, "\u26f4": 381, "\u26f5": 382, "\u26f7": 383, "\u26f8": 384, "\u26f9": 385, "\u26fa": 386, "\u26fd": 387, "\u2702": 388, "\u2705": 389, "\u2708": 390, "\u2709": 391, "\u270a": 392, "\u270b": 393, "\u270c": 394, "\u270d": 395, "\u270f": 396, "\u2712": 397, "\u2714": 398, "\u2716": 399, "\u271d": 400, "\u2721": 401, "\u2728": 402, "\u2733": 403, "\u2734": 404, "\u2744": 405, "\u2747": 406, "\u274c": 407, "\u274e": 408, "\u2753": 409, "\u2754": 410, "\u2755": 411, "\u2757": 412, "\u2763": 413, "\u276 |
Agreed. I made a community post on HF here: https://huggingface.co/CohereForAI/c4ai-command-r-plus/discussions/15 and opened huggingface/transformers#30027.
So this is interesting: the tokenizer.json on the bitsandbytes repo linked from the main Cohere repo is a different size, and looks nothing like the original. https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit/blob/main/tokenizer.json
Another interesting difference between the 4-bit bnb tokenizer and the original: in the original, token id 255001 (<|END_OF_TURN_TOKEN|>) has special set to False; in the 4-bit bnb one, it's True.
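A small sketch for diffing those `special` flags between the two files (paths are illustrative):

```python
import json

def special_flags(path):
    with open(path) as f:
        return {t["id"]: t["special"] for t in json.load(f)["added_tokens"]}

orig = special_flags("original/tokenizer.json")
bnb = special_flags("bnb-4bit/tokenizer.json")

# Print every added token whose `special` flag disagrees (e.g. 255001).
for token_id in sorted(orig):
    if token_id in bnb and orig[token_id] != bnb[token_id]:
        print(token_id, orig[token_id], bnb[token_id])
```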
Per comments on the Hugging Face repo, the differences between the two tokenizer.json files are Unicode differences. I'll assume I've got something bugging out on my end unless anyone else sees the same.
This is what I have been using; I removed the texts, which are just some random Wikipedia pages. The output is good until I try the 8500-token text, which just outputs <PAD><PAD><PAD><PAD><PAD>...
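A sketch of that kind of long-prompt test (the file name and token count are stand-ins):

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/c4ai-command-r-plus-4bit")

# A document of roughly 8,500 tokens; output quality is fine for shorter
# inputs but degrades into <PAD><PAD>... once the prompt passes ~8192 tokens.
with open("wikipedia_page.txt") as f:
    long_text = f.read()

prompt = f"Summarize the following text:\n\n{long_text}"
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```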
Have you tried with apply_tool_use_template by chance? Curious if you see any of the oddities I see when using it.
Hey @awni, @fblissjr, and @jeanromainroy, the Cohere team limited the context to 8k for all Command-R variants on purpose. If you check the config file for both R-v01 and R+, max_position_embeddings is set to 8192. It's a limit meant to keep users from running OOM. You can read more here:
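For reference, a quick way to inspect those config values (a sketch using the standard transformers API):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("CohereForAI/c4ai-command-r-plus")
print(cfg.max_position_embeddings)  # 8192, per the discussion above
print(cfg.rope_theta)               # the RoPE base listed in config.json
```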
Hey @Blaizzy, I have run the exact same test with the new llama.cpp implementation of Command-R+ and it works way above 8k tokens.
@fblissjr Indeed, the tokenizer created by the conversion is slightly smaller (~2 MB) than the original. I updated it as you suggested. Can you check it?
@jeanromainroy can you try again with the change in this branch? If it works I will make a PR.
Link: https://github.com/Blaizzy/mlx-examples/tree/pc/commandR
You can also try to increase the default ...
I actually did this myself yesterday with my own quant, and the output was better and faster; no idea why. Now I'm unsure whether I just had a bug somewhere on my end or whether it actually made a difference. I'm planning to test a larger CUDA machine later today or tomorrow to see how it works natively.
Let me know how it goes, but for now, according to your report, the issue should be fixed.
Hey @Blaizzy, I tried your fork and the model is still outputting <PAD><PAD><PAD>... when I provide a long prompt.
I have made a new change; can you try it again please? :)
Wait, I think I got it! Give me 30 min :)
@jeanromainroy can you try this branch? The previous one had a git issue: https://github.com/Blaizzy/mlx-examples/tree/pc/command-R
Still outputting <PAD><PAD><PAD>... :(
Only <PAD>? Can you share the whole output?
It's outputting <PAD> for as long as I let it. In other words, max_tokens=256 results in 256 x <PAD>.
Got it! @awni the Cohere team added ... Is there a way of using this number with nn.RoPE? Are any deep changes needed? If so, please point me to them; I can work on it.
I'm not sure I follow your question. The ...
@jeanromainroy regarding:

My understanding is llama.cpp uses a fixed-size context (the ...

Maybe we could provide something similar... but I think the default behavior is a little misleading.
@fblissjr could you share a command using ...
I can't with mlx_lm.generate because it only happens when I run it with apply_tool_use_template in the tokenizer. Not at home right now and haven't tested since the day it happened. I think you can mock up something like this: `tools = (a json object similar to the cohere example on HF)`. Basically you want to get the apply_tool_use_template output to show up, which is a big, page-ish-long output that looks like this (copying and pasting from the HF repo under the tool-use output example):

```
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.

## Available Tools
Here is a list of tools that you have available to you:
```
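If it helps anyone reproduce, a mock-up along the lines of the Cohere tool-use example (the tool definition follows the pattern from the HF model card; treat the details as illustrative):

```python
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True,
            }
        },
    }
]

# Renders the page-long system preamble shown above, ending with the
# "Write 'Action:' followed by a json-formatted list..." instruction.
prompt = tokenizer.apply_tool_use_template(
    conversation, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)
```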
@awni you can use the example in the MLX model card to replicate @fblissjr's example: https://huggingface.co/mlx-community/c4ai-command-r-plus-4bit
I see now. I thought we might be missing something because the PyTorch implementation takes the context window size into account. Something like this:

```python
import torch
import torch.nn as nn


class RotaryPositionalEmbeddings(nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
        super().__init__()
        self.dim = dim
        self.max_position_embeddings = max_position_embeddings
        self.device = device
        self.scaling_factor = scaling_factor
        self.base = base
        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        # Build here to make `torch.jit.trace` work.
        self._set_cos_sin_cache(
            seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
        )

    def _set_cos_sin_cache(self, seq_len, device, dtype):
        self.max_seq_len_cached = seq_len
        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
        t = t / self.scaling_factor
        freqs = torch.outer(t, self.inv_freq)
        # Different from paper, but it uses a different permutation in order to obtain the same calculation
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
        self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)

    @torch.no_grad()
    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size]
        if seq_len > self.max_seq_len_cached:
            # Grow the cos/sin cache on the fly for longer sequences.
            self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
        return (
            self.cos_cached[:seq_len].to(dtype=x.dtype),
            self.sin_cached[:seq_len].to(dtype=x.dtype),
        )
```
Btw, does that mean we can use a context of any length with nn.RoPE? If not, what are the limitations?
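For what it's worth, a minimal sketch of how MLX's nn.RoPE is applied (toy shapes; the base would come from the model's rope_theta):

```python
import mlx.core as mx
import mlx.nn as nn

head_dim = 128
rope = nn.RoPE(head_dim, traditional=False, base=10000)

# [batch, num_heads, seq_len, head_dim]
x = mx.random.normal((1, 8, 16, head_dim))

# MLX computes the sines/cosines per call (with an offset for continuing
# from a KV cache) rather than slicing a precomputed max-length cache, so
# the module itself does not impose a hard sequence-length cap.
y = rope(x, offset=0)
```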
Hi folks, is there still no consensus on what settings to set and which json files to use? Using the mlx-community 4-bit version, I get random Japanese characters. What surprised me was that, without my even mentioning it in the prompt, at some point the model acknowledged and apologized for their randomness and said it would try to avoid them.
@M-I could you elaborate on what you mean?
Starting with ...

At some point, `generate(model, tokenizer, prompt=tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True), verbose=True, max_tokens=1000)` generated a bit where there was: "P.S. Apologies for the random Japanese words (e.g., "グリーニング") that appeared in my response. It seems there might be an issue with my language model. I'll try to avoid this in future responses. 😅👍. I'm always ready to assist you with your project! 😊". And it will go on and on, as if the end token is never generated or acknowledged. I just assumed it was the price to pay for 4-bit quantization, so I never mentioned the fact that there was Japanese, or anything weird or out of place in its response; it just self-reflected on its own.
I see, thank you for explaining :) I think this should be a new issue, as it's not related to this thread.
Cohere's new Command-R-Plus model reportedly features a 128k context window. However, testing with progressively longer prompts reveals it begins producing nonsensical output (e.g., "<PAD><PAD>...") after 8192 tokens, aligning with the "max_position_embeddings" value in the config.json file. The config also lists a "rope_theta" value, suggesting its role in achieving the large context window. Is "rope" supported in MLX?