Manually replacing the vocabulary #861
strokesegment started this conversation in General
Replies: 2 comments 1 reply
-
You can modify the vocabulary manually by following the approach used in https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/merge_tokenizer/merge_tokenizers.py
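For readers without the script at hand, here is a minimal sketch of what an in-place replacement could look like. It assumes the tokenizer is stored as a SentencePiece `tokenizer.model` file and that the installed `sentencepiece` package provides its `sentencepiece_model_pb2` module. `replace_piece` and `rewrite_tokenizer` are hypothetical helper names, not functions from the linked script, which appends new pieces rather than overwriting existing ones.

```python
def replace_piece(model, old_piece, new_piece, new_score):
    """Overwrite one vocabulary entry in place and return its index.

    `model` is anything with a `.pieces` sequence whose items expose
    `.piece` and `.score` (e.g. a SentencePiece ModelProto).
    """
    if any(p.piece == new_piece for p in model.pieces):
        raise ValueError(f"{new_piece!r} is already in the vocabulary")
    for idx, p in enumerate(model.pieces):
        if p.piece == old_piece:
            p.piece = new_piece   # same id, new surface string
            p.score = new_score   # score used when ranking merges
            return idx
    raise KeyError(old_piece)


def rewrite_tokenizer(src_path, dst_path, replacements):
    """Apply (old_piece, new_piece, new_score) triples to a tokenizer.model file."""
    # Imported lazily so replace_piece can be used without sentencepiece installed.
    from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model

    model = sp_pb2_model.ModelProto()
    with open(src_path, "rb") as f:
        model.ParseFromString(f.read())
    for old_piece, new_piece, new_score in replacements:
        replace_piece(model, old_piece, new_piece, new_score)
    with open(dst_path, "wb") as f:
        f.write(model.SerializeToString())
```

Because entries are overwritten rather than appended, piece ids and the vocabulary size stay the same, so the embedding matrix would not need resizing. The embedding row for a replaced id still holds whatever was learned for the old piece.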
-
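Thank you for your reply. Following your code, I replaced entries in the vocabulary, but one problem remains. For example, the original vocabulary contains "U", "L", and "ONG". I replaced a low-frequency piece in the old vocabulary with "ULONG" and changed its score. With the new vocabulary, however, ULONG is still split into "U", "L", "ONG".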
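One possible explanation, offered as an assumption rather than a confirmed diagnosis: the LLaMA tokenizer is a SentencePiece BPE model, and BPE encoding builds pieces by merging adjacent pairs, so a piece may only be reachable if some pair of existing pieces concatenates to it. With "U", "L", and "ONG" in the vocabulary but neither "UL" nor "LONG", no single merge produces "ULONG", whatever its score. The toy encoder below is plain Python, not SentencePiece's implementation, and its vocabulary and scores are made up for illustration.

```python
def bpe_encode(text, scores):
    """Toy greedy BPE: start from characters and repeatedly merge the
    adjacent pair whose concatenation is in `scores` with the highest score."""
    symbols = list(text)
    while True:
        best = None  # (score, index of left symbol)
        for i in range(len(symbols) - 1):
            merged = symbols[i] + symbols[i + 1]
            if merged in scores and (best is None or scores[merged] > best[0]):
                best = (scores[merged], i)
        if best is None:
            return symbols  # no adjacent pair forms a known piece
        i = best[1]
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]


# Hypothetical vocabulary: "ULONG" has the best score but no merge path.
vocab = {
    "U": -10.0, "L": -10.0, "O": -10.0, "N": -10.0, "G": -10.0,
    "ON": -1.0, "ONG": -2.0, "ULONG": -0.5,
}

without_bridge = bpe_encode("ULONG", vocab)
with_bridge = bpe_encode("ULONG", {**vocab, "UL": -3.0})

print(without_bridge)  # ['U', 'L', 'ONG']
print(with_bridge)     # ['ULONG']
```

In this toy setting, adding the intermediate piece "UL" (or "LONG") is what makes "ULONG" reachable. Whether the same holds for a given real tokenizer would need to be checked against that tokenizer.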
-
Can I manually replace low-frequency words in the llama2 vocabulary and then train without changing the vocabulary size?