Closed
Description
Feature request
I have a tokenizer which have lots of preserved tokens like bellow:
'<reserved_7>': 100,
'<reserved_8>': 101,
'<reserved_9>': 102,
'<reserved_10>': 103,
'<reserved_11>': 104,
'<reserved_12>': 105,
'<reserved_13>': 106,
'<reserved_14>': 107,
I want to replace the '<reserved_7>' with '<|im_start|>' and replace '<reserved_8>' with '<|im_end|>'
what I want to get is a tokenizer which can act as below:
tokenizer.encode('<|im_start|>') => 100
Motivation
I want to replace the '<reserved_7>' with '<|im_start|>' and replace '<reserved_8>' with '<|im_end|>'
Your contribution
no
Metadata
Metadata
Assignees
Labels
No labels