SentencePieceModels.md

Tatoeba-MT Sentence Piece Models

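import sentencepiece as spm

# Load the model file (the path is relative to the working directory)
sp = spm.SentencePieceProcessor(model_file='opusTC.eng.16k.spm')
# Encode a batch of sentences; out_type=str returns subword pieces instead of integer IDs
print(sp.encode(['Hello world', 'This is a tokenization-test'], out_type=str))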
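With `out_type=str`, the call above returns subword pieces in which SentencePiece marks each word start with the character `▁` (U+2581). The sketch below shows how such pieces map back to plain text. The helper name `pieces_to_text` and the example segmentation are hypothetical and are not output of `opusTC.eng.16k.spm`; the actual pieces depend on the model.

```python
# SentencePiece marks the start of each word with U+2581 ("▁").
WORD_BOUNDARY = "\u2581"


def pieces_to_text(pieces):
    """Join string pieces and turn word-boundary markers back into spaces."""
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


# Hypothetical segmentation, for illustration only; real pieces depend on the model.
example_pieces = ["\u2581Hello", "\u2581wor", "ld"]
print(pieces_to_text(example_pieces))
```

This only illustrates the marker convention. With a loaded model, calling `sp.decode` on the pieces is the more reliable route, since it also handles any normalization the model applies.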