Skip to content

Commit

Permalink
common : add comments
Browse files Browse the repository at this point in the history
  • Loading branch information
ggerganov committed Aug 26, 2023
1 parent 9668aa1 commit 8976384
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 1 deletion.
6 changes: 6 additions & 0 deletions common/common.h
Original file line number Diff line number Diff line change
Expand Up @@ -116,15 +116,21 @@ struct llama_context_params llama_context_params_from_gpt_params(const gpt_param
// Vocab utils
//

// tokenizes a string into a vector of tokens
// should work similar to Python's `tokenizer.decode`
std::vector<llama_token> llama_tokenize(
struct llama_context * ctx,
const std::string & text,
bool add_bos);

// tokenizes a token into a piece
// should work similar to Python's `tokenizer.id_to_piece`
std::string llama_token_to_piece(
const struct llama_context * ctx,
llama_token token);

// detokenizes a vector of tokens into a string
// should work similar to Python's `tokenizer.decode`
// removes the leading space from the first non-BOS token
std::string llama_detokenize(
llama_context * ctx,
Expand Down
2 changes: 1 addition & 1 deletion llama.h
Original file line number Diff line number Diff line change
Expand Up @@ -384,7 +384,7 @@ extern "C" {
// Token Id -> Piece.
// Uses the vocabulary in the provided context.
// Does not write null terminator to the buffer.
// Use code is responsible to remove the leading whitespace of the first non-BOS token.
// User code is responsible to remove the leading whitespace of the first non-BOS token when decoding multiple tokens.
LLAMA_API int llama_token_to_piece(
const struct llama_context * ctx,
llama_token token,
Expand Down

0 comments on commit 8976384

Please sign in to comment.