llama : add RobertaForSequenceClassification reranker support #13875


Merged
merged 5 commits into master on May 29, 2025

Conversation


@CISC CISC commented May 28, 2025

Adds support for RoBERTa reranker.

Added variable cls_out shape support to be able to load the model (and yes, this will likely need improvements to work as intended with more than one label; I'll leave that for a free-for-all follow-up PR).

Also fixes the JinaBert conversion error from #13858

cc/ @huydt84

@CISC CISC requested a review from ggerganov May 28, 2025 21:43
@github-actions github-actions bot added the python python script changes label May 28, 2025
@ggerganov ggerganov left a comment


this will likely need improvements to work as intended with more than 1 label

Could you clarify what would be needed for multiple labels?


CISC commented May 29, 2025

Could you clarify what would be needed for multiple labels?

I haven't looked too deeply into it, but it has relevance for the loss function at least:
https://github.com/huggingface/transformers/blob/66da7001457a44c2a04cd329b445c574b9561e8b/src/transformers/models/roberta/modeling_roberta.py#L1221-L1239
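For reference, the linked transformers code picks the loss from num_labels and the label dtype. A minimal Python sketch of that selection logic (names are illustrative; the real code uses torch's MSELoss, CrossEntropyLoss, and BCEWithLogitsLoss):

```python
def pick_loss(num_labels, labels_are_int, problem_type=None):
    """Mimic the problem-type inference in RobertaForSequenceClassification.

    Sketch only: one label means regression, several integer labels mean
    single-label classification, anything else is multi-label.
    """
    if problem_type is None:
        if num_labels == 1:
            problem_type = "regression"
        elif num_labels > 1 and labels_are_int:
            problem_type = "single_label_classification"
        else:
            problem_type = "multi_label_classification"
    return {
        "regression": "MSELoss",
        "single_label_classification": "CrossEntropyLoss",
        "multi_label_classification": "BCEWithLogitsLoss",
    }[problem_type]
```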

@CISC CISC merged commit 6385b84 into master May 29, 2025
49 checks passed
@CISC CISC deleted the cisc/roberta-ranker-support branch May 29, 2025 06:15
@ggerganov

I think the user code should also somehow query the number of labels, correct?

@ggerganov

Btw, you can trigger ggml-ci on your PRs by adding the "ggml-ci" string to your commit message (see https://github.com/ggml-org/llama.cpp/tree/master/ci)


CISC commented May 29, 2025

Btw, you can trigger ggml-ci on your PRs by adding the "ggml-ci" string to your commit message (see https://github.com/ggml-org/llama.cpp/tree/master/ci)

Ahhh, that's useful!


CISC commented May 29, 2025

I think the user code should also somehow query the number of labels, correct?

Probably; it equals the length of the label array in the metadata.
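In user code this would just be the length of that array; a tiny sketch (the metadata key name here is hypothetical, not the actual GGUF key):

```python
def n_labels(metadata):
    """Derive the classifier label count from a metadata dict.

    'classifier.output_labels' is a hypothetical key standing in for
    wherever the label array ends up in the GGUF metadata.
    """
    return len(metadata.get("classifier.output_labels", []))

meta = {"classifier.output_labels": ["LABEL_0", "LABEL_1"]}
```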


CISC commented May 29, 2025

@ggerganov Funny, the test still fails, but I can't find anything in the logs?

@ggerganov


CISC commented May 29, 2025

Did you check the stdall output - right at the bottom:

Ah, that's weird, still the same error...


CISC commented May 29, 2025

Oh, great, AutoConfig fills in dummy (and incorrect) data, sigh.


huydt84 commented May 29, 2025

The config.json doesn't reflect AutoConfig.from_pretrained(...); it automatically adds id2label: https://github.com/huggingface/transformers/blob/8010f3cf616c98964ec38dde78de54b25f2a64af/src/transformers/configuration_utils.py#L364
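The behavior being pointed at can be mimicked in a few lines; a hedged sketch (not the actual transformers code) of how the loaded config ends up with dummy labels that were never in config.json:

```python
def fill_id2label(config, default_num_labels=2):
    """Sketch of the PretrainedConfig back-fill: when config.json carries
    no id2label, dummy LABEL_i entries are generated from num_labels
    (which itself defaults to 2), so the loaded config no longer matches
    the file on disk."""
    cfg = dict(config)
    if "id2label" not in cfg:
        n = cfg.get("num_labels", default_num_labels)
        cfg["id2label"] = {i: f"LABEL_{i}" for i in range(n)}
    return cfg
```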


CISC commented May 29, 2025

The config.json doesn't reflect AutoConfig.from_pretrained(...); it automatically adds id2label

Yep, figured out a workaround, making a PR.


CISC commented May 30, 2025

Could you clarify what would be needed for multiple labels?

I haven't looked too deeply into it, but it has relevance for the loss function at least

@ggerganov I looked into it a little more, and as far as I can tell all that is needed is to extract one score per label per sequence. I'm not sure how to do that, though: I can see that one score is extracted from the beginning of each sequence in llama_context.cpp, but I have no idea where the additional scores would be, nor whether more work is needed to generate them...

@ggerganov

We have to introduce some notion of the number of classification labels. Once we have that, we just need to adjust the constant 1 here to use it:

    for (uint32_t s = 0; s < ubatch.n_seqs; ++s) {
        const llama_seq_id seq_id = ubatch.seq_id[s][0];

        if (embd_seq_out.find(seq_id) != embd_seq_out.end()) {
            continue;
        }

        embd_seq_out[seq_id].resize(1);
        ggml_backend_tensor_get_async(backend_embd, t_embd, embd_seq_out[seq_id].data(), (seq_id)*sizeof(float), sizeof(float));
    }

And update the comment in the public header:

llama.cpp/include/llama.h

Lines 913 to 917 in 291f2b6

// Get the embeddings for a sequence id
// Returns NULL if pooling_type is LLAMA_POOLING_TYPE_NONE
// when pooling_type == LLAMA_POOLING_TYPE_RANK, returns float[1] with the rank of the sequence
// otherwise: float[n_embd] (1-dimensional)
LLAMA_API float * llama_get_embeddings_seq(struct llama_context * ctx, llama_seq_id seq_id);

For example, we can add n_cls to the hparams and expose it with an API like llama_n_cls(const struct llama_model * model). Not sure if this is the best way - I haven't used such classifiers, so I am not familiar with the use cases.


CISC commented May 30, 2025

I already added n_cls_out to hparams that does this. :)


ggerganov commented May 30, 2025

Yes, I forgot we already have it. So it should be simple to add this functionality. Maybe as simple as:

     // extract `n_cls_out` scores:
     embd_seq_out[seq_id].resize(n_cls_out); 
     ggml_backend_tensor_get_async(backend_embd, t_embd, embd_seq_out[seq_id].data(), (seq_id*n_cls_out)*sizeof(float), sizeof(float)*n_cls_out);
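The indexing that snippet relies on is that the classifier output is one flat buffer holding n_cls_out floats per sequence, with each sequence's scores starting at seq_id * n_cls_out. A small Python model of that layout (a sketch of the indexing only, not the llama.cpp API):

```python
def extract_cls_scores(buf, n_cls_out, seq_id):
    """Return the n_cls_out classification scores for one sequence from a
    flat output buffer laid out [seq0 scores..., seq1 scores..., ...]."""
    off = seq_id * n_cls_out
    return buf[off:off + n_cls_out]

# Two sequences, three labels each.
scores = [0.1, 0.7, 0.2, 0.9, 0.05, 0.05]
```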


CISC commented May 30, 2025

Yes, I forgot we already have it. So it should be simple to add this functionality. Maybe as simple as:

Heh, that worked (I could have sworn I already tried this but only got 0)! I'll make a PR... :)


CISC commented May 30, 2025

@ggerganov I got it working, but I need a place to store the labels so I can provide them to the user.

Unfortunately model->gguf_kv can't store a vector of strings, and neither can hparams. I tried an array of string_view, which kind of works, but I think gguf_context (ml.meta) gets nuked (and the strings with it), so I'm a bit at a loss.

Edit: I found a workaround. :)
