llama : add RobertaForSequenceClassification reranker support #13875


Merged
merged 5 commits into master on May 29, 2025

Conversation


@CISC CISC commented May 28, 2025

Adds support for RoBERTa reranker.

Added variable cls_out shape support to be able to load the model (and yes, this will likely need improvements to work as intended with more than one label; I'll leave that for a free-for-all follow-up PR).

Also fixes the JinaBert conversion error from #13858

cc/ @huydt84

@CISC CISC requested a review from ggerganov May 28, 2025 21:43
@github-actions github-actions bot added the python python script changes label May 28, 2025
@ggerganov ggerganov left a comment


this will likely need improvements to work as intended with more than 1 label

Could you clarify what would be needed for multiple labels?


CISC commented May 29, 2025

Could you clarify what would be needed for multiple labels?

I haven't looked too deeply into it, but it has relevance for the loss function at least:
https://github.com/huggingface/transformers/blob/66da7001457a44c2a04cd329b445c574b9561e8b/src/transformers/models/roberta/modeling_roberta.py#L1221-L1239
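For reference, the linked transformers code picks the loss from num_labels and the label dtype. A minimal Python sketch of that selection logic (names are illustrative; the real code uses torch's MSELoss, CrossEntropyLoss, and BCEWithLogitsLoss):

```python
def pick_loss(num_labels, labels_are_int, problem_type=None):
    """Mimic the problem-type inference in RobertaForSequenceClassification.

    Sketch only: one label means regression, several integer labels mean
    single-label classification, anything else is multi-label.
    """
    if problem_type is None:
        if num_labels == 1:
            problem_type = "regression"
        elif num_labels > 1 and labels_are_int:
            problem_type = "single_label_classification"
        else:
            problem_type = "multi_label_classification"
    return {
        "regression": "MSELoss",
        "single_label_classification": "CrossEntropyLoss",
        "multi_label_classification": "BCEWithLogitsLoss",
    }[problem_type]
```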

@CISC CISC merged commit 6385b84 into master May 29, 2025
49 checks passed
@CISC CISC deleted the cisc/roberta-ranker-support branch May 29, 2025 06:15
@ggerganov

I think the user code should also somehow query the number of labels, correct?

@ggerganov

Btw, you can trigger ggml-ci on your PRs by adding the "ggml-ci" string to your commit message (see https://github.com/ggml-org/llama.cpp/tree/master/ci)


CISC commented May 29, 2025

Btw, you can trigger ggml-ci on your PRs by adding the "ggml-ci" string to your commit message (see https://github.com/ggml-org/llama.cpp/tree/master/ci)

Ahhh, that's useful!


CISC commented May 29, 2025

I think the user code should also somehow query the number of labels, correct?

Probably; it equals the length of the label array in the metadata.
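In user code this would just be the length of that array; a tiny sketch (the metadata key name here is hypothetical, not the actual GGUF key):

```python
def n_labels(metadata):
    """Derive the classifier label count from a metadata dict.

    'classifier.output_labels' is a hypothetical key standing in for
    wherever the label array ends up in the GGUF metadata.
    """
    return len(metadata.get("classifier.output_labels", []))

meta = {"classifier.output_labels": ["LABEL_0", "LABEL_1"]}
```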


CISC commented May 29, 2025

@ggerganov Funny, the test still fails, but I can't find anything in the logs?

@ggerganov


CISC commented May 29, 2025

Did you check the stdall output - right at the bottom:

Ah, that's weird, still the same error...


CISC commented May 29, 2025

Oh, great, AutoConfig fills in dummy (and incorrect) data, sigh.


huydt84 commented May 29, 2025

The config.json doesn't reflect AutoConfig.from_pretrained(...); it automatically adds id2label: https://github.com/huggingface/transformers/blob/8010f3cf616c98964ec38dde78de54b25f2a64af/src/transformers/configuration_utils.py#L364
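The behavior being pointed at can be mimicked in a few lines; a hedged sketch (not the actual transformers code) of how the loaded config ends up with dummy labels that were never in config.json:

```python
def fill_id2label(config, default_num_labels=2):
    """Sketch of the PretrainedConfig back-fill: when config.json carries
    no id2label, dummy LABEL_i entries are generated from num_labels
    (which itself defaults to 2), so the loaded config no longer matches
    the file on disk."""
    cfg = dict(config)
    if "id2label" not in cfg:
        n = cfg.get("num_labels", default_num_labels)
        cfg["id2label"] = {i: f"LABEL_{i}" for i in range(n)}
    return cfg
```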


CISC commented May 29, 2025

The config.json doesn't reflect AutoConfig.from_pretrained(...); it automatically adds id2label

Yep, figured out a workaround, making a PR.


CISC commented May 30, 2025

Could you clarify what would be needed for multiple labels?

I haven't looked too deeply into it, but it has relevance for the loss function at least

@ggerganov I looked into it a little more, and as far as I can tell all that is needed is to extract one score per label per sequence. I'm not sure how to do that, though: I can see that one score is extracted from the beginning of each sequence in llama_context.cpp, but I have no idea where the additional scores would be, nor whether more work is needed to generate them...

@ggerganov

We have to introduce some notion of the number of classification labels. Once we have that, we just need to adjust the constant 1 here to use it:

    for (uint32_t s = 0; s < ubatch.n_seqs; ++s) {
        const llama_seq_id seq_id = ubatch.seq_id[s][0];

        if (embd_seq_out.find(seq_id) != embd_seq_out.end()) {
            continue;
        }

        embd_seq_out[seq_id].resize(1);
        ggml_backend_tensor_get_async(backend_embd, t_embd, embd_seq_out[seq_id].data(), (seq_id)*sizeof(float), sizeof(float));
    }

And update the comment in the public header:

llama.cpp/include/llama.h

Lines 913 to 917 in 291f2b6

// Get the embeddings for a sequence id
// Returns NULL if pooling_type is LLAMA_POOLING_TYPE_NONE
// when pooling_type == LLAMA_POOLING_TYPE_RANK, returns float[1] with the rank of the sequence
// otherwise: float[n_embd] (1-dimensional)
LLAMA_API float * llama_get_embeddings_seq(struct llama_context * ctx, llama_seq_id seq_id);

For example, we can add n_cls to the hparams and expose it with an API like llama_n_cls(const struct llama_model * model). Not sure if this is the best way - I haven't used such classifiers, so I am not familiar with the use cases.


CISC commented May 30, 2025

I already added n_cls_out to hparams that does this. :)


ggerganov commented May 30, 2025

Yes, I forgot we already have it. So it should be simple to add this functionality. Maybe as simple as:

     // extract `n_cls_out` scores:
     embd_seq_out[seq_id].resize(n_cls_out); 
     ggml_backend_tensor_get_async(backend_embd, t_embd, embd_seq_out[seq_id].data(), (seq_id*n_cls_out)*sizeof(float), sizeof(float)*n_cls_out);
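The indexing that snippet relies on is that the classifier output is one flat buffer holding n_cls_out floats per sequence, with each sequence's scores starting at seq_id * n_cls_out. A small Python model of that layout (a sketch of the indexing only, not the llama.cpp API):

```python
def extract_cls_scores(buf, n_cls_out, seq_id):
    """Return the n_cls_out classification scores for one sequence from a
    flat output buffer laid out [seq0 scores..., seq1 scores..., ...]."""
    off = seq_id * n_cls_out
    return buf[off:off + n_cls_out]

# Two sequences, three labels each.
scores = [0.1, 0.7, 0.2, 0.9, 0.05, 0.05]
```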


CISC commented May 30, 2025

Yes, I forgot we already have it. So it should be simple to add this functionality. Maybe as simple as:

Heh, that worked (I could have sworn I already tried this but only got 0)! I'll make a PR... :)


CISC commented May 30, 2025

@ggerganov I got it working, but I need a place to store the labels so I can provide them to the user.

Unfortunately model->gguf_kv can't store a vector of strings, and neither can hparams. I tried an array of string_view, which kind of works, but I think gguf_context (ml.meta) gets nuked (and the strings with it), so I'm a bit at a loss.

Edit: I found a workaround. :)
