
SentenceTransformerTrainer compute_metrics #2888

Closed
Samoed opened this issue Aug 14, 2024 · 5 comments · Fixed by #3002

Comments

@Samoed
Contributor

Samoed commented Aug 14, 2024

Hi! I tried to train my model and evaluate it using compute_metrics, but I didn't get any metrics. Is this a bug, or is it not supposed to work?

Code for test:
https://colab.research.google.com/drive/11sml4nfhkVVoZy0fTsll6BLgpHYz-BWD

@ir2718
Contributor

ir2718 commented Aug 14, 2024

Hi,

I think this is the expected behaviour, but I'm not 100% sure. The compute_metrics function usually takes in a list of predictions, which you can then use to calculate the metrics. With triplet training, the difference is in the evaluation procedure: the evaluators take in a SentenceTransformer model, use it to calculate the anchor, positive, and negative embeddings, and then use those embeddings to calculate the metrics.

Besides, there is no obvious way to define false positives and false negatives with triplet information. The only thing you can say when you've got triplet data is whether the positive is closer to the anchor than the negative, hence only accuracy is calculated.
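
As a rough illustration (a sketch, not the library's internals; the model name and triplets below are just placeholders), that accuracy boils down to something like:

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder triplets
anchors = ["What is the capital of France?"]
positives = ["Paris is the capital of France."]
negatives = ["Berlin is the capital of Germany."]

anchor_emb = model.encode(anchors)
positive_emb = model.encode(positives)
negative_emb = model.encode(negatives)

# A triplet counts as correct when the positive is closer to the anchor than the negative
correct = sum(
    cos_sim(a, p).item() > cos_sim(a, n).item()
    for a, p, n in zip(anchor_emb, positive_emb, negative_emb)
)
print("accuracy:", correct / len(anchors))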

@Samoed
Contributor Author

Samoed commented Aug 14, 2024

There's no documentation saying whether this is expected behavior or not. Maybe printing a warning when the function is passed would be good.

@jacklanda

+1

@tomaarsen
Collaborator

Hello!

The compute_metrics argument is inherited from the transformers Trainer superclass, where it is primarily used here: https://github.com/huggingface/transformers/blob/b54109c7466f6e680156fbd30fa929e2e222d730/src/transformers/trainer.py#L4184-L4192

This happens during every evaluation, but Sentence Transformer models don't usually train with single inputs & single outputs, so the logits and labels are both None here. To be precise, they are None because the prediction_step calls compute_loss, which is implemented in the SentenceTransformerTrainer:

if return_outputs:
    # During prediction/evaluation, `compute_loss` will be called with `return_outputs=True`.
    # However, Sentence Transformer losses do not return outputs, so we return an empty dictionary.
    # This does not result in any problems, as the SentenceTransformerTrainingArguments sets
    # `prediction_loss_only=True` which means that the output is not used.
    return loss, {}
return loss

As a result, the compute_metrics function is never called over the EvalPrediction.


In Sentence Transformers, if you want to compute evaluations, it is recommended to use one of the Evaluators. There's some more information about them here.
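
For triplet data, for instance, the built-in TripletEvaluator covers this. A minimal sketch with placeholder data (depending on the version, calling it returns either a single accuracy float or a dict of metrics):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder triplets; in practice these come from your eval split
dev_evaluator = TripletEvaluator(
    anchors=["What is the capital of France?"],
    positives=["Paris is the capital of France."],
    negatives=["Berlin is the capital of Germany."],
    name="dev-triplets",
)
print(dev_evaluator(model))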

If you want to create your own evaluator, you can subclass the SentenceEvaluator class. You can pass an evaluator to the STTrainer, or even a list if you have multiple.
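
A minimal custom evaluator could look like the sketch below (the class name, metric key, and example data are made up for illustration; recent versions expect a dict of metrics back, older ones a single float):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import SentenceEvaluator
from sentence_transformers.util import cos_sim

class MyTripletAccuracyEvaluator(SentenceEvaluator):
    # Toy evaluator: fraction of triplets where the positive is closer to the anchor than the negative
    def __init__(self, anchors, positives, negatives, name="my_triplets"):
        super().__init__()
        self.anchors = anchors
        self.positives = positives
        self.negatives = negatives
        self.name = name

    def __call__(self, model, output_path=None, epoch=-1, steps=-1):
        anchor_emb = model.encode(self.anchors)
        positive_emb = model.encode(self.positives)
        negative_emb = model.encode(self.negatives)
        correct = sum(
            cos_sim(a, p).item() > cos_sim(a, n).item()
            for a, p, n in zip(anchor_emb, positive_emb, negative_emb)
        )
        return {f"{self.name}_accuracy": correct / len(self.anchors)}

evaluator = MyTripletAccuracyEvaluator(
    anchors=["What is the capital of France?"],
    positives=["Paris is the capital of France."],
    negatives=["Berlin is the capital of Germany."],
)
print(evaluator(SentenceTransformer("all-MiniLM-L6-v2")))

An instance like this (or a list of evaluators) can then be passed via the evaluator argument of SentenceTransformerTrainer, and its metrics show up in the evaluation logs.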

  • Tom Aarsen

@Samoed
Contributor Author

Samoed commented Oct 18, 2024

Thank you very much! Maybe add a warning about this? This is a bit unexpected behavior.
