Evaluation Part 2 #136
-
Hi!

Should we evaluate the baseline qrels from part 1 with compute_metrics_plain() as well? I am asking because the data differs from a re-ranking task (which is the point, I guess?) and we get an MRR@10 of 0.95 with the re-ranking metrics. Or should we come up with our own evaluation metrics?

If we just do the evaluation with core_metrics_plain(), I think there are several issues. Mainly, documents that have the same relevance grade according to the baseline (or our own aggregation), e.g. 3, are sorted in arbitrary order. Let's say we have 3 documents per query with grades 3, 3, 2. Then switching the positions of the first two documents would result in a pretty different metric, MRR for example.
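To make the tie-break effect concrete, here is a minimal sketch; the reciprocal_rank helper, the doc ids, and the gold labels are made up for illustration and are not the course's core_metrics_plain(). Two documents tied at the same aggregated grade can land in either order, and if their gold relevance differs, the reciprocal rank changes with the tie-break:

```python
def reciprocal_rank(ranking, relevant, cutoff=10):
    """RR@cutoff: 1 / rank of the first relevant document, else 0."""
    for rank, doc_id in enumerate(ranking[:cutoff], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Aggregated grades from part 1: d1 and d2 are tied at grade 3,
# so sorting by grade leaves their relative order arbitrary.
grades = {"d1": 3, "d2": 3, "d3": 2}
# Suppose the gold judgments say only d2 is relevant.
relevant = {"d2"}

ranking_a = ["d1", "d2", "d3"]  # one tie-break
ranking_b = ["d2", "d1", "d3"]  # the other tie-break
print(reciprocal_rank(ranking_a, relevant))  # 0.5
print(reciprocal_rank(ranking_b, relevant))  # 1.0
```

Kind Regards,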
Replies: 2 comments
-
Dear Sebastian,

Using core_metrics_plain() is enough. Yes, the arbitrary sorting will result in worse scores, but for our purposes this is good enough.
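As an aside, if you want reproducible numbers despite the ties, a deterministic tie-break (e.g. by document id) at least makes the arbitrary order stable across runs. A minimal sketch with a hypothetical grades dict; this is not something core_metrics_plain() does for you:

```python
# Hypothetical aggregated grades; d1 and d2 are tied at grade 3.
grades = {"d1": 3, "d2": 3, "d3": 2}

# Sort by grade descending, then doc id ascending: tied documents now
# keep a fixed (if still arbitrary) order on every run.
ranking = sorted(grades, key=lambda d: (-grades[d], d))
print(ranking)  # ['d1', 'd2', 'd3']
```

Best,
Pia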
-
Alright, thanks a lot for the reply!

Kind Regards,