Support sequence tagging evaluation metrics (NLP) #1158

Open
pietrolesci opened this issue Jul 22, 2022 · 6 comments

@pietrolesci

🚀 Feature

Support sequence tagging evaluation metrics à la seqeval, i.e., support evaluating performance on chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling, and so on.
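For reference, a minimal sketch of the kind of entity-level evaluation seqeval provides (assuming seqeval is installed; the tags below are illustrative and follow the IOB2 scheme):

```python
# Minimal sketch of entity-level evaluation with seqeval (the library
# referenced above). Tags are illustrative, in the IOB2 scheme.
from seqeval.metrics import classification_report, f1_score

# Each inner list is one tagged sequence (e.g., one sentence).
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG"]]
y_pred = [["B-PER", "I-PER", "O", "O"],     ["O", "B-ORG", "O"]]

# Entity-level F1: an entity counts as correct only when both its
# boundaries and its type match the gold annotation.
print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```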

@pietrolesci pietrolesci added the enhancement New feature or request label Jul 22, 2022
@SkafteNicki
Member

cc @stancld opinion on this?

@stancld
Contributor

stancld commented Jul 23, 2022

I'm not so familiar with this kind of metric... How much do these metrics differ from standard classification ones? :] @pietrolesci

@pietrolesci
Author

Hi @stancld,

I think they're not much different. The convenience of having sequence-level metrics readily available is that

  • they can be fed sequences directly (no manual iteration over tokens)
  • they can implement different evaluation "policies", strict vs non-strict. For example,
pred: [A, A, B]
true: [A, B, B]

can be counted as partially correct or as incorrect. This, of course, affects how results are aggregated; there is a practical example in seqeval's README.md (see also the sketch after this list).

  • they can make it easier to enforce particular tag encodings (e.g., for NER or POS tags)
  • last but not least, it would be nice to have this in torchmetrics for consistency (i.e., no need to resort to other libraries/frameworks)
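To make the strict vs non-strict point concrete, here is a small sketch using seqeval's two modes. The tags are illustrative; in strict mode a span must be well-formed under the declared scheme (here IOB2), while the default mode follows the lenient conlleval convention:

```python
# Sketch: the same prediction scored under seqeval's default (lenient,
# conlleval-style) mode and under its strict mode. Illustrative tags only.
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

y_true = [["B-PER", "I-PER", "O"]]
y_pred = [["I-PER", "I-PER", "O"]]  # entity start mis-tagged as I-PER

# Default mode follows conlleval: an I- tag after O still opens an
# entity, so the predicted span matches the gold one -> F1 = 1.0.
print(f1_score(y_true, y_pred))

# Strict mode requires a well-formed IOB2 span (it must open with B-),
# so the prediction yields no valid entity -> F1 = 0.0.
print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))
```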

@stancld
Contributor

stancld commented Jul 27, 2022

Hi @pietrolesci, I get the motivation and think this might be a nice contribution to torchmetrics. 👍

As these metrics will very likely inherit from the classification ones, I'd wait a bit with this addition until the ongoing classification refactor #1001 is finalized :]

@stale stale bot added the wontfix label Sep 28, 2022
@stancld
Contributor

stancld commented Oct 5, 2022

Hi @pietrolesci -- I think I should be able to find some time in the near future to look at this class of metrics. However, I'm not fully familiar with the current state of tagging metrics. Do you think it makes more sense for our public API to accept something like Sequence[Sequence[str]], or would it be better to use torch.Tensor here? (I think transformers models tend to output tensors, so that would make sense as well.) Alternatively, we could support both options and make sure everything is converted to tensors internally (provided this doesn't make our public API too confusing). What do you think? :]
cc: @Borda @SkafteNicki
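To illustrate the dual-input option, here is a hypothetical sketch. The class name SequenceTaggingF1, the id2label argument, and the conversion logic are all made up for illustration; only the Metric base class and its add_state/update/compute hooks are the real torchmetrics API, and compute here returns a placeholder token-accuracy score rather than a true entity-level F1:

```python
# Hypothetical sketch of accepting both tensors and string sequences and
# normalizing internally. Not an existing torchmetrics API.
from typing import Dict, List, Sequence, Union

import torch
from torchmetrics import Metric


class SequenceTaggingF1(Metric):  # hypothetical metric name
    def __init__(self, id2label: Dict[int, str]):
        super().__init__()
        self.id2label = id2label  # hypothetical tag-id -> tag-string mapping
        # Real torchmetrics state API: states are synced across processes.
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def _to_tags(
        self, batch: Union[torch.Tensor, Sequence[Sequence[str]]]
    ) -> List[List[str]]:
        # Normalize both accepted input types to lists of string tags.
        if isinstance(batch, torch.Tensor):
            return [[self.id2label[int(i)] for i in row] for row in batch]
        return [list(row) for row in batch]

    def update(self, preds, target) -> None:
        for p_seq, t_seq in zip(self._to_tags(preds), self._to_tags(target)):
            self.correct += sum(p == t for p, t in zip(p_seq, t_seq))
            self.total += len(t_seq)

    def compute(self) -> torch.Tensor:
        # Placeholder score (token accuracy); a real implementation would
        # aggregate entity-level statistics instead.
        return self.correct.float() / self.total
```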

@stale stale bot removed the wontfix label Oct 5, 2022
@Lightning-AI Lightning-AI deleted a comment from stale bot Oct 19, 2022
@Borda
Member

Borda commented Oct 19, 2022

I think it would be good to explore this direction; we could also set up a quick call with @pietrolesci to get more context, and maybe he could give us an intro... 🐰

@stancld stancld added this to the future milestone Oct 28, 2022