Support sequence tagging evaluation metrics (NLP) #1158

Open
pietrolesci opened this issue Jul 22, 2022 · 6 comments

@pietrolesci

🚀 Feature

Support sequence tagging evaluation metrics à la seqeval, i.e., support evaluating performance on chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling, and so on.
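For reference, a minimal sketch of the kind of entity-level evaluation seqeval provides (assuming seqeval is installed; the tags below are illustrative and follow the IOB2 scheme):

```python
# Minimal sketch of entity-level evaluation with seqeval (the library
# referenced above). Tags are illustrative, in the IOB2 scheme.
from seqeval.metrics import classification_report, f1_score

# Each inner list is one tagged sequence (e.g., one sentence).
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "I-ORG"]]
y_pred = [["B-PER", "I-PER", "O", "O"],     ["O", "B-ORG", "O"]]

# Entity-level F1: an entity counts as correct only when both its
# boundaries and its type match the gold annotation.
print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```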

@pietrolesci pietrolesci added the enhancement New feature or request label Jul 22, 2022
@SkafteNicki
Member

cc @stancld opinion on this?

@stancld
Contributor

stancld commented Jul 23, 2022

I'm not so familiar with this kind of metric... How much do these metrics differ from standard classification ones? :] @pietrolesci

@pietrolesci
Author

Hi @stancld,

I think they're not much different. The convenience of having sequence-level metrics readily available is that

  • they can be fed sequences directly (no manual iteration over tokens)
  • they can implement different evaluation "policies", strict vs non-strict. For example,
pred: [A, A, B]
true: [A, B, B]

can be counted as partially correct or as incorrect. This, of course, affects how results are aggregated; there is a practical example in seqeval's README.md (see also the sketch after this list).

  • they can make it easier to enforce particular tag encodings (e.g., for NER or POS tags)
  • last but not least, it would be nice to have this in torchmetrics for consistency (i.e., no need to resort to other libraries/frameworks)
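To make the strict vs non-strict point concrete, here is a small sketch using seqeval's two modes. The tags are illustrative; in strict mode a span must be well-formed under the declared scheme (here IOB2), while the default mode follows the lenient conlleval convention:

```python
# Sketch: the same prediction scored under seqeval's default (lenient,
# conlleval-style) mode and under its strict mode. Illustrative tags only.
from seqeval.metrics import f1_score
from seqeval.scheme import IOB2

y_true = [["B-PER", "I-PER", "O"]]
y_pred = [["I-PER", "I-PER", "O"]]  # entity start mis-tagged as I-PER

# Default mode follows conlleval: an I- tag after O still opens an
# entity, so the predicted span matches the gold one -> F1 = 1.0.
print(f1_score(y_true, y_pred))

# Strict mode requires a well-formed IOB2 span (it must open with B-),
# so the prediction yields no valid entity -> F1 = 0.0.
print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))
```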

@stancld
Contributor

stancld commented Jul 27, 2022

Hi @pietrolesci, I get the motivation and think this might be a nice contribution to torchmetrics. 👍

As these metrics will very likely inherit from the classification ones, I'd wait a bit with this addition until the ongoing classification refactor #1001 is finalized :]

@stale stale bot added the wontfix label Sep 28, 2022
@stancld
Contributor

stancld commented Oct 5, 2022

Hi @pietrolesci -- I think I should be able to find some time in the near future to look at this class of metrics. However, I'm not fully familiar with the current state of tagging metrics. Do you think it makes more sense for our public API to accept something like Sequence[Sequence[str]], or would it be better to use torch.Tensor here? (I think transformers models tend to output tensors, so that would make sense as well.) Alternatively, we could support both options and make sure everything is converted to tensors internally (provided this doesn't make our public API too confusing). What do you think? :]
cc: @Borda @SkafteNicki
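To illustrate the dual-input option, here is a hypothetical sketch. The class name SequenceTaggingF1, the id2label argument, and the conversion logic are all made up for illustration; only the Metric base class and its add_state/update/compute hooks are the real torchmetrics API, and compute here returns a placeholder token-accuracy score rather than a true entity-level F1:

```python
# Hypothetical sketch of accepting both tensors and string sequences and
# normalizing internally. Not an existing torchmetrics API.
from typing import Dict, List, Sequence, Union

import torch
from torchmetrics import Metric


class SequenceTaggingF1(Metric):  # hypothetical metric name
    def __init__(self, id2label: Dict[int, str]):
        super().__init__()
        self.id2label = id2label  # hypothetical tag-id -> tag-string mapping
        # Real torchmetrics state API: states are synced across processes.
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def _to_tags(
        self, batch: Union[torch.Tensor, Sequence[Sequence[str]]]
    ) -> List[List[str]]:
        # Normalize both accepted input types to lists of string tags.
        if isinstance(batch, torch.Tensor):
            return [[self.id2label[int(i)] for i in row] for row in batch]
        return [list(row) for row in batch]

    def update(self, preds, target) -> None:
        for p_seq, t_seq in zip(self._to_tags(preds), self._to_tags(target)):
            self.correct += sum(p == t for p, t in zip(p_seq, t_seq))
            self.total += len(t_seq)

    def compute(self) -> torch.Tensor:
        # Placeholder score (token accuracy); a real implementation would
        # aggregate entity-level statistics instead.
        return self.correct.float() / self.total
```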

@stale stale bot removed the wontfix label Oct 5, 2022
@Lightning-AI Lightning-AI deleted a comment from stale bot Oct 19, 2022
@Borda
Member

Borda commented Oct 19, 2022

I think it would be good to explore this direction; we could also set up a quick call with @pietrolesci to get more context, and maybe he could give us an intro... 🐰

@stancld stancld added this to the future milestone Oct 28, 2022