T5ForSequenceClassification #14097
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This seems like a useful addition, especially considering the EncT5 paper.
Any update on this?
Maybe of interest to @NielsRogge
Token classification would also be very interesting when I think of evaluations for the BigScience project.
But w.r.t. sequence classification, shouldn't it be similar to the sequence classification that is used for the BART model, as seen here: transformers/src/transformers/models/bart/modeling_bart.py, lines 1415 to 1530 (commit 7799b61)? 🤔
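For reference, the BART-style approach referred to above uses the hidden state at the final eos token of each sequence as the sentence representation and feeds it to a small classification head. A rough, paraphrased sketch with dummy tensors (not the literal modeling_bart.py code):

```python
import torch
from torch import nn

# Dummy shapes just to make the sketch runnable.
batch, seq_len, d_model, num_labels, eos_token_id = 2, 8, 16, 3, 1

hidden_states = torch.randn(batch, seq_len, d_model)   # decoder outputs (dummy)
input_ids = torch.randint(2, 100, (batch, seq_len))
input_ids[:, -1] = eos_token_id                         # every sequence ends with <eos>

# Take the hidden state at the last <eos> position of each sequence ...
eos_mask = input_ids.eq(eos_token_id)
sentence_repr = hidden_states[eos_mask, :].view(batch, -1, d_model)[:, -1, :]

# ... and classify it with a small head (dense -> tanh -> projection).
head = nn.Sequential(nn.Linear(d_model, d_model), nn.Tanh(), nn.Linear(d_model, num_labels))
logits = head(sentence_repr)                            # shape: (batch, num_labels)
```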
Hi. I have done this at subhalingamd@82db59d. But the list of uninitialized weights doesn't seem convincing at all. Here is the list for reference (for
In case of
Am I missing out on something, or is it just that something extra has to be done to import the model? 🤔
@stefan-it / @subhalingamd the code for
My solution is here, happy to push it, though it's a lot of duplicate code. Should some refactoring be performed between this and Bart? The other addition from the EncT5 paper is that the encoder outputs are pooled to simulate the text-to-text nature of classification using NLG. This is not in my implementation but can be added.
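As an illustration of the "pooling the encoder outputs" idea (a hedged sketch only, not the commenter's implementation): mean-pool the T5 encoder hidden states over non-padding tokens and put a linear classifier on top.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

# Sketch only: encoder-only T5 with masked mean pooling and a linear head.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")
classifier = torch.nn.Linear(encoder.config.d_model, 2)    # e.g. two labels

inputs = tokenizer(["a long document to classify"], return_tensors="pt", padding=True)
hidden = encoder(**inputs).last_hidden_state               # (batch, seq_len, d_model)
mask = inputs["attention_mask"].unsqueeze(-1).float()      # (batch, seq_len, 1)
pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)      # masked mean pooling
logits = classifier(pooled)                                # (batch, num_labels)
```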
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@LysandreJik
Hi, here is the repository with the code and results: https://github.com/osainz59/t5-encoder. Is there still interest in implementing this into the Transformers library?
Hi @osainz59, your repo looks really interesting, did you also perform experiments on NER? I've recently added support for encoder-only fine-tuning into the Flair library, and results are really promising for this kind of downstream task. However, both sequence and token classification would be awesome to have in Transformers 🤗
Hi @stefan-it, I did not test the
Hi @osainz59 I think one really interesting dataset would be CoNLL-2003 (see https://huggingface.co/datasets/conll2003). When testing the mT5 model series, WikiANN (Rahimi splits from here: https://huggingface.co/datasets/wikiann) is also very interesting (train on the English split only and test on the other languages for comparison with the mT5 paper) :)
Hi @stefan-it, I trained and evaluated the
I think they are still a bit behind RoBERTa, but at those levels of F1 it is hard to decide. Nevertheless, I think these results suggest that T5-Enc could be an interesting addition to the Transformers library.
Hey @osainz59 thanks for reporting back! I would love to see this in Transformers directly!
This is exactly what I am looking for!
I've opened a PR #24726 for T5ForSequenceClassification. Although based on the results shown in this thread it seems like we could also look into adding a version that only uses the encoder as well.
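Once that PR is in a release, usage would presumably mirror the other *ForSequenceClassification classes in the library; a hedged sketch (checkpoint name and label count are placeholders):

```python
from transformers import AutoTokenizer, T5ForSequenceClassification

# Hypothetical usage of the class added in PR #24726. The classification head
# is newly initialized, so the model needs fine-tuning before it is useful.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForSequenceClassification.from_pretrained("t5-small", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
logits = model(**inputs).logits   # (batch, num_labels)
```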
FYI - Just created a pull request #26683 for EncT5, which is similar to T5ForSequenceClassification but focuses on the encoder layers only (though it still has a single decoder layer). It is based on the results of https://arxiv.org/pdf/2110.08426.pdf.
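For anyone wanting to experiment with the single-decoder-layer idea before that PR lands, one rough approximation using existing config options (an assumption-laden sketch, not the EncT5 implementation from #26683) is to shrink the decoder when instantiating the classification model:

```python
from transformers import T5Config, T5ForSequenceClassification

# Rough approximation of the EncT5 setup: full encoder, a single decoder layer.
# Pretrained weights of the dropped decoder layers are simply skipped when
# loading, so this only illustrates the configuration, not PR #26683 itself.
config = T5Config.from_pretrained("t5-base", num_decoder_layers=1, num_labels=2)
model = T5ForSequenceClassification.from_pretrained("t5-base", config=config)
print(model.config.num_decoder_layers)  # 1
```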
Thanks @sjrl! I wanted to circle back with more people on this thread to see if anyone else in the community is still interested in an encoder-variant EncT5 for the T5ForSequenceClassification model, as proposed in this paper. I have a draft out in #26683, but since the paper is ~2 years old now, @ArthurZucker suggested we see if there is still interest in the community before moving forward. @sunyuhan19981208 @stefan-it @osainz59 @subhalingamd @prajjwal1 - let us know if you think the encoder-variant EncT5 is worth adding to the library. Thanks!
Hi @hackyon! My implementation of encoder-only T5 for sequence and token classification still has some activity (clones and visits). It is not much, but people definitely use it.
@hackyon As another data point, in my experiments I found the EncT5 approach to work well for efficiently scaling models with encoders to >1B parameters, and the single decoder layer worked much better than naively pooling the encoder outputs. I suspect most interest in scaling has shifted to decoder-only models, but if this can be done without a ton of "model sprawl" as you mention in your PR, I think it could be a nice addition.
Thanks @osainz59 and @dwyatte for your input! I'll follow up with @ArthurZucker again to see if we can move forward. PS. @dwyatte - it could be worth checking out this comment from Frederick. Seems like there's more research/evidence that supports the single decoder layer approach (at least for multi-label classification in this case).
Can I work on this? Could you assign it to me?
Hello @mahita2104. I believe this issue has more or less been resolved already by @sjrl in #24726. You should be able to find T5ForSequenceClassification in the latest code base. I think they can probably mark this particular issue as closed if possible. I am working on an extension to this in #26683, but am still iterating on it.
Yes, closing this one. Thanks @hackyon 🤗
🚀 Feature request
T5 to classify sequences by using only the encoder of T5 and a ClassificationHead.
Motivation
This gives the benefits of fine-tuning a model with no maximum sequence length (useful for long sequence tasks) without having to load the decoder weights into memory/treat it as a generative task.
Your contribution
I already have working code for this, and saw some requests for it in other forums (slack, torch, huggingface) so if it's a welcome addition I'd be happy to add it to the library.
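For readers landing on this issue, a minimal illustrative sketch of the requested design (hypothetical class name and pooling choice; not the author's working code): wrap T5EncoderModel and put a classification head on a pooled representation. Other pooling schemes (e.g. the final </s> state, or the single decoder layer from the EncT5 paper discussed above) would slot in at the same place.

```python
from torch import nn
from transformers import T5EncoderModel

class T5EncoderClassifier(nn.Module):
    """Sketch: T5 encoder + classification head, no decoder weights loaded."""

    def __init__(self, model_name: str = "t5-small", num_labels: int = 2):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                                    # (batch, seq_len, d_model)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pooling
        return self.head(pooled)                               # (batch, num_labels)
```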