Results are not good when I use BERT for a seq2seq model in keyphrase generation #59
Comments
Have you tried a Transformer decoder instead of an RNN decoder? |
Not yet, I will try. But I don't think the RNN decoder should be that bad. |
Hmm, maybe you should use the mean of the last layer to initialize the decoder, not the representation of the last token of the last layer. |
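For illustration, a minimal sketch of the two initialization choices being discussed, assuming the encoder's last hidden layer comes as a `(batch, seq_len, hidden)` tensor together with a 0/1 attention mask; the function and argument names are made up for this example:

```python
import torch

def decoder_init_state(last_hidden, attention_mask, use_mean=True):
    """Build the RNN decoder's initial hidden state from BERT's last layer.

    last_hidden:    (batch, seq_len, hidden) encoder output
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    if use_mean:
        # Mean over the real (non-padding) token representations.
        mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
        init = (last_hidden * mask).sum(1) / mask.sum(1).clamp(min=1.0)
    else:
        # Representation of the last real token of each sequence.
        last_idx = attention_mask.long().sum(1) - 1      # (batch,)
        init = last_hidden[torch.arange(last_hidden.size(0)), last_idx]
    return init.unsqueeze(0)  # (1, batch, hidden) for a one-layer GRU/LSTM
```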
I think the batch size of the RNN with BERT is too small. Please see
|
I don't know what you mean by giving me this link. I set it to 10 really because of the memory problem. Actually, when the sentence length is 512, the max batch size is only 5; if it is 6 or bigger there is an out-of-memory error on my GPU. |
You are right. Maybe the mean is better, I will try as well. Thanks. |
May I ask a question? Are you Chinese? Hahaha |
Because one example has N target keyphrases, we want to put all of its targets in the same batch. A batch size of 10 is so small that the targets of one example would probably end up in different batches. |
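For context, a minimal sketch of what keeping all of one example's targets in the same batch could look like, assuming each example is a dict with a `source` string and a list of `targets` (hypothetical field names, not the actual data layout of seq2seq-keyphrase-pytorch):

```python
def make_batches(examples, batch_size):
    """Group (source, target) pairs so that all targets of one example
    end up in the same batch (hypothetical data layout)."""
    batches, current = [], []
    for ex in examples:
        # One training pair per gold keyphrase of this example.
        pairs = [(ex["source"], t) for t in ex["targets"]]
        # Start a new batch if this example's pairs would not all fit.
        if current and len(current) + len(pairs) > batch_size:
            batches.append(current)
            current = []
        current.extend(pairs)  # may exceed batch_size if one example has many targets
    if current:
        batches.append(current)
    return batches
```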
I know, but ... same problem ... my memory is limited, so ... P.S. I am Chinese. |
I am as well, hahaha |
Could it be a corpus issue? BERT was trained on Wikipedia. I trained a mini BERT on kp20k, and its accuracy on the test set is currently 80%. Do you want to try using mine as the encoder? |
What exactly is this 80% figure that is so high? F1 score? Could you send me your encoder so I can take a look? Thanks. |
The accuracy is for the masked LM and next-sentence-prediction tasks, not for keyphrase generation; sorry, I didn't make that clear. My compute is limited: two P100s, and after almost a month the training still hasn't finished. 80% is the current performance. |
What do you mean by the mini BERT you mentioned? |
I roughly understand what you mean: you essentially re-pretrained a BERT on kp20k. But doing it that way ... does feel quite troublesome. |
Yes, it uses Junseong Kim's code: https://github.com/codertimo/BERT-pytorch . The model is much smaller than Google's BERT-Base Uncased; this one is L-8 H-256 A-8. I will send you the current training checkpoint and the vocab file. |
But can my version use your checkpoint directly, or do I have to install your version of the code? |
You can send it to my email whqwill@126.com, thanks. |
You can build a BERT model from Junseong Kim's code and then load the parameters into it; you don't necessarily have to install the package. |
OK then. Send me the checkpoint and I will give it a try. |
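A rough sketch of what loading the checkpoint without installing the package could look like, assuming the `BERT` and `WordVocab` classes from codertimo/BERT-pytorch and that the checkpoint is either a pickled module or a plain state_dict; the hyperparameters (L-8 H-256 A-8) are taken from the thread, while the file names and save format are assumptions:

```python
import torch
# Assumed import paths; you can also just copy the model code from
# https://github.com/codertimo/BERT-pytorch into your project.
from bert_pytorch.model import BERT
from bert_pytorch.dataset import WordVocab

# Load the vocab file that was used for pretraining (file name is hypothetical).
vocab = WordVocab.load_vocab("kp20k_mini_bert.vocab")

# Rebuild the architecture with the hyperparameters mentioned above (L-8 H-256 A-8).
bert = BERT(len(vocab), hidden=256, n_layers=8, attn_heads=8)

# The checkpoint may be a pickled module or a plain state_dict,
# depending on how it was saved; handle both cases.
ckpt = torch.load("kp20k_mini_bert.ep20.pt", map_location="cpu")
if isinstance(ckpt, torch.nn.Module):
    bert = ckpt
else:
    bert.load_state_dict(ckpt)

bert.eval()  # ready to be used as the encoder
```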
Hi guys, |
|
Hi @whqwill, I have some doubts about how BERT is used together with the RNN. |
Hi,
recently I have been doing research on keyphrase generation. Usually, people use a seq2seq model with attention for this problem. Specifically, I use the framework https://github.com/memray/seq2seq-keyphrase-pytorch, which is an implementation of http://memray.me/uploads/acl17-keyphrase-generation.pdf .
Now I just changed its encoder part to BERT, but the results are not good. The experimental comparison of the two models is in the attachment.
Can you give me some advice on whether what I did is reasonable and whether BERT is suitable for this kind of task?
Thanks.
RNN vs BERT in Keyphrase generation.pdf
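For reference, a minimal sketch of the kind of setup being compared (BERT encoder feeding an attentional RNN decoder), written against the Hugging Face `transformers` API with a generic GRU decoder as a stand-in; this is not the actual model from seq2seq-keyphrase-pytorch, just an illustration of the wiring:

```python
import torch
import torch.nn as nn
from transformers import BertModel  # assumed available

class BertKeyphraseSeq2Seq(nn.Module):
    """BERT encoder + attentional GRU decoder (simplified stand-in)."""

    def __init__(self, tgt_vocab_size, hidden_size=768):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.embedding = nn.Embedding(tgt_vocab_size, hidden_size)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads=8, batch_first=True)
        self.out = nn.Linear(hidden_size * 2, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source document with BERT (per-token hidden states).
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state

        # Initialize the decoder from the mean of the non-padding encoder states.
        mask = src_mask.unsqueeze(-1).float()
        init = (memory * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        dec_out, _ = self.decoder(self.embedding(tgt_ids), init.unsqueeze(0))

        # Attend over the encoder states and predict target keyphrase tokens.
        ctx, _ = self.attn(dec_out, memory, memory,
                           key_padding_mask=(src_mask == 0))
        return self.out(torch.cat([dec_out, ctx], dim=-1))
```

Whether BERT is fine-tuned or kept frozen, and how the decoder is initialized, are exactly the knobs debated earlier in this thread.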