
a question about self.vocab #3

Open
tomtang110 opened this issue Jan 17, 2019 · 6 comments

Comments

@tomtang110

[screenshot of the relevant code]
Could you explain why you add self.vocab_size between the question ids and the answer ids?

@benywon
Owner

benywon commented Jan 17, 2019

> Could you explain why you add self.vocab_size between the question ids and the answer ids?

The self.vocab_size is just a separator symbol used to mark the boundary between the question and the answer.
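A minimal sketch of the idea (not the repo's actual code; the function name and inputs are hypothetical): since real token ids occupy the range [0, vocab_size), the id vocab_size itself can never collide with a real token, so it is safe to use as a separator.

```python
def build_input(question_ids, answer_ids, vocab_size):
    """Concatenate question and answer ids with a separator id.

    The separator id is vocab_size, one past the last real token id,
    so it cannot be confused with any word in the vocabulary.
    """
    sep_id = vocab_size
    return question_ids + [sep_id] + answer_ids

# Example with a 57777-token vocabulary: the separator id is 57777.
print(build_input([12, 7, 903], [44, 5], 57777))
# -> [12, 7, 903, 57777, 44, 5]
```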

@tomtang110
Author

tomtang110 commented Jan 18, 2019

May I ask: is your trained vocabulary tied to your word2id.obj file? If I build my own word2id, can I still use your model? Mainly, I noticed there are only 57,777 words, which seems a bit few.

@benywon
Owner

benywon commented Jan 18, 2019

Definitely tied to it! A different word2id would map the same word to a different id, so you should use my word2id.obj. BTW, 57,777 words is not that small, since we use the SentencePiece subword tokenizer, so OOV is not a problem.
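To illustrate why a subword vocabulary avoids OOV: any word absent from the vocabulary can still be segmented into smaller known pieces. The sketch below uses a simple greedy longest-match segmentation; SentencePiece actually trains BPE or unigram-LM models, so this is only a toy illustration, and the vocabulary here is made up.

```python
def subword_tokenize(word, vocab):
    """Greedily split a word into the longest known subword pieces."""
    pieces, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking by one char.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                pieces.append(piece)
                i = j
                break
        else:
            # No piece matched; emit an unknown marker for this character.
            pieces.append("<unk>")
            i += 1
    return pieces

vocab = {"play", "ing", "er", "p", "l", "a", "y"}
print(subword_tokenize("playing", vocab))  # -> ['play', 'ing']
print(subword_tokenize("player", vocab))   # -> ['play', 'er']
```

Because even single characters can serve as fallback pieces, a modest subword vocabulary covers essentially any input text.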

@tomtang110
Author

But I need more than 450,000 words, and 57,777 out of 450,000 is too few. It is upsetting. So for most companies, I think BERT is still difficult to train, or even to fine-tune.

@benywon
Owner

benywon commented Jan 18, 2019

Oh, that's too bad. If you need your own vocab, this application may not be suitable for you. Nevertheless, you can use my code to train your own BERT.

@tomtang110
Author

Haha, but my company doesn't have such ample hardware. My boss told me they would introduce a cloud server next year, but by then I will have finished my internship. Actually, I have built a machine reading comprehension system using QANet, a Transformer-based model, so I would like to try training BERT on the DuReader dataset. However, the cost of training it seems too high.
