How to get the word embedding after pre-training? #60
Comments
If you want to get the contextual embeddings (like ELMo), see the section here. If you want the actual word embeddings, the word->id mapping is just the index of the row in vocab.txt.
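For anyone looking for a concrete starting point, here is a minimal sketch (my own example, not from this thread) of reading the static word-embedding table out of a released checkpoint. The checkpoint path and the variable name bert/embeddings/word_embeddings are assumptions about how the released checkpoints are laid out.

import tensorflow as tf

CKPT = 'chinese_L-12_H-768_A-12/bert_model.ckpt'   # assumed path to an unpacked released model
VOCAB = 'chinese_L-12_H-768_A-12/vocab.txt'

# Load the embedding table; it has shape (vocab_size, hidden_size).
word_embeddings = tf.train.load_variable(CKPT, 'bert/embeddings/word_embeddings')

# The word->id mapping is simply the row index of the token in vocab.txt.
with open(VOCAB, encoding='utf-8') as f:
    vocab = {line.strip(): idx for idx, line in enumerate(f)}

vector = word_embeddings[vocab['[CLS]']]   # one hidden_size-dimensional row per vocabulary entry
print(vector.shape)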
And I downloaded your released model chinese_L-12_H-768_A-12. In vocab.txt, I found some tokens such as [CLS], [SEP], [MASK], and the [unused] tokens. What are they used for?
The [CLS], [SEP], and [MASK] tokens are used as described in the paper and README. The [unused] tokens were not used in our model and are randomly initialized.
What is the training data of chinese_L-12_H-768_A-12? And what is its size?
It's Chinese Wikipedia, with both Traditional and Simplified characters.
Hello @mfxss, you can try https://github.com/imgarylai/bert-embedding. Because I work closely with the MXNet & GluonNLP team, my implementation uses MXNet and GluonNLP. However, I am trying to implement it in other frameworks as well. I hope my work can help you.
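For context, using that package looks roughly like the snippet below; this is a hedged sketch based on its README as I recall it, and the exact API may differ between versions.

from bert_embedding import BertEmbedding

bert_embedding = BertEmbedding()               # downloads a pre-trained model on first use
sentences = ['BERT produces contextual embeddings.']
results = bert_embedding(sentences)            # per sentence: (tokens, list of vectors)
tokens, vectors = results[0]
print(tokens[0], len(vectors[0]))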
Hey guys, if you don't want to install an extra module, here is an example:

import tensorflow as tf

# Path to a downloaded BERT SavedModel (adjust to wherever you unpacked it).
BERT_PATH = 'HOME_DIR/bert_en_uncased_L-12_H-768_A-12'

imported = tf.saved_model.load(BERT_PATH)

# Scan the trainable variables for the word-embedding table.
for i in imported.trainable_variables:
    if i.name == 'bert_model/word_embeddings/embeddings:0':
        embeddings = i
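As a follow-up to the snippet above (my own addition, not part of the original comment): once you have the embeddings variable, you can index into it by a token's row number in vocab.txt. The vocabulary location inside the SavedModel is an assumption.

VOCAB_FILE = BERT_PATH + '/assets/vocab.txt'    # assumed location of the vocabulary file

with open(VOCAB_FILE, encoding='utf-8') as f:
    vocab = {line.strip(): idx for idx, line in enumerate(f)}

table = embeddings.numpy()                      # shape (vocab_size, hidden_size)
hello_vector = table[vocab['hello']]            # static (non-contextual) vector for "hello"
print(hello_vector.shape)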
Hi @jacobdevlin-google, thanks for the pointers. I see the output with the …
Excuse me, did you find a solution for word embeddings (not subword embeddings), please?
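Not an official answer, but one common workaround is to run the WordPiece tokenizer, track which sub-tokens belong to each whitespace-separated word, and pool (for example, average) their vectors back into one vector per word. A hedged sketch, assuming tokenizer is this repo's tokenization.FullTokenizer and subword_vectors is an array of per-sub-token vectors with [CLS]/[SEP] already stripped:

import numpy as np

def word_level_vectors(words, tokenizer, subword_vectors):
    # Average the vectors of each word's WordPiece pieces.
    out, cursor = [], 0
    for word in words:
        pieces = tokenizer.tokenize(word)            # e.g. "playing" -> ["play", "##ing"]
        span = subword_vectors[cursor:cursor + len(pieces)]
        out.append(span.mean(axis=0))
        cursor += len(pieces)
    return np.stack(out)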
Hi,
I am excited about this great model, and I want to get the word embeddings. Where should I find the file in the output, or should I change the code to do this?
Thanks,
Yuguang