How to use models from huggingface? #4

Open
JamesArthurHolland opened this issue Oct 12, 2021 · 3 comments

Comments

@JamesArthurHolland

How do I use models that aren't in the specified list?

I would like to use this model:

https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased

How do I go about doing this?

Regards,

Jamie

@BPYap
Owner

BPYap commented Oct 19, 2021

Hi Jamie,

One way to do it is to download the weights, vocab, and config files to a local folder, then set the --model_name_or_path flag to the path of that folder.
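A minimal sketch of the download step, for illustration only. It assumes the model repository hosts all three files under the names pytorch_model.bin, vocab.txt and config.json (the names given later in this thread) and that they are reachable through the standard Hugging Face "resolve" URL pattern. The helper names and the folder name are hypothetical, not part of BERT-WSD.

```python
from pathlib import Path
from urllib.request import urlretrieve

# File names as given later in this thread; assumed to exist in the repo.
REQUIRED_FILES = ("pytorch_model.bin", "vocab.txt", "config.json")


def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for one file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_model_files(repo_id: str, folder: str) -> Path:
    """Fetch each required file into `folder` (hypothetical helper, needs network)."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    for name in REQUIRED_FILES:
        urlretrieve(hf_file_url(repo_id, name), str(target / name))
    return target


# Example call (not executed here because it downloads several hundred MB):
# download_model_files("dccuchile/bert-base-spanish-wwm-uncased", "./beto-uncased")
```

The resulting folder (here the made-up ./beto-uncased) is what you would pass as --model_name_or_path. If any of the files is missing from the repository, the download for that file will fail and it has to be fetched from elsewhere.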

@JamesArthurHolland
Author

JamesArthurHolland commented Oct 19, 2021

I'm very unfamiliar with these formats. I downloaded the TensorFlow package for the Spanish uncased model, and it only has the following files:

model.ckpt-2000000.index
model.ckpt-2000000.data-00000-of-00001
model.ckpt-2000000.meta

The PyTorch version only has:

pytorch_model.bin

But the BERT-WSD library appears to look for a config file, which you also mentioned. Is this specific to the TensorFlow version?

@BPYap
Owner

BPYap commented Oct 20, 2021

You only need pytorch_model.bin, together with vocab.txt and config.json, in the same directory. The links for the vocab and config files appear to be broken in the Hugging Face model repository. On closer inspection, I found working links in the Colaboratory notebook provided by the authors: https://colab.research.google.com/drive/1uRwg4UmPgYIqGYY4gW_Nsw9782GFJbPt

You can obtain the two files from the following links:
https://users.dcc.uchile.cl/~jperez/beto/cased_2M/vocab.txt
https://users.dcc.uchile.cl/~jperez/beto/cased_2M/config.json
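Once the files are in place, a quick check that the folder is complete can save a confusing error later. This is only a sketch: the function name is made up for illustration, and it checks nothing beyond the presence of the three file names mentioned above.

```python
from pathlib import Path
from typing import List

# The three files named above; all must sit in the same directory.
REQUIRED_FILES = ("pytorch_model.bin", "vocab.txt", "config.json")


def missing_model_files(folder: str) -> List[str]:
    """Return the required file names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the folder has everything listed, and its path can then be given to --model_name_or_path. It is also worth confirming that the vocab and config you downloaded belong to the same variant (cased or uncased) as the weights.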

Hope it helps. Cheers.


2 participants