
Add support for XLNet and the new whole-word-masking variant of BERT. #730

Closed
sleepinyourhat opened this issue Jun 24, 2019 · 6 comments
Labels: 0.0.x release-on-fix (Put out a new 0.0.x release when this is fixed.) · help wanted (Extra attention is needed) · high-priority (Fix this before addressing any other major issue.) · jiant-v1-legacy (Relevant to versions <= v1.3.2)

Comments

@sleepinyourhat
Contributor

No description provided.

@sleepinyourhat
Contributor Author

This shouldn't require any code changes, just updating to the next release of pytorch_pretrained_bert once it comes out. We can test it out by pulling pytorch_pretrained_bert at head now.
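
For anyone who wants to try it now, a minimal sketch of what that test could look like, assuming the whole-word-masking checkpoint name from the HuggingFace README:

```python
# Minimal sketch: load a whole-word-masking checkpoint through pytorch_pretrained_bert.
# Install at head first, e.g.:
#   pip install git+https://github.com/huggingface/pytorch-pretrained-BERT.git
# The model identifier below is assumed from the HuggingFace README.
import torch
from pytorch_pretrained_bert import BertModel, BertTokenizer

MODEL = "bert-large-uncased-whole-word-masking"
tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertModel.from_pretrained(MODEL)
model.eval()

tokens = ["[CLS]"] + tokenizer.tokenize("Testing whole-word masking.") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
with torch.no_grad():
    encoded_layers, pooled = model(input_ids)  # all encoder layers by default
```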

@sleepinyourhat changed the title from "Add support for whole-word" to "Add support for the new whole-word-masking variant of BERT." on Jun 24, 2019
@sleepinyourhat added the help wanted (Extra attention is needed) and high-priority (Fix this before addressing any other major issue.) labels on Jun 26, 2019
@sleepinyourhat
Contributor Author

sleepinyourhat commented Jun 26, 2019

At least the new BERT model is now out in pytorch_pretrained_bert. Now, does anyone see where/how we're installing pytorch_pretrained_bert?

This doesn't need to go along with 1.0, but as soon as the code is stable, we should add it. Frankly, for the vast majority of new experiments, it doesn't make sense to use plain BERT.

@sleepinyourhat changed the title from "Add support for the new whole-word-masking variant of BERT." to "Add support for XLNet and the new whole-word-masking variant of BERT." on Jun 26, 2019
@sleepinyourhat
Contributor Author

It looks like this'll break old task pickles, but it doesn't require any changes to our code.

@W4ngatang
Collaborator

I'm not super familiar with XLNet at a low level; does it use the same BERT modeling tricks (special tokens, concatenating inputs, etc.)?

Our BERT-specific code is, for the most part, very BERT-specific, so at a minimum we'd likely have to rename various variables to incorporate XLNet. I'm more worried about drastic changes; I don't think our code is well-abstracted enough to support swapping in arbitrary pretrained LMs. BERT with whole-word masking seems like it shouldn't break much, if anything at all.
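
For reference, the input conventions do differ: BERT puts its classification token first, while XLNet puts it last. A rough sketch of the two sequence-pair formats, based on the papers rather than anything in our code:

```python
# Rough sketch of the two sequence-pair conventions (from the papers, not jiant code).
def bert_format(tokens_a, tokens_b):
    # BERT: [CLS] A [SEP] B [SEP]
    return ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]

def xlnet_format(tokens_a, tokens_b):
    # XLNet: A <sep> B <sep> <cls>  (the classification token comes last)
    return tokens_a + ["<sep>"] + tokens_b + ["<sep>", "<cls>"]
```

So the concatenation trick carries over, but anything that hard-codes [CLS]-first would need to change.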

@sleepinyourhat
Contributor Author

I also misread the HuggingFace readme—XLNet isn't ready yet—so we'll have to wait and see.

I found the requirement, though: we were sneakily installing the HuggingFace repo via AllenNLP. I'll at least try to get whole-word masking set up.

@sleepinyourhat
Contributor Author

The HuggingFace update is out—I'll start looking into adding support...
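
A minimal sketch of what the first step might look like, assuming the package and model names from the new HuggingFace release notes:

```python
# Minimal sketch: load XLNet through the new HuggingFace release.
# The package and model identifiers below are assumed from the release notes.
import torch
from pytorch_transformers import XLNetModel, XLNetTokenizer

MODEL = "xlnet-large-cased"
tokenizer = XLNetTokenizer.from_pretrained(MODEL)
model = XLNetModel.from_pretrained(MODEL)
model.eval()

input_ids = torch.tensor([tokenizer.encode("Testing XLNet support.")])
with torch.no_grad():
    last_hidden_state = model(input_ids)[0]
```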
