Add support for XLNet and the new whole-word-masking variant of BERT. #730
Comments
This shouldn't require any code changes, just updating to the next release of pytorch_pretrained_bert once it comes out. We can test it out by pulling pytorch_pretrained_bert at head now.
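For testing against head before the release lands, installing straight from the GitHub repo should do it, e.g. `pip install git+https://github.com/huggingface/pytorch-pretrained-BERT.git`.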
At least the new BERT model is now out in pytorch_pretrained_bert. Now, does anyone see where/how we're installing pytorch_pretrained_bert? This doesn't need to go along with 1.0, but as soon as the code is stable, we should add it. Frankly, for the vast majority of new experiments, it doesn't make sense to use plain BERT.
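For reference, loading the new variant should look roughly like the sketch below. The shortcut name is the one from the HuggingFace release notes, and it only resolves in a version of pytorch_pretrained_bert recent enough to include the new weights; treat this as illustrative rather than jiant's actual integration code.

```python
# Minimal sketch of loading whole-word-masking BERT via
# pytorch_pretrained_bert; the model name is from the HuggingFace
# release notes and needs a recent-enough version of the library.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

MODEL_NAME = "bert-large-uncased-whole-word-masking"

tokenizer = BertTokenizer.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)
model.eval()

tokens = ["[CLS]"] + tokenizer.tokenize("jiant handles sentence encoding.") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # encoded_layers: one hidden-state tensor per transformer layer;
    # pooled: the [CLS] representation after BERT's pooler.
    encoded_layers, pooled = model(input_ids)
```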
It looks like this'll break old task pickles, but it doesn't require any change to our code.
I'm not super familiar with XLNet on a low level; does it use the same BERT modeling tricks (special tokens, concatenating inputs, etc.)? Most of our BERT-specific code is very BERT-specific, so we'd likely have to rename various variables at a minimum to incorporate XLNet. I'm worried about more drastic changes; I don't think our code is well-abstracted enough to support swapping in an arbitrary pretrained LM. BERT with whole-word masking seems like it shouldn't break too much, if anything at all.
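For context, the BERT-style input packing in question looks roughly like this; the helper below is illustrative, not jiant's actual code. XLNet places its classification token at the end of the sequence rather than the front, so code written around this layout needs more than a rename.

```python
# Illustrative sketch of BERT-style input packing (not jiant's actual
# helper). BERT prepends [CLS] and separates segments with [SEP];
# XLNet instead appends its special tokens, e.g. A <sep> B <sep> <cls>.
def pack_bert_pair(tokens_a, tokens_b):
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment ids: 0 for the first segment ([CLS] and its [SEP] included),
    # 1 for the second segment plus its trailing [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```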
I also misread the HuggingFace readme (XLNet isn't ready yet), so we'll have to wait and see. I found the requirement, though: we were sneakily installing the HuggingFace repo via AllenNLP. I'll at least try to get whole-word masking set up.
The HuggingFace update is out; I'll start looking into adding support...