Dependency parser #2486
Conversation
@73minerva really cool, thanks for adding this and sorry for not getting around to reviewing it sooner. I'm trying to train a model, but it's throwing errors if the mini-batch size is higher than 1. See the script below - am I instantiating it wrong?

```python
corpus = UD_ENGLISH()
dictionary = corpus.make_label_dictionary("dependency")
model = DependencyParser(token_embeddings=WordEmbeddings("turian"), relations_dictionary=dictionary)
trainer = ModelTrainer(model, corpus)
trainer.train("resources/taggers/dependency", mini_batch_size=2)
```
@alanakbik, the recent commit should fix the mentioned mini-batch error. I also added word dropout.
Hi @73minerva, thanks for that PR! I wanted to test it with Transformer-based models, but this failed due to an error that was, I think, fixed a while ago on Flair master. Could you please rebase your PR on the latest master branch 🤔 Many thanks!
A few minor comments and some questions on parameter choices.
If you prefer, I can merge this PR now and do the rebase to master @stefan-it mentioned and we can do further improvements in more PRs.
```python
        'gpu' to store embeddings in GPU memory.
        """

        if not sentences:
```
what does this line do?
```python
        self.lstm_input_dim: int = self.token_embeddings.embedding_length

        if self.relations_dictionary:
            self.embedding2nn = torch.nn.Linear(self.lstm_input_dim,
```
This layer does not seem to be used anywhere
```python
            self.embedding2nn = torch.nn.Linear(self.lstm_input_dim,
                                                self.lstm_input_dim)

        self.lstm = BiLSTM(input_size=self.lstm_input_dim,
```
Curious: why not use the default LSTM implementation in PyTorch? Is the variational aspect important for performance?
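For context on the question above: the "variational" part usually refers to variational (locked) dropout, where one dropout mask is sampled per sequence and reused at every time step, instead of resampling a fresh mask per step. A minimal pure-Python sketch of the idea (illustrative only, not Flair's actual BiLSTM implementation):

```python
import random

def variational_dropout(sequence, p=0.33, seed=None):
    """Apply the *same* dropout mask to every time step of a sequence.

    `sequence` is a list of feature vectors (lists of floats).
    Ordinary dropout would draw a fresh mask per step; variational
    dropout samples the mask once per sequence and reuses it.
    """
    rng = random.Random(seed)
    dim = len(sequence[0])
    # One mask for the whole sequence, scaled for inverted dropout.
    mask = [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(dim)]
    return [[x * m for x, m in zip(step, mask)] for step in sequence]

seq = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
dropped = variational_dropout(seq, p=0.5, seed=0)
# Whichever dimensions are zeroed at step 0 are zeroed at every step.
```

The consistent mask across time steps is what distinguishes this from the per-step dropout you would get by wrapping a plain PyTorch LSTM in `nn.Dropout`.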
```python
        mlp_arc_units: int = 500,
        mlp_rel_units: int = 100,
        lstm_layers: int = 3,
        mlp_dropout: float = 0.33,
```
These default dropout values seem quite high. Have you tried lower values?
All default hyperparameters, including dropout, were set according to the paper. As mentioned in the paper, the increased dropout is necessary to reduce overfitting of the label classifier. More details on why the label classifier suffers from overfitting are given in section 4.2.1 of the paper. I haven't tried lower values and unfortunately there is no experiment on that.
```python
                token.get_tag(gold_label_type).value,
                str(token.head_id),
                tag,
                str(arc))
```
This is sometimes a tensor and sometimes an int, leading to uneven printouts.
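One common way to address this kind of inconsistency (a sketch, not the PR's actual code) is to normalize the value before formatting, so plain ints and scalar tensors print identically:

```python
def to_int(value):
    """Coerce a plain int or a 0-dim tensor-like object to int.

    Tensor-like objects (e.g. torch scalar tensors) expose `.item()`;
    plain Python ints do not, so we duck-type on that attribute.
    """
    if hasattr(value, "item"):
        return int(value.item())
    return int(value)

# At the call site, str(arc) would then become str(to_int(arc)),
# producing "5" instead of "tensor(5)" for tensor-valued arcs.
```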
@73minerva thanks a lot for adding this! And sorry for taking so long to review!
@73minerva another thing: the LAS should only count a positive if both the attachment and the deprel are predicted correctly, so the LAS should always be lower than (or equal to) the UAS. I have it fixed in my local branch and will open a PR soon!
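For reference, a minimal sketch of the two metrics under their usual definitions (a hypothetical helper, not the fixed code from the local branch): UAS counts tokens whose predicted head is correct, while LAS additionally requires the dependency relation to match, so LAS ≤ UAS by construction.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) over per-token (head, relation) pairs.

    `gold` and `pred` are equal-length lists of (head_index, relation)
    tuples, one entry per token.
    """
    assert len(gold) == len(pred)
    # UAS: only the head index must match.
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
    # LAS: head index AND relation label must both match.
    las_hits = sum(1 for (gh, gr), (ph, pr) in zip(gold, pred)
                   if gh == ph and gr == pr)
    n = len(gold)
    return uas_hits / n, las_hits / n

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]  # wrong relation on token 3
uas, las = attachment_scores(gold, pred)
# uas == 1.0, las == 2/3: LAS can never exceed UAS.
```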
This PR adds a dependency parser model based on Deep Biaffine Attention for Neural Dependency Parsing. The main model in the paper uses static word embeddings plus a POS tag vector as input, while this implementation only uses StackedEmbeddings.
The first trained model can be downloaded from here. It was trained with GloVe and Flair embeddings on the flair.datasets.UD_ENGLISH dataset. If I understand correctly, the UD_ENGLISH dataset in the flair library only contains the "EWT" version, not the other ones mentioned in this. In the end, the following results were obtained:
UAS : 0.9032 - LAS : 0.9243
In the near future, I will train the model on the PTB dataset to allow a comparison with the paper's results.
For better printing of parsed sentences, I added a "print_tree" param to the predict method:
This should print something like this:
It is not done yet and needs some refactoring. Until then, it would be great if you could share your comments and suggestions.
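As an illustration of what a print_tree-style output could contain (a hypothetical sketch under assumed data shapes, not the PR's actual formatting, whose sample output is not reproduced above), each token can be listed with its index, form, predicted head index, and relation label:

```python
def format_tree(tokens):
    """Return one line per token: index, form, head index, relation.

    `tokens` is an assumed list of (form, head_index, relation) tuples,
    with head index 0 denoting the root. Names are illustrative only.
    """
    lines = []
    for i, (form, head, rel) in enumerate(tokens, start=1):
        lines.append(f"{i}\t{form}\t{head}\t{rel}")
    return "\n".join(lines)

sent = [("I", 2, "nsubj"), ("love", 0, "root"), ("Berlin", 2, "obj")]
print(format_tree(sent))
# 1    I       2    nsubj
# 2    love    0    root
# 3    Berlin  2    obj
```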