Dependency parser #2486
Conversation
@73minerva really cool, thanks for adding this and sorry for not getting around to reviewing it sooner. I'm trying to train a model, but it's throwing errors if the mini-batch size is higher than 1. See the script below - am I instantiating it wrong?

```python
corpus = UD_ENGLISH()
dictionary = corpus.make_label_dictionary("dependency")
model = DependencyParser(token_embeddings=WordEmbeddings("turian"), relations_dictionary=dictionary)
trainer = ModelTrainer(model, corpus)
trainer.train("resources/taggers/dependency", mini_batch_size=2)
```
@alanakbik, the recent commit should fix the mentioned mini-batch error. I also added word dropout.
Hi @73minerva, thanks for that PR! I wanted to test it with Transformer-based models, but this failed due to an error that was, I think, fixed a while ago on Flair master. Could you please rebase your PR on the latest master branch 🤔 Many thanks!
A few minor comments and some questions on parameter choices.
If you prefer, I can merge this PR now and do the rebase to master @stefan-it mentioned and we can do further improvements in more PRs.
```python
        'gpu' to store embeddings in GPU memory.
        """

        if not sentences:
```
what does this line do?
```python
        self.lstm_input_dim: int = self.token_embeddings.embedding_length

        if self.relations_dictionary:
            self.embedding2nn = torch.nn.Linear(self.lstm_input_dim,
```
This layer does not seem to be used anywhere
```python
            self.embedding2nn = torch.nn.Linear(self.lstm_input_dim,
                                                self.lstm_input_dim)

        self.lstm = BiLSTM(input_size=self.lstm_input_dim,
```
Curious: why not use the default LSTM implementation in PyTorch? Is the variational aspect important for performance?
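For context on the question above: the "variational" part usually refers to variational (locked) dropout, where one dropout mask is sampled per sequence and reused at every time step, instead of resampling a fresh mask per step. A minimal pure-Python sketch of the idea (illustrative only, not Flair's actual BiLSTM implementation):

```python
import random

def variational_dropout(sequence, p=0.33, seed=None):
    """Apply the *same* dropout mask to every time step of a sequence.

    `sequence` is a list of feature vectors (lists of floats).
    Ordinary dropout would draw a fresh mask per step; variational
    dropout samples the mask once per sequence and reuses it.
    """
    rng = random.Random(seed)
    dim = len(sequence[0])
    # One mask for the whole sequence, scaled for inverted dropout.
    mask = [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(dim)]
    return [[x * m for x, m in zip(step, mask)] for step in sequence]

seq = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
dropped = variational_dropout(seq, p=0.5, seed=0)
# Whichever dimensions are zeroed at step 0 are zeroed at every step.
```

The consistent mask across time steps is what distinguishes this from the per-step dropout you would get by wrapping a plain PyTorch LSTM in `nn.Dropout`.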
```python
        mlp_arc_units: int = 500,
        mlp_rel_units: int = 100,
        lstm_layers: int = 3,
        mlp_dropout: float = 0.33,
```
These default dropout values seem quite high. Have you tried lower values?
All default hyperparameters, including dropout, were set according to the paper. As mentioned in the paper, the increased dropout is necessary to reduce overfitting of the label classifier. More details on why the label classifier suffers from overfitting are given in section 4.2.1 of the paper. I haven't tried lower values and unfortunately there is no experiment on that.
```python
                token.get_tag(gold_label_type).value,
                str(token.head_id),
                tag,
                str(arc))
```
This is sometimes a tensor and sometimes an int, leading to uneven printouts.
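One common way to address this kind of inconsistency (a sketch, not the PR's actual code) is to normalize the value before formatting, so plain ints and scalar tensors print identically:

```python
def to_int(value):
    """Coerce a plain int or a 0-dim tensor-like object to int.

    Tensor-like objects (e.g. torch scalar tensors) expose `.item()`;
    plain Python ints do not, so we duck-type on that attribute.
    """
    if hasattr(value, "item"):
        return int(value.item())
    return int(value)

# At the call site, str(arc) would then become str(to_int(arc)),
# producing "5" instead of "tensor(5)" for tensor-valued arcs.
```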
@73minerva thanks a lot for adding this! And sorry for taking so long to review!
@73minerva another thing: the LAS should only count a positive if both the attachment and the deprel are predicted correctly, so the LAS should always be lower than (or equal to) the UAS. I have it fixed in my local branch and will open a PR soon!
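For reference, a minimal sketch of the two metrics under their usual definitions (a hypothetical helper, not the fixed code from the local branch): UAS counts tokens whose predicted head is correct, while LAS additionally requires the dependency relation to match, so LAS ≤ UAS by construction.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) over per-token (head, relation) pairs.

    `gold` and `pred` are equal-length lists of (head_index, relation)
    tuples, one entry per token.
    """
    assert len(gold) == len(pred)
    # UAS: only the head index must match.
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
    # LAS: head index AND relation label must both match.
    las_hits = sum(1 for (gh, gr), (ph, pr) in zip(gold, pred)
                   if gh == ph and gr == pr)
    n = len(gold)
    return uas_hits / n, las_hits / n

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]  # wrong relation on token 3
uas, las = attachment_scores(gold, pred)
# uas == 1.0, las == 2/3: LAS can never exceed UAS.
```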
This PR adds a dependency parser model based on Deep Biaffine Attention for Neural Dependency Parsing. The main model in the paper uses static word embeddings plus a POS tag vector as input, while this implementation only uses StackedEmbeddings.
The first trained model can be downloaded from here. It was trained with GloVe and Flair embeddings on the flair.datasets.UD_ENGLISH dataset. If I understand correctly, the UD_ENGLISH dataset in the flair library only contains the "EWT" version, not the other ones mentioned in this. In the end, the following results were obtained:
UAS : 0.9032 - LAS : 0.9243
In the near future, I will train the model on the PTB dataset to allow a comparison with the paper's results.
For better printing of parsed sentences, I added a "print_tree" param to the predict method:
This should print something like this:
It is not done yet and needs some refactoring. Until then, it would be great if you could share your comments and suggestions.
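As an illustration of what a print_tree-style output could contain (a hypothetical sketch under assumed data shapes, not the PR's actual formatting, whose sample output is not reproduced above), each token can be listed with its index, form, predicted head index, and relation label:

```python
def format_tree(tokens):
    """Return one line per token: index, form, head index, relation.

    `tokens` is an assumed list of (form, head_index, relation) tuples,
    with head index 0 denoting the root. Names are illustrative only.
    """
    lines = []
    for i, (form, head, rel) in enumerate(tokens, start=1):
        lines.append(f"{i}\t{form}\t{head}\t{rel}")
    return "\n".join(lines)

sent = [("I", 2, "nsubj"), ("love", 0, "root"), ("Berlin", 2, "obj")]
print(format_tree(sent))
# 1    I       2    nsubj
# 2    love    0    root
# 3    Berlin  2    obj
```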