
Dependency parser #2486

Merged: 15 commits merged into flairNLP:master on Nov 30, 2021
Conversation

@73minerva (Contributor) commented Oct 22, 2021

This PR adds a dependency parser model based on Deep biaffine attention for neural dependency parsing. The main model in the paper uses static word embeddings and POS-tag vectors as input, while this implementation uses only StackedEmbeddings.
The first trained model can be downloaded from here. It was trained with GloVe and Flair embeddings on the flair.datasets.UD_ENGLISH dataset. If I understand correctly, the UD_ENGLISH dataset in the flair library contains only the "EWT" treebank, not the other ones mentioned in this. In the end, the following results were obtained:
UAS : 0.9032 - LAS : 0.9243
In the near future, I will train the model on the PTB dataset to allow a comparison with the paper's results.
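For readers unfamiliar with the paper: the parser scores every (head, dependent) pair with a biaffine transform and picks the best-scoring head per token. A minimal NumPy sketch of the arc scorer (illustrative only; the names and shapes are assumptions, not this PR's code):

```python
import numpy as np

def biaffine_arc_scores(dep, head, U, w):
    """s[i, j] = dep[i] @ U @ head[j] + head[j] @ w:
    a bilinear term for the (dependent i, head j) pair, plus a
    bias modeling how likely word j is to be a head at all."""
    bilinear = dep @ U @ head.T         # (n, n) pairwise arc scores
    head_prior = head @ w               # (n,) per-word head bias
    return bilinear + head_prior[None, :]

rng = np.random.default_rng(0)
n, d = 5, 8
dep, head = rng.normal(size=(n, d)), rng.normal(size=(n, d))
scores = biaffine_arc_scores(dep, head, rng.normal(size=(d, d)), rng.normal(size=d))
predicted_heads = scores.argmax(axis=1)  # greedy head choice per token
```

In the paper, `dep` and `head` come from separate MLPs over the BiLSTM states, and the greedy argmax is post-processed to guarantee a well-formed tree.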

For better printing of parsed sentences, I added a "print_tree" parameter to the predict method:

sentence = Sentence('I prefer the morning flight through Denver.')
dep_parser_model: DependencyParser = DependencyParser.load(path_to_model)
dep_parser_model.predict(sentence, print_tree=True)

This should print something like this:
[screenshot: parsed dependency tree output]

The PR is not done yet and needs some refactoring. Until then, it would be great if you could share your comments and suggestions.

@73minerva 73minerva closed this Oct 22, 2021
@73minerva 73minerva changed the title Dep parser Dependency parser Oct 22, 2021
@73minerva 73minerva reopened this Oct 23, 2021
@alanakbik (Collaborator)

@73minerva really cool, thanks for adding this and sorry for not getting around to reviewing this sooner.

I'm trying to train a model, but it's throwing errors if the mini-batch size is higher than 1. See the script below - am I instantiating it wrong?

corpus = UD_ENGLISH()

dictionary = corpus.make_label_dictionary("dependency")

model = DependencyParser(token_embeddings=WordEmbeddings("turian"), relations_dictionary=dictionary)

trainer = ModelTrainer(model, corpus)

trainer.train("resources/taggers/dependency", mini_batch_size=2)

@73minerva (PR author)

@alanakbik, the recent commit should fix the mini-batch error you mentioned. I also added word dropout.
Here is a trained dependency parser model with BERT that achieved UAS: 0.9210, LAS: 0.9354.
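For context, word dropout (unlike standard unit-level dropout) zeroes out entire token embeddings at random during training, so the model cannot over-rely on any single word. A minimal NumPy sketch (illustrative; not the code added in this commit):

```python
import numpy as np

def word_dropout(embeddings, p=0.05, rng=None):
    """embeddings: (n_tokens, dim). Drop whole rows (token vectors)
    with probability p: one Bernoulli draw per token, not per unit."""
    rng = rng or np.random.default_rng()
    keep = rng.random(embeddings.shape[0]) >= p
    return embeddings * keep[:, None]
```

At inference time the input is left unchanged (call with p=0).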

@73minerva 73minerva marked this pull request as ready for review November 13, 2021 22:37
@stefan-it (Member)

Hi @73minerva ,

thanks for this PR! I wanted to test it with Transformer-based models, but it failed due to an error that I think was fixed a while ago on Flair master. Could you please rebase your PR onto the latest master branch 🤔 Many thanks!

@alanakbik (Collaborator) left a comment:

A few minor comments and some questions on parameter choices.

If you prefer, I can merge this PR now, do the rebase to master that @stefan-it mentioned, and we can make further improvements in follow-up PRs.

'gpu' to store embeddings in GPU memory.
"""

if not sentences:

what does this line do?

self.lstm_input_dim: int = self.token_embeddings.embedding_length

if self.relations_dictionary:
self.embedding2nn = torch.nn.Linear(self.lstm_input_dim,

This layer does not seem to be used anywhere

self.embedding2nn = torch.nn.Linear(self.lstm_input_dim,
self.lstm_input_dim)

self.lstm = BiLSTM(input_size=self.lstm_input_dim,

Curious: why not use the default LSTM implementation from PyTorch? Is the variational aspect important for performance?
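For context on the question: "variational" LSTMs typically use locked dropout, where one dropout mask is sampled per sequence and reused at every timestep, whereas the dropout between stacked `torch.nn.LSTM` layers is resampled at each step. A NumPy sketch of the mask reuse (illustrative only, not the PR's BiLSTM):

```python
import numpy as np

def locked_dropout(x, p=0.33, rng=None):
    """x: (timesteps, dim). Sample ONE mask over the feature dimension
    and apply it at every timestep, with inverted-dropout scaling."""
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape[1]) >= p) / (1.0 - p)
    return x * mask[None, :]  # same units dropped across all timesteps
```

Because the same units stay dropped for the whole sequence, the recurrent dynamics see a consistent sub-network, which is the argument for using it in RNNs.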

mlp_arc_units: int = 500,
mlp_rel_units: int = 100,
lstm_layers: int = 3,
mlp_dropout: float = 0.33,

These default dropout values seem quite high. Have you tried lower values?

@73minerva (PR author) replied:

All default hyperparameters, including dropout, were set according to the paper. As mentioned in the paper, increasing dropout is necessary to reduce overfitting in the label classifier; more details on why the label classifier overfits are given in Section 4.2.1 of the paper. I haven't tried lower values, and unfortunately the paper reports no experiments on this.

token.get_tag(gold_label_type).value,
str(token.head_id),
tag,
str(arc))

this is sometimes a tensor and sometimes an int, leading to uneven printouts

@73minerva (PR author) replied:

Thanks, it's fixed in the recent commit. I also added a CSV header to prevent confusion; it's better now.

[screenshot: parser output with CSV header]

@alanakbik (Collaborator)

@73minerva thanks a lot for adding this! And sorry for taking so long to review!

@alanakbik alanakbik merged commit 7192d64 into flairNLP:master Nov 30, 2021
@alanakbik (Collaborator)

@73minerva another thing: the LAS should only count a token as correct if both the attachment (head) and the dependency relation are predicted correctly, so the LAS should always be lower than or equal to the UAS. I have fixed it in my local branch and will open a PR soon!
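For reference, the corrected metric could be sketched like this (a hypothetical helper, not the actual fix in the local branch):

```python
def attachment_scores(gold_heads, gold_rels, pred_heads, pred_rels):
    """UAS: fraction of tokens with the correct head.
    LAS: fraction with BOTH the correct head and the correct
    dependency relation, so LAS <= UAS by construction."""
    total = len(gold_heads)
    uas = sum(g == p for g, p in zip(gold_heads, pred_heads)) / total
    las = sum(gh == ph and gr == pr
              for gh, gr, ph, pr in zip(gold_heads, gold_rels,
                                        pred_heads, pred_rels)) / total
    return uas, las
```

For example, a token with the right head but the wrong relation counts toward UAS but not LAS, which is exactly the case where the two scores diverge.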
