Add LayoutLMv2ForRelationExtraction #19120
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@sgugger I'm getting the following error from
However, this model is added to layoutlmv2.mdx, so I'm not sure why this error occurs.
Thanks for the PR, but I'm really not convinced by the model as it is added. Some preprocessing code is part of the model, which is not how we usually do things in Transformers. Some inputs are not tensors, which causes multiple problems as a result (the overridden tests are just the tip of the iceberg).
The whole preprocessing should be added as a method on the preprocessing class of LayoutLMv2, and the model should accept tensors, passed as keyword arguments and not inside a dictionary.
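To illustrate the reviewer's point, here is a minimal sketch of a head whose forward takes plain tensors as keyword arguments rather than a dict of lists. The class and argument names (`RelationExtractionHead`, `entity_start_positions`) are illustrative assumptions, not the PR's actual API.

```python
import torch
from torch import nn


class RelationExtractionHead(nn.Module):
    """Sketch: every input is a fixed-shape tensor keyword argument.

    Hypothetical example, not the PR's implementation.
    """

    def __init__(self, hidden_size: int, num_labels: int = 2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor, entity_start_positions: torch.Tensor):
        # Gather the entity representations with pure tensor indexing,
        # so the module stays traceable for ONNX export.
        index = entity_start_positions.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        entity_repr = hidden_states.gather(1, index)
        return self.classifier(entity_repr)


head = RelationExtractionHead(hidden_size=8)
logits = head(
    hidden_states=torch.randn(2, 5, 8),
    entity_start_positions=torch.tensor([[0, 3], [1, 2]]),
)
# logits has shape (batch_size, num_entities, num_labels) = (2, 2, 2)
```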
Class for outputs of [`LayoutLMv2ForRelationExtraction`].

Args:
    loss (`torch.FloatTensor` of shape `(1,)`):
The result of PyTorch loss functions are 0d tensors, so this is not accurate.
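A quick check confirms the reviewer's point: with the default reduction, PyTorch loss functions return a 0d (scalar) tensor, not shape `(1,)`.

```python
import torch
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss()
logits = torch.randn(4, 2)           # (batch_size, num_labels)
labels = torch.tensor([0, 1, 1, 0])  # (batch_size,)

loss = loss_fct(logits, labels)
# The default reduction="mean" collapses the loss to a 0d tensor,
# so the docstring's `(1,)` shape is inaccurate.
print(loss.shape)  # torch.Size([])
print(loss.dim())  # 0
```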
Args:
    loss (`torch.FloatTensor` of shape `(1,)`):
        Classification (or regression if config.num_labels==1) loss.
Reading the code, it's always a classification loss, so this needs to be adapted.
- x_1: `(N, *, in_features)` where `N` is the batch dimension and `*` means any number of additional dimensions.
- x_2: `(N, *, in_features)`, where `N` is the batch dimension and `*` means any number of additional dimensions.
- Output: `(N, *, out_features)`, where `N` is the batch dimension and `*` means any number of additional dimensions.
No one-letter variables please, especially not in the documentation, which we don't mind being long. Here just say `(batch_size, *, features)` where `*` means any number of additional dimensions.
""" | ||
|
||
def __init__(self, in_features, out_features): | ||
super(BiaffineAttention, self).__init__() |
Suggested change:
    - super(BiaffineAttention, self).__init__()
    + super().__init__()
Python 2 was dead two years ago...
self.ffnn_head = copy.deepcopy(projection)
self.ffnn_tail = copy.deepcopy(projection)
self.rel_classifier = BiaffineAttention(config.hidden_size // 2, 2)
self.loss_fct = CrossEntropyLoss()
No need to hard-code this here. Just use it when necessary in the forward.
head_repr = torch.cat(
    (hidden_states[b][head_index], head_label_repr),
    dim=-1,
)
Suggested change:
    - head_repr = torch.cat(
    -     (hidden_states[b][head_index], head_label_repr),
    -     dim=-1,
    - )
    + head_repr = torch.cat((hidden_states[b][head_index], head_label_repr), dim=-1)
Fits in one line.
tail_repr = torch.cat(
    (hidden_states[b][tail_index], tail_label_repr),
    dim=-1,
)
Suggested change:
    - tail_repr = torch.cat(
    -     (hidden_states[b][tail_index], tail_label_repr),
    -     dim=-1,
    - )
    + tail_repr = torch.cat((hidden_states[b][tail_index], tail_label_repr), dim=-1)
self.rel_classifier = BiaffineAttention(config.hidden_size // 2, 2)
self.loss_fct = CrossEntropyLoss()

def build_relation(self, relations, entities):
This looks like a preprocessing method. It shouldn't be part of the model.
>>> encoding["entities"] = [{"start": [0, 4], "end": [3, 6], "label": [2, 1]}] | ||
>>> encoding["relations"] = [{"start_index": [], "end_index": [], "head": [], "tail": []}] |
The model can't take lists as inputs, as they then won't work with ONNX export, distributed training, etc. These should all be tensors.
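One way to address this comment would be to convert the list-based `entities` input into fixed-shape tensors before it reaches the model. The helper below is a hypothetical sketch (the name `entities_to_tensors`, the `max_entities` parameter, and the `-100` pad value are assumptions, not part of the PR).

```python
import torch


def entities_to_tensors(entities, max_entities, pad_value=-100):
    """Hypothetical preprocessing helper: pad each example's entity
    start/end/label lists to `max_entities` and stack them into tensors,
    so the model only ever sees fixed-shape tensor inputs."""
    starts, ends, labels = [], [], []
    for ent in entities:
        pad = max_entities - len(ent["start"])
        starts.append(ent["start"] + [pad_value] * pad)
        ends.append(ent["end"] + [pad_value] * pad)
        labels.append(ent["label"] + [pad_value] * pad)
    return torch.tensor(starts), torch.tensor(ends), torch.tensor(labels)


# The doctest's list-based input, converted to (batch_size, max_entities) tensors:
starts, ends, labels = entities_to_tensors(
    [{"start": [0, 4], "end": [3, 6], "label": [2, 1]}], max_entities=4
)
# starts -> tensor([[0, 4, -100, -100]])
```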
# (Even with this call, there is still a memory leak of ~0.04MB)
self.clear_torch_jit_class_registry()

# overwrite as LayoutLMv2ForRelationExtraction outputs dictionaries containing integers rather than tensors
Which really shouldn't be the case...
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey @NielsRogge, any chance that this will ever be implemented?
Hi @lamaeldo, The reason the PR wasn't merged is because models need to output fixed-size tensors, to make sure things like distributed training and ONNX export work. However, LayoutLMv2ForRelationExtraction outputs lists of tensors in its current implementation, due to each example in the batch having a different number of relations. So we would need to pad them up to a fixed size such that the model outputs fixed-size tensors. Haven't looked into that yet, but if you're willing to contribute, let me know! Btw I do have a notebook on fine-tuning this model here.
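The padding approach described in this comment can be sketched as follows. Since each example has a different number of relations, per-example index lists are padded up to a common `max_relations` so the batch stacks into one fixed-size tensor. The function name, the `max_relations` parameter, and the `-1` pad value are illustrative assumptions, not an agreed design.

```python
import torch


def pad_relations(relations, max_relations, pad_value=-1):
    """Hypothetical sketch: pad each example's relation head indices to
    a fixed length so the batch becomes one (batch_size, max_relations)
    tensor, as required for distributed training and ONNX export."""
    batch = []
    for rel in relations:
        heads = rel["head"][:max_relations]          # truncate if too long
        heads = heads + [pad_value] * (max_relations - len(heads))  # pad if short
        batch.append(heads)
    return torch.tensor(batch)


# Two examples with 2 and 1 relations respectively, padded to 4 slots each:
padded = pad_relations([{"head": [0, 2]}, {"head": [1]}], max_relations=4)
# padded.tolist() -> [[0, 2, -1, -1], [1, -1, -1, -1]]
```

A real implementation would pad the `tail`, `start_index`, and `end_index` lists the same way and mask out the pad positions in the loss.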
What does this PR do?
This PR adds the relation extraction head of LayoutLMv2, which was a highly requested feature, as seen in #14330, #15451 and #18091.