
Adding RelationExtraction head to layoutLMv2 and layoutXLM models #15451

Open · R0bk opened this issue Feb 1, 2022 · 31 comments

@R0bk commented Feb 1, 2022

🌟 New model head addition

Relation Extraction Head for LayoutLMv2/XLM

Addition description

Hey all,

I've seen a bunch of different requests across huggingface issues [0], unilm issues [0][1] and on @NielsRogge's Transformers Tutorials issues [0][1] about adding the relation extraction head from layoutlmv2 to the huggingface library. As the model is quite difficult to use in its current state I was going to write my own layer on top, but I saw in this issue that it may be a good idea to add it to transformers as a separate layoutlmv2/xlm head, and thought it would be a good way to contribute back to a library I use so much.

I've gone ahead and added it under my own branch and got it successfully working with the library. Here is a colab using my branch of transformers if you want to test it yourself.

Before I add tests and write more docs I just wanted to post here first to see if there's interest in merging this in. If there is, I have a few questions where some info would help me make sure I've done the integration correctly.

R0bk added the New model label on Feb 1, 2022
@NielsRogge (Contributor)

Hi,

That's great to read :) it was a bit unclear to me how to use the model at inference time (the authors only provided a script for training and evaluation, i.e. when labels are available). Can you show how to use the model when you don't have any labels available? More specifically, what are the entities and relations one needs to provide at inference time?

I assume that the model needs all possible entities, as well as all possible relations in order to classify them pairwise.

In that case, we can add it. There was already an effort to do this (see #15173).

@R0bk (Author) commented Feb 1, 2022

Hey Niels,

I've added to the bottom of this notebook here an inference example (please ignore the accuracy, I didn't spend much time finetuning).

For running inference we just require an empty relations dict, as we calculate all the possible relations from the entity labels (the current model only links entities with label 1 (the key) to entities with label 2 (the value)).

We do, however, require all the entities to be labelled with a start token index, an end token index and a label, so in the docs we would probably suggest that users run LayoutLMv2ForTokenClassification first and then run this head on its results.
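For concreteness, here's a rough sketch of that two-stage flow. Note that LayoutLMv2ForRelationExtraction and its entities/relations inputs only exist on my branch, and the NER checkpoint path is a placeholder:

from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForTokenClassification

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
ner_model = LayoutLMv2ForTokenClassification.from_pretrained("path/to/your-finetuned-ner")

image = Image.open("page.png").convert("RGB")
encoding = processor(image, return_tensors="pt")  # runs OCR by default
predictions = ner_model(**encoding).logits.argmax(-1).squeeze().tolist()

# 1. Group contiguous tokens with the same predicted label into spans,
#    giving the entities dict of start/end token indices plus labels.
# 2. Pass an empty relations dict; the head builds the key/value candidates.
#    (See the docstring I link in my next comment for the exact fields.)
# entities = {"start": [...], "end": [...], "label": [...]}
# re_model = LayoutLMv2ForRelationExtraction.from_pretrained("microsoft/layoutxlm-base")
# re_outputs = re_model(**encoding, entities=[entities], relations=[{}])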

I'm not really experienced enough with the library to review the previous effort but I think there may be a few things missing there. In terms of going forward would you prefer if I made a new PR from my branch or tried to modify that PR to conform?

@R0bk (Author) commented Feb 1, 2022

Also, I forgot to add: the detailed form that entities and relations should take is documented in the model input and output docstrings:
https://github.com/R0bk/transformers/blob/9c0e0ba9ccc0d32b795c2c0e0130931b92230292/src/transformers/models/layoutlmv2/outputs_layoutlmv2.py#L26-L74

@NielsRogge (Contributor)

Awesome work!! I'll have to do a bit of a deep dive into this, but it looks very nice. Let's add the head model to the library.

I guess we can continue our discussion on #15173?

Btw, Detectron2 now has support for torch 1.10; you can easily install it in Colab using:

!python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html

@R0bk (Author) commented Feb 2, 2022

Ahh, that's so much easier for Detectron2, thanks for that :).

Also stoked to hear that we can integrate this. There are a few things I should mention, but I wasn't sure where to put them, so I thought I'd comment them here and get some advice from you.

Questions:

  • I had to add an argument _configuration_file to the model init, but this is only required in transformers v16.0 and later; it works without it in v15.0 (I think this is related to edits to kwargs in configuration_utils.py). Are we meant to add this?

  • Using the default tokenizer and padding seems to use the default huggingface pad token [PAD], but this token isn't in the microsoft/layoutxlm-base tokenizer's vocab, so padding results in an OOV error. The pad token <pad> has the id 1, so I've been setting the tokenizer to use that token (see the sketch after this list). I feel like I've been missing something obvious here, have I?

  • I believe a lot of users would want to use the model directly on the outputs of their token classification head. Unfortunately this RelationExtraction head in its current state requires the user to select some entities on their page as keys and some other entities as values. I feel as though we can make the model a lot more applicable to general users if we allow it to map not only between keys and values but between all entities. I believe the original authors only limited the linking because in their dataset they are given all keys and all values on a page, hence they only had to map keys to values, and restricting the mapping this way maximized accuracy.

    For users who want to do a common task, for example identifying which detected entities belong in the same table row, there won't necessarily be a key for the other entities to map to. In fact, in this case users are probably better off than the original authors, as they have selected only the entities they are interested in, whereas the original authors have all key-value pairs highlighted. So even though this increases the complexity of relation mapping from O(|keys|·|values|) (where at largest max(|keys|, |values|) = |entities|/2) to O(|entities|²), in general this should be fine as |entities| should be small. The following line could be put behind a config flag so the user can choose how they want the model to operate.

https://github.com/R0bk/transformers/blob/d9fe818083017d49487a3a45ca99f52123d68628/src/transformers/models/layoutlmv2/modeling_layoutlmv2.py#L1431

  • We could also allow the user to specify a dictionary of labels that are allowed to match with each other. Do you think something like this would be a good idea to add? I can write an example colab if that would be helpful in demonstrating what I mean.

  • With the work I've done I've noticed the current model with the RE head is a bit sensitive to train, collapsing in about 1/4 of the runs that I've done (on the XFUN dataset).

    It really requires running with lr warmup (based on my testing I'd recommend linear warmup [0, 5e-5] over about 15% of total steps), otherwise it is even more sensitive, collapsing in just over 1/2 of the runs. Is there somewhere we can put this as advice for users who may not have dealt with things like this before?

  • The data collator I have in the colab may not be immediately obvious for users who want to train the model to write themselves; do you ever add data collators to the library? Or do you think it would potentially be a good idea to add some code to LayoutLMv2FeatureExtractor to make it a bit easier for users?

  • Post merge, once everything is working and cleaned up, do you think it would be a good idea to do a PR adding that colab as an example to your Transformers Tutorials repo?

  • Also, I noticed that in Add class LayoutLMv2ForRelationExtraction #15173 a lot of the comments are already fixed in my branch; since I can't edit that PR, is there any way I can pull those changes in?

Notes:

  • I noticed that in the official implementation (the unilm one) we do the RGB to BGR swap twice: once in the dataset, once in the feature extractor. Not sure if this makes much of a difference, as most documents are greyscale, but I fixed this along with using the newer tokenizer and outputting an HD image in my dataset here.
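For reference, here's the pad-token workaround from the second question above as a minimal sketch (it assumes "<pad>" really does map to id 1 in the microsoft/layoutxlm-base vocab, which is what I observed):

from transformers import LayoutXLMTokenizer

tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")
# The default [PAD] string isn't in this vocab, so point the tokenizer at
# the sentencepiece pad token instead before padding any batches.
if tokenizer.pad_token not in tokenizer.get_vocab():
    tokenizer.pad_token = "<pad>"
print(tokenizer.pad_token, tokenizer.pad_token_id)  # "<pad>", 1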

@R0bk (Author) commented Feb 2, 2022

Also, one more general LayoutLMv2/XLM question based on what I saw when writing the dataset. From my understanding, the current processor/feature extractor splits on words, tokenizes, and then returns a flattened list of the tokens along with the original bounding boxes, duplicated where a word produced multiple tokens.

With character-based languages I think this may cause some issues, hence why the original authors did the processing differently in the XFUN dataset code. I believe that if we split by words, most software will split each character on its own; if we pass this result to the processor/feature extractor, the tokenizer can't run correctly, as it can't group multiple characters together into a single token id. And if we pass in a whole line at once, the processor/tokenizer will create the token ids correctly but will just duplicate the bounding box of the entire line over and over.

Is my understanding correct? And if so do you think we could create a different way of using the processor/ feature extractor where you can pass in a whole line along with the bounding boxes for each character in that line and then use the offset mappings from the tokenizer to remap the bounding boxes correctly?
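Something along these lines is what I have in mind. A sketch only, illustrated with the plain XLM-R fast tokenizer (which LayoutXLM builds on), since the current LayoutXLM tokenizer wants word-level boxes; it assumes char_boxes[i] is the box of line[i]:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
line = "发票号码12345"
char_boxes = [[10 + 20 * i, 0, 30 + 20 * i, 20] for i in range(len(line))]

enc = tokenizer(line, return_offsets_mapping=True, add_special_tokens=False)
token_boxes = []
for start, end in enc["offset_mapping"]:
    if start == end:  # tokens covering no character span
        token_boxes.append([0, 0, 0, 0])
        continue
    chars = char_boxes[start:end]
    # Merge the character boxes each token covers into a single box.
    token_boxes.append([
        min(b[0] for b in chars), min(b[1] for b in chars),
        max(b[2] for b in chars), max(b[3] for b in chars),
    ])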

@Isha09Garg
I'm experimenting with LayoutLMv2 and LayoutLMForRelationExtraction.

I referred to https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_%2B_Gradio_demo.ipynb for entity detection/predictions using LayoutLMv2

Can someone help me convert these predictions from LayoutLMv2 to the entity dict (which is the input to LayoutLMv2ForRelationExtraction)?
{
    'start': torch.IntTensor of shape (num_entities),
        each value is the index of the token (in range(0, len(tokens))) where the entity starts,
    'end': torch.IntTensor of shape (num_entities),
        each value is the index of the token where the entity ends,
    'label': torch.IntTensor of shape (num_entities),
        each value is the label (as an int) of the entity
}
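For context, this is roughly what I've tried for the grouping step (my own sketch, not confirmed against the branch; it assumes plain integer labels where 1 = key and 2 = value with no IOB prefixes, and treats 'end' as exclusive, though I'm unsure of the exact convention expected):

import torch

def predictions_to_entities(predictions):
    # predictions: per-token argmax list from LayoutLMv2ForTokenClassification.
    starts, ends, labels = [], [], []
    prev = None
    for idx, label in enumerate(predictions):
        if label != prev:
            if prev in (1, 2):   # close the previous key/value span
                ends.append(idx)
            if label in (1, 2):  # open a new span
                starts.append(idx)
                labels.append(label)
        prev = label
    if prev in (1, 2):           # span running to the end of the sequence
        ends.append(len(predictions))
    return {
        "start": torch.tensor(starts, dtype=torch.int),
        "end": torch.tensor(ends, dtype=torch.int),
        "label": torch.tensor(labels, dtype=torch.int),
    }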

@NielsRogge (Contributor) commented Feb 4, 2022

> I had to add an argument _configuration_file to the model init, but this is only required in transformers v16.0 and later; it works without it in v15.0 (I think this is related to edits to kwargs in configuration_utils.py). Are we meant to add this?

This is weird, might be a bug. cc @sgugger

> Using the default tokenizer and padding seems to use the default huggingface pad token [PAD], but this token isn't in the microsoft/layoutxlm-base tokenizer's vocab, so padding results in an OOV error. The pad token <pad> has the id 1, so I've been setting the tokenizer to use that token. I feel like I've been missing something obvious here, have I?

You mean using tokenizer = LayoutXLMTokenizer.from_pretrained("microsoft/layoutxlm-base")? Normally this tokenizer should appropriately pad sequences. Can you provide a code snippet that reproduces your issue?

> I believe a lot of users would want to use the model directly on the outputs of their token classification head.

This is fine for me!

> We could also allow the user to specify a dictionary of labels that are allowed to match with each other. Do you think something like this would be a good idea to add? I can write an example colab if that would be helpful in demonstrating what I mean.

Yes, this seems very useful.

> With the work I've done I've noticed the current model with the RE head is a bit sensitive to train, collapsing in about 1/4 of the runs that I've done (on the XFUN dataset).
>
> It really requires running with lr warmup (based on my testing I'd recommend linear warmup [0, 5e-5] over about 15% of total steps), otherwise it is even more sensitive, collapsing in just over 1/2 of the runs. Is there somewhere we can put this as advice for users who may not have dealt with things like this before?

Yes, we usually have a tips section on each model's documentation page, e.g. LayoutLMv2's can be found here (right below the abstract of the paper). We can link to additional notebooks for more info.

> The data collator I have in the colab may not be immediately obvious for users who want to train the model to write themselves; do you ever add data collators to the library? Or do you think it would potentially be a good idea to add some code to LayoutLMv2FeatureExtractor to make it a bit easier for users?

We do have data collators in the library; you can find them here. Alternatively, we can include code in the feature extractor as long as we don't break our users' existing code. Maybe the data collator would be the best option here.
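For what it's worth, a collator here mostly just needs to batch the tensor inputs while keeping the variable-length entities/relations as plain per-example lists. A minimal sketch, assuming the tokenizer has already padded every example to the same length and the field names follow your branch's docstring:

import torch

def re_data_collator(features):
    # features: list of dicts, one per example, already padded by the tokenizer.
    batch = {}
    for key in features[0]:
        if key in ("entities", "relations"):
            batch[key] = [f[key] for f in features]  # ragged: keep as a list
        else:
            batch[key] = torch.stack([torch.as_tensor(f[key]) for f in features])
    return batch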

> Post merge, once everything is working and cleaned up, do you think it would be a good idea to do a PR adding that colab as an example to your Transformers Tutorials repo?

Yes sure, lots of people have been asking for this, so let's add a clean notebook with additional documentation such that people really know how the model works.

Feel free to open a new PR!

@sgugger (Collaborator) commented Feb 4, 2022

> I had to add an argument _configuration_file to the model init, but this is only required in transformers v16.0 and later; it works without it in v15.0 (I think this is related to edits to kwargs in configuration_utils.py). Are we meant to add this?

No, this is not a public-facing argument, and it's for configurations only anyway. It's not used anywhere in the code for pretrained models, so I don't see why it should be needed. You can check every other model in the library and see for yourself that it hasn't been added :-)

@NielsRogge (Contributor)

@sgugger I can reproduce the error with _configuration_file:

!pip install -q transformers

from transformers import LayoutLMv2ForTokenClassification
model = LayoutLMv2ForTokenClassification.from_pretrained('microsoft/layoutxlm-base', num_labels=10)

gives:

TypeError                                 Traceback (most recent call last)
<ipython-input-5-b9f90523681c> in <module>()
      1 from transformers import LayoutLMv2ForTokenClassification
----> 2 model = LayoutLMv2ForTokenClassification.from_pretrained('microsoft/layoutxlm-base', num_labels=10)

/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1487         else:
   1488             with no_init_weights(_enable=_fast_init):
-> 1489                 model = cls(config, *model_args, **model_kwargs)
   1490 
   1491         if from_pt:

TypeError: __init__() got an unexpected keyword argument '_configuration_file'

@sgugger (Collaborator) commented Feb 11, 2022

Looking into it, thanks for the repro!

@sgugger (Collaborator) commented Feb 11, 2022

The problem should be fixed on master. We'll make a patch release on Monday with the fix.

@yagmur-q commented Feb 25, 2022

@R0bk Thank you for the great work. I had a lot of gaps around RE inference, and they're now mostly clarified.

But I'm still having difficulty understanding the 'entities' and 'relations' (such as 'start_index' and 'end_index'). Could you give an example of what they represent in a given sentence? I couldn't find a clear answer in the original paper or in the other reference papers the authors mention. You added this docstring, but it would be great if you could give concrete examples for those.

Here is the only info from the paper that mention about RE process:

Relation Extraction: Equipped with the document D and the semantic entity label set C, relation extraction aims to predict the relation between any two predicted semantic entities. Defining R = {r0, r1, .., rm} as the semantic relation labels, we intend to find a function FRE : (D, C, R, E) → L, where L is the predicted semantic relation set: L = {(head0, tail0, r0), ...,(headk, tailk, rk)} where headi and taili are two semantic entities. In this work, we mainly focus on the key-value relation extraction.

and

Relation Extraction: Following Bekoulis et al. (2018) , we first incrementally construct the set of relation candidates by producing all possible pairs of given semantic entities. For every pair, the representation of the head/tail entity is the concatenation of the first token vector in each entity and the entity type embedding obtained with a specific type embedding layer. After respectively projected by two FFN layers, the representations of head and tail are concatenated and then fed into a bi-affine classifier.
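As far as I understand it, that second paragraph describes something like the sketch below. This is my own reading, not the authors' code; I use nn.Bilinear as a stand-in for the bi-affine classifier, and the dimensions and label count are illustrative:

import torch
import torch.nn as nn

hidden, type_dim, proj, num_rel = 768, 768, 768, 2

type_emb = nn.Embedding(3, type_dim)         # entity labels, e.g. 0/1/2
ffn_head = nn.Sequential(nn.Linear(hidden + type_dim, proj), nn.ReLU())
ffn_tail = nn.Sequential(nn.Linear(hidden + type_dim, proj), nn.ReLU())
biaffine = nn.Bilinear(proj, proj, num_rel)  # scores each (head, tail) pair

def score_pair(seq_out, head_ent, tail_ent):
    # seq_out: (seq_len, hidden); each entity is (first_token_idx, label).
    # Entity representation = first token vector + entity-type embedding.
    h = torch.cat([seq_out[head_ent[0]], type_emb(torch.tensor(head_ent[1]))])
    t = torch.cat([seq_out[tail_ent[0]], type_emb(torch.tensor(tail_ent[1]))])
    return biaffine(ffn_head(h), ffn_tail(t))  # logits over relation labels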

Thank you

@lawsonxwl

> I've added to the bottom of this notebook here an inference example (please ignore the accuracy, I didn't spend much time finetuning).
>
> For running inference we just require an empty relations dict, as we calculate all the possible relations from the entity labels (the current model only links entities with label 1 (the key) to entities with label 2 (the value)).
>
> We do, however, require all the entities to be labelled with a start token index, an end token index and a label, so in the docs we would probably suggest that users run LayoutLMv2ForTokenClassification first and then run this head on its results.

So you mean that we need to train 2 models: one for token classification, and one that uses the results of the previous model to do the relation extraction?

@sujit420 commented Apr 1, 2022

> I'm experimenting with LayoutLMv2 and LayoutLMForRelationExtraction.
>
> I referred to https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_%2B_Gradio_demo.ipynb for entity detection/predictions using LayoutLMv2
>
> Can someone help me convert these predictions from LayoutLMv2 to the entity dict (which is the input to LayoutLMv2ForRelationExtraction)?

Did you get an answer to your question?

@NielsRogge (Contributor)

> So you mean that we need to train 2 models: one for token classification, and one that uses the results of the previous model to do the relation extraction?

I'm pretty sure the answer to this question is yes ;)

@Isha09Garg

> Did you get an answer to your question?

Not yet @NielsRogge, can you please help here?

@anamtaamin

Hi, I saw your amazing work: https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_%2B_Gradio_demo.ipynb#scrollTo=AttFR_dMNVEL
I was just wondering: can we take these question-answer pairs as key-value pairs in a dictionary?
Eagerly waiting for your response.
Thanks.

@quasimik

I've implemented @NielsRogge's comments in #15173 in my own fork. I'm happy to open a PR, or to let someone else take it from here.

@mattdeeperinsights

@quasimik Great work! Could you provide a step-by-step guide on how to use your new class LayoutLmv2RelationExtractionDecoder? I see you have added it to this part here.

@data2450 commented Jul 3, 2022

My aim is also to predict key-value pairs. According to the colab notebook, we have to train the token classification (entity detection) model first, and then use its output as input to

model = LayoutLMv2ForRelationExtraction.from_pretrained('microsoft/layoutxlm-base')

Am I right?

@alejandrojcastaneira

Hello guys, any update on this new component?

@alejandrojcastaneira commented Jul 20, 2022

Hi @R0bk, thanks for the amazing work! I was able to train an RE component on custom data using your fork and the colab notebook you provided, and the results look very promising! However, at the moment I'm only able to train the model with entities of types 1 & 2; if I set other entity types inside the "label" field of the "entities" key lists, I get an error. I tried commenting out the line you suggested: https://github.com/R0bk/transformers/blob/d9fe818083017d49487a3a45ca99f52123d68628/src/transformers/models/layoutlmv2/modeling_layoutlmv2.py#L1431 but it didn't work. Can you please point me in some direction on this? Kind regards.

@NurielWainstein

Hi, I'm having trouble understanding how to use this...
Can someone guide me?
I have an image of a random invoice; how do I get the key-value pairs?

In other words, how do I use this notebook step by step?

@hjerbii commented Aug 16, 2022

Hello,
Thanks a lot for sharing your work with us :) .

On my side, I do not see how we get the ids of the tokens where entities start/end for the inference part.
When running LayoutLMForTokenClassification before RE, we only get the label of every token in the input text image.

Could you please share more details on this part?

Thanks!

@Sharanya-krishnamurthi commented Oct 3, 2022

Hi @R0bk, thanks for this work. It helped me train on my data for different use cases and get better results, until recently, when I happened to update the transformers module in my environment by mistake. After getting back to your version I now get RuntimeError: CUDA out of memory, even with a batch_size of 1. For the same data, I was able to train RE before. I'm not sure how to fix the problem; I tried creating a fresh environment but the problem persists.

Environment details:
python - 3.7.5
pytorch - 1.8.1+cu111
transformers - 4.17.0.dev0
detectron2 - 0.6

Kindly suggest what the problem could be, or what I might have missed in the new environment.

Thanks!

@jyotiyadav94

Hi @R0bk, @NielsRogge,

Thanks for the amazing work.
Do you have any plans to add the RelationExtraction head to LayoutLMv3?
There is a huge difference between the results of LayoutLMv2 & LayoutLMv3.

@Rajmehta123

Any updates on the model head addition for inference? The output of LayoutLMv2 is not in line with the input for RE. Can these 2 heads be combined for the RE task?

@munish0838

> > Can someone help me convert these predictions from LayoutLMv2 to the entity dict (which is the input to LayoutLMv2ForRelationExtraction)?
>
> Not yet @NielsRogge, can you please help here?

Hi @Isha09Garg, were you able to use LayoutLMv2 for the RE task (on FUNSD or other datasets)?

@yang0369

Hi, has anyone tried to implement the RE head on LayoutLM v1?

@Muhammad-Hamza-Jadoon

> Any updates on the model head addition for inference? The output of LayoutLMv2 is not in line with the input for RE. Can these 2 heads be combined for the RE task?

Is the relation extraction module only created for LayoutXLM, or can I also use it for LayoutLMv2 and v3?
